A simple regulatory architecture allows learning the statistical structure of a changing environment
Stefan Landmann, Caroline M. Holmes, and Mikhail Tikhonov
Institute of Physics, Carl von Ossietzky University of Oldenburg, D-26111 Oldenburg, Germany
Department of Physics, Princeton University, Princeton, NJ 08544, USA
Department of Physics, Center for Science & Engineering of Living Systems, Washington University in St. Louis, St. Louis, MO 63130, USA
(Dated: January 5, 2021)

Bacteria live in environments that are continuously fluctuating and changing. Exploiting any predictability of such fluctuations can lead to an increased fitness. On longer timescales bacteria can “learn” the structure of these fluctuations through evolution. However, on shorter timescales, inferring the statistics of the environment and acting upon this information would need to be accomplished by physiological mechanisms. Here, we use a model of metabolism to show that a simple generalization of a common regulatory motif (end-product inhibition) is sufficient both for learning continuous-valued features of the statistical structure of the environment and for translating this information into predictive behavior; moreover, it accomplishes these tasks near-optimally. We discuss plausible genetic circuits that could instantiate the mechanism we describe, including one similar to the architecture of two-component signaling, and argue that the key ingredients required for such predictive behavior are readily accessible to bacteria.
Keywords: Fluctuating environments | Metabolic regulation | Learning
Organisms that live in changing environments evolve strategies to respond to the fluctuations. Many such adaptations are reactive, e.g. sensory systems that allow detecting changes when they occur, and responding to them. However, adaptations can be not only reactive, but also predictive. For example, circadian clocks allow photosynthetic algae to reorganize their metabolism in preparation for the rising sun [1, 2]. Another example is the anticipatory behavior in
E. coli, which allows it to prepare for the next environment under its normal cycling through the mammalian digestive tract [3]; similar behaviors have been observed in many species [4, 5].

All these behaviors effectively constitute predictions about a future environment: the organism improves its fitness by exploiting the regularities it “learns” over the course of its evolution [6]. Learning such regularities can be beneficial even if they are merely statistical in nature. A prime example is bet hedging: even if the environment changes stochastically and without warning, a population that learns the statistics of switching can improve its long-term fitness, e.g., by adopting persister phenotypes with appropriate probability [7, 8]. The seemingly limitless ingenuity of evolutionary trial-and-error makes it plausible that virtually any statistical structure of the environment that remains constant over an evolutionary timescale could, in principle, be learnt by an evolving system, and harnessed to improve its fitness [9].

However, the statistical structure of the environment can itself change, and this change can be too quick to be learned by evolution. For example, an organism might experience a period of stability followed by a period of large fluctuations; or an environment where two resources are correlated, and then another where they are not. Note that our focus here is not the rapidity of fluctuations, but the slower timescale on which the structure of those fluctuations changes. One expects such scenarios to be particularly common in an eco-evolutionary context. As an example, consider a bacterium in a small pool of water (Fig. 1A). Its immediate environment, shaped by local interactions, is fluctuating on the timescale at which the bacterium changes neighbors. The statistical properties of these fluctuations depend on the species composition of the pool.
As such, the fast fluctuations are partially predictable, and learning their structure could help inform the fitness-optimizing strategy: a neighbor encountered in the recent past is likely to be seen in the near future. However, these statistics change on an ecological timescale, and such learning would therefore need to be accomplished by physiological, rather than evolutionary, mechanisms.

On a physiological timescale, this problem is highly nontrivial: the organism would have to perform inference from prior observations, encode them in memory, and act upon this knowledge (Fig. 1B). It is clear that solutions to this problem do exist: such behaviors, common in neural systems, can be implemented by neural-network-like architectures; and these known architectures can be translated into biochemical networks [10–14]. But single-celled organisms operate in a severely hardware-limited regime rarely probed by neuroscience. Streamlined by evolution, bacterial genomes quickly shed any unused complexity. Whether we could expect learning-like behaviors from bacteria depends on whether useful networks could be simple enough to plausibly be beneficial.

Known examples of phenotypic memory, e.g., when the response is mediated by a long-lived protein, can be interpreted as a simple form of learning [15, 16]; circuits capable of adapting to the current mean of a fluctuating signal, as in bacterial chemotaxis [17], also belong in this category. Prior theory work has also proposed

FIG. 1.
Learning environment statistics can benefit living systems, but is a difficult problem. (A) An environment is characterized not only by its current state, but also by its fluctuation structure, which determines what changes are likely to occur in the future. In this cartoon, the immediate environment of a given bacterium (center) is shaped by its close neighbors, while the ensemble of likely future changes is determined by the species composition of the habitat. Two environments experienced as identical at a particular time could differ in the fluctuation structure; this difference can inform the fitness-maximizing strategy, but cannot be sensed directly. (B) Instead, the fluctuation structure would need to be learned from past observations, and used to inform future behavior. (C) To formalize the problem, we consider a situation where some internal physiological quantities $\vec{P}(t)$ must track fluctuating external factors $\vec{D}(t)$ undergoing a random walk. Since it is impossible to react instantaneously, $\vec{P}$ always lags behind $\vec{D}$. The dashed ellipse illustrates the fluctuation structure of $\vec{D}$ (encoded in parameters $M$ and $\Gamma$, see text), and changes on a slower timescale than the fluctuations of $\vec{D}$. (D, E) The optimal behavior in the two-dimensional version of our problem, under a constrained maximal rate of change $\|\dot{\vec{P}}\|$. For a given current $\vec{D}$ (blue dot), the optimal control strategy would steer any current $\vec{P}$ (green arrows) towards the best guess of the future $\vec{D}$, which depends on the fluctuation structure (red ellipse: (D) fluctuations are uncorrelated and isotropic; (E) fluctuations have a preferred direction). The optimal strategy is derived using control theory (SI section “Control theory calculation”).
that simple genetic circuits could learn more subtle binary features, such as a (transient) presence or absence of a correlation between two signals [18].

Here, we show that a simple generalization of a ubiquitous regulatory motif, end-product inhibition, can learn, store, and “act upon” information on continuous-valued features such as timescales and correlations of environmental fluctuations, and moreover, can do so near-optimally. We identify the key ingredients giving rise to this behavior, and argue that their applicability is likely more general than the simple metabolically-inspired example used here.

The setup
To model the general challenges of surviving in a fluctuating environment, consider a situation where some internal physiological quantities $\vec{P} = (P_1, \ldots, P_N)$ must track fluctuating external variables $\vec{D} = (D_1, \ldots, D_N)$. For example, the expression of a costly metabolic pathway would ideally track the availability of the relevant nutrient, or the solute concentration in the cytoplasm might track the osmolarity of the environment. In abstract terms, we describe these environmental pressures by the time-dependent $\vec{D}(t)$, and postulate that the organism's fitness is determined by the average mismatch $-\sqrt{\langle \sum_{i=1}^N (P_i - D_i)^2 \rangle}$, a quantity we will henceforth call “performance”. Here and below, angular brackets denote averaging over time.

If $\vec{D}$ changes sufficiently slowly, the organism can sense it and adapt $\vec{P}$ accordingly. We, instead, are interested in the regime of rapid fluctuations. When changes in $\vec{D}$ are too rapid for the organism to match $\vec{P}$ to $\vec{D}$ exactly, it can rely on statistical structure. At the simplest level, the organism could match the mean, setting $\vec{P} \equiv \langle \vec{D} \rangle$. However, information on higher-order statistics, e.g. correlations between $D_1$ and $D_2$, can further inform the behavior and improve fitness.

To see this, in what follows, we will consider the minimal case of such structured fluctuations, namely an $N$-dimensional vector $\vec{D} = (D_1, \ldots, D_N)$ undergoing a random walk in a quadratic potential:

$$\vec{D}(t + \Delta t) = \vec{D}(t) - M \cdot \left( \vec{D}(t) - \vec{D}_0 \right) \Delta t + \sqrt{\Gamma \Delta t}\, \vec{\eta}, \qquad (1)$$

with mean $\vec{D}_0$, fluctuation strength $\Gamma$, independent Gaussian random variables $\vec{\eta}$ with zero mean and variance one, and the matrix $M$ defining the potential.

In this system, the relevant “fluctuation structure” is determined by $M$ and $\Gamma$. In one dimension, Eq.
(1) gives $D$ a variance of $\Gamma/2M$. In two dimensions, denoting the eigenvalues of $M$ as $\lambda_{1,2}$, the stationary distribution of the fluctuating $\vec{D}$ is a Gaussian distribution with principal axes oriented along the eigenvectors of $M$, and standard deviations along these directions given by $\sqrt{\Gamma/2\lambda_1}$ and $\sqrt{\Gamma/2\lambda_2}$. Intuitively, we can think of the fluctuating $\vec{D}$ as filling out an ellipse (Fig. 1C). Going forward, when we refer to learning fluctuation structure, we mean learning properties of $M$ and $\Gamma$.

If $M$ and $\Gamma$ are known, the optimal strategy minimizing $\langle (\vec{P} - \vec{D})^2 \rangle$, where $\vec{D}(t)$ is set by Eq. (1), can be computed exactly, as a function of the maximum allowed rate of change $\|\dot{\vec{P}}\|$. (If we do not constrain $\|\dot{\vec{P}}\|$, the optimal behavior is of course $\vec{P} = \vec{D}$.) Briefly, the optimal behavior is to steer $\vec{P}$ towards the best guess of the expected future $\vec{D}$ (see SI section “Control theory calculation”). This best guess depends on the fluctuation structure, as illustrated by the comparison between Fig. 1D and 1E for an isotropic and an anisotropic $M$.

However, in our problem we will assume that $M$ and $\Gamma$ do not stay constant long enough to be learned by evolution, and thus are unknown to the system. In this regime, it is not clear that the behavior of an $M$- and $\Gamma$-optimized system is relevant. Nevertheless, we will describe a regulatory architecture consisting of common regulatory elements that will adapt its responsiveness to the fluctuation structure of its input (“learn”); for example, in the two-dimensional case, it will indeed develop the anisotropic response shown in Fig. 1E. Moreover, we will find the steady-state performance of our architecture to be near-optimal when compared to the theoretical ceiling of a system that knows $M$ and $\Gamma$ perfectly.

Proposed architecture: end-product inhibition with an excess of regulators
The section above was intentionally very general. To discuss solutions available to cells, it is convenient to restrict the scope from this general formulation to a more specific metabolically-inspired case. From now onwards, let $D_i$ be the instantaneous demand in metabolite $x_i$ (determined by external factors), and $P_i$ be the rate at which the metabolite is produced, both defined in units of metabolite concentration per unit time. The number of components of the vector $\vec{D}$ now has the meaning of the number of metabolites, and we will denote it as $N_x$. The cell needs to match $\vec{P}$ to $\vec{D}$ (or, equivalently, maintain the homeostasis of the internal metabolite concentrations $x_i$).

The simplest way to solve this problem is via feedback inhibition. Consider first the case of a single metabolite $x$. If an accumulation of $x$ inhibits its own synthesis, a decreased demand will automatically translate into a decreased production. For our purposes, we will model this scenario by placing the synthesis of metabolite $x$ under the control of a regulator $a$ (e.g., a transcription factor), which is, in turn, inhibited by $x$ (Fig. 2A). For simplicity, we will measure regulator activity $a$ directly in units of equivalent production of $x$. The dynamics of this system, linearized for small fluctuations of metabolite concentration $x$, can be written in the following form (see SI section

FIG. 2.
The regulatory architecture we consider is a simple generalization of end-product inhibition. (A) Simple end-product inhibition (SEPI) for one metabolite. Green arrows show activation, red arrows inhibition. (B) Natural extension of SEPI to several metabolites. (C) We consider regulatory architectures with more regulators than metabolites, with added self-activation (circular arrows) and a nonlinear activation/repression of regulators $a_\mu$ by the metabolite concentrations $x_i$ (pictograms in circles). (D) Visualizing a regulation matrix $\sigma_{\mu i}$ for two metabolites. In this example, the first regulator, described by $\vec{\sigma}_1$, activates the production of $x_1$; the second inhibits $x_1$ and activates $x_2$. For simplicity, we choose vectors of unit length, which can be represented by a dot on the unit circle. This provides a convenient way to visualize a given regulatory architecture. (E) The nonlinear dependence of regulator activity dynamics $\dot{a}_\mu / a_\mu$ on metabolite concentrations $x_i$ in our model (see Eq. (4)).

“Simple end-product inhibition”):

$$\dot{x} = P - D\, x / x_0 \qquad \text{(source–sink dynamics of metabolite } x\text{)} \qquad (2a)$$
$$P = a P_0 \qquad \text{(definition of regulator activity } a\text{)} \qquad (2b)$$
$$\dot{a} = (x_0 - x)/\lambda \qquad \text{(regulator activity inhibited by } x\text{)} \qquad (2c)$$

Here, we introduced $P_0$ with the dimension of production (concentration per time) to render $a$ dimensionless. In Eq. (2c), $\lambda$ has the units of concentration × time, and setting $\lambda \equiv x_0 \tau_a$ defines a time scale $\tau_a$ for changes in regulator activity. Assuming the dynamics of metabolite concentrations $x$ are faster than regulatory processes, and choosing the units so that $x_0 = 1$ and $P_0 = 1$, we simplify the equations to:

$$x = P/D, \qquad P = a, \qquad \tau_a \dot{a} = 1 - x. \qquad (3)$$

We will refer to this architecture as simple end-product inhibition (SEPI). For two metabolites $\vec{x} = (x_1, x_2)$, the straightforward generalization is to have two independent copies of this circuit, with two regulators $a_1$, $a_2$ (Fig.
2B). Denoting the number of regulators as $N_a$, we note that in the SEPI architecture, there are as many regulators as there are metabolites: $N_a = N_x$.

The architecture we will describe builds on this widely used regulatory motif, and relies on three added ingredients:

1. an excess of regulators: $N_a > N_x$;
2. self-activation of regulators;
3. nonlinear activation/repression of the regulators $a_\mu$ by the metabolite concentrations $x_i$.

Here and below, we use index $\mu$ for regulators ($\mu = 1 \ldots N_a$) and index $i$ for metabolites ($i = 1 \ldots N_x$). These three ingredients, we claim, will be sufficient for the circuit both to learn higher-order statistics and to use this information appropriately when matching production to demand. It is important to emphasize that all three are readily accessible to cells. In fact, there are multiple ways to build regulatory circuits exhibiting the proposed behavior using common regulatory elements. To focus on the general mechanism rather than any one particular implementation, we will defer describing these example circuits until later in the text; here we will consider a minimal modification of Eq. (3) that contains the required ingredients:

$$x_i = P_i / D_i \qquad (4a)$$
$$P_i = \sum_\mu \sigma_{\mu i}\, a_\mu \qquad (4b)$$
$$\tau_a \dot{a}_\mu = a_\mu \max\left(d, \sum_i \sigma_{\mu i} (1 - x_i)\right) - \kappa a_\mu. \qquad (4c)$$

This architecture arguably bears a similarity to neural networks, and, as we will see, the familiar intuition about the value of extra “hidden nodes” indeed holds. However, we caution the reader not to rely too heavily on this analogy. For example, here $\sigma_{\mu i}$ is a constant matrix describing how the activities of regulators $a_\mu$ control the synthesis of metabolites $x_i$.

For two metabolites ($N_x = 2$) as in Fig. 2C, each regulator is summarized by a 2-component vector $\vec{\sigma}_\mu = (\sigma_{\mu 1}, \sigma_{\mu 2})$; its components can be of either sign (or zero) and specify how strongly the regulator $a_\mu$ is activating or repressing the synthesis of metabolite $x_i$.
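Eq. (4) is straightforward to integrate numerically. The sketch below is a minimal illustration only: the time step, the values of $\tau_a$ and $\kappa$, and the five regulator angles are our own illustrative choices, not taken from this paper. It checks that, for a static demand, the circuit settles with production matching demand up to a small offset set by $\kappa$.

```python
import numpy as np

def simulate_eq4(D, angles, tau_a=1.0, kappa=0.01, d=0.0,
                 dt=0.01, T=500.0, a0=0.4):
    """Integrate the regulator dynamics of Eq. (4) for a static demand D.

    Metabolite pools are assumed fast: x_i = P_i / D_i at every instant.
    """
    # Unit-length regulator vectors sigma_{mu i}, parameterized by angles (Fig. 2D)
    sigma = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    a = np.full(len(angles), a0)            # initial regulator activities
    for _ in range(int(T / dt)):
        P = sigma.T @ a                     # Eq. (4b): production rates
        x = P / D                           # Eq. (4a): fast metabolite pool
        drive = np.maximum(d, sigma @ (1.0 - x))    # nonlinearity of Eq. (4c)
        a += dt / tau_a * (a * drive - kappa * a)   # Eq. (4c)
    return a, sigma.T @ a

# Five regulators at hypothetical angles on the unit circle
angles = np.deg2rad([10, 60, 100, 200, 300])
D = np.array([1.0, 1.0])
a, P = simulate_eq4(D, angles)
print(P)   # should sit close to D = (1, 1), offset by an amount of order kappa
```

Replacing the static `D` with the random walk of Eq. (1) turns this into the fluctuating-demand setting studied in the following sections.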
For simplicity, below, we will choose these vectors to be of unit length. Then, each regulator $\vec{\sigma}_\mu$ is fully characterized by an angle in the $(x_1, x_2)$ plane, which allows for a convenient visualization of the regulatory system (Fig. 2D). The matrix $\sigma_{\mu i}$ defines the regulatory logic of our system and does not change with time. The parameter $d \leq 0$ sets the strength of the nonlinearity (Fig. 2E); we use $d = 0$ (strong nonlinearity) unless explicitly stated otherwise. Finally, the parameter $\kappa$ reflects degradation and is assumed to be small: $\kappa \ll x_0$. Previously, for SEPI, it could be neglected, but here it will matter due to the nonlinearity; see SI section “Simple end-product inhibition” for more details. The parameters used in simulations are all listed in SI section “Parameters used in figures”.

Just like simple end-product inhibition in Eq. (3), the modified system Eq. (4) will correctly adapt production to any static demand (see SI section “Adaptation to static demand”). In the following, we will show that the added ingredients also enable learning the structure of fluctuating environments. For this purpose, we expose our system to demands $D_i(t)$ with fixed means ($D_{0,i} = 1$) but a changing fluctuation structure.

The regulatory architecture described above outperforms simple end-product inhibition by learning environment statistics
To show that our system is able to adapt to different fluctuation structures, we probe it with changing environmental statistics, and show that it, first, learns these statistics, and, second, is able to make use of this information in its behavior.

For simplicity, we start with the one-dimensional case (Fig. 3A-F). In dimension $N_x = 1$, an excess of regulators means we have both an activator $a_+$ and a repressor $a_-$ for the production of $x$ (Fig. 3A). This is reminiscent of paradoxical regulation [19]. We probe our system with changing environmental statistics by exposing it to a demand $D(t)$ with an increasing variance (Fig. 3B, C). As a reminder, here and below, the mean demand is fixed at 1.

Faced with a faster-fluctuating input, our system upregulates both $a_+$ and $a_-$ while keeping $a_+ - a_-$ constant ($a_+ - a_- \approx D_0 = 1$; Fig. 3D). In this way, the two activity levels $a_+$ and $a_-$ encode both the mean and the variance of the fluctuations. Crucially, the system makes use of the information it stores: the increased regulator activities allow future changes in $P$ to be faster. The system's responsiveness, which we can define as $R \equiv \frac{d\dot{P}}{dD}$, increases as $a_+ + a_-$ (Fig. 3E; see also SI section “Defining the system's responsiveness”). As a result, as shown in Fig. 3F, our system is able to perform approximately equally well (after an adaptation time) in each environment, unlike a system like simple end-product inhibition, which is unable to adapt its sensitivity. In summary, Figs. 3D-F show that the simple architecture of Fig. 3A can not only learn statistics of environment fluctuations, but also “act upon this knowledge,” effectively performing both computations of Fig. 1B.

The idea of learning the fluctuation structure is perhaps clearer in dimension $N_x = 2$, since the two demands can now be correlated with each other, and it seems intuitive that a system able to learn the typical direction of fluctuations (the angle $\alpha$ in Fig. 3H) should be able to track the input better.
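This variance-learning effect is easy to reproduce in simulation. Below is a minimal sketch of the circuit of Fig. 3A driven by the demand process of Eq. (1); the parameter values, and the clipping of $D(t)$ away from zero, are our own illustrative choices rather than the paper's. Exposed to a larger-$\Gamma$ demand, both activities settle at higher levels (larger responsiveness $a_+ + a_-$), while $a_+ - a_-$ stays near the mean demand.

```python
import numpy as np

def run_1d(gamma, M=1.0, D0=1.0, tau_a=1.0, kappa=0.05,
           dt=0.01, T=2000.0, seed=0):
    """One metabolite with an activator/repressor pair (as in Fig. 3A).

    D(t) follows the random walk of Eq. (1); the regulators follow Eq. (4c)
    with sigma = +1 for a_plus and sigma = -1 for a_minus.
    """
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    noise = np.sqrt(gamma * dt) * rng.standard_normal(n)
    D, a_p, a_m = D0, 1.2, 0.2
    for k in range(n):
        D += -M * (D - D0) * dt + noise[k]
        D = max(D, 0.05)                 # numerical safeguard: keep demand positive
        x = (a_p - a_m) / D              # fast metabolite pool: x = P / D
        a_p += dt / tau_a * (a_p * max(0.0, 1.0 - x) - kappa * a_p)
        a_m += dt / tau_a * (a_m * max(0.0, x - 1.0) - kappa * a_m)
    return a_p, a_m

a_quiet = run_1d(gamma=0.005)    # small fluctuations
a_noisy = run_1d(gamma=0.08)     # large fluctuations
# The total activity a+ + a- (the responsiveness) ends up larger for the
# noisier input, while a+ - a- stays near the mean demand of 1.
print(sum(a_quiet), sum(a_noisy))
```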
Indeed, as we saw in Figs. 1D-E, when environment fluctuations are anisotropic, the responsiveness of a well-adapted strategy must be anisotropic as well: the preferred direction must elicit a stronger response. Mathematically, the responsiveness $R$ is now a matrix $R_{ij} = \frac{d\dot{P}_i}{dD_j}$, and for a well-adapted system we expect its eigenvectors to align with the principal directions of $M$. In Figs. 3G-L, Fig. 4 and Fig. 5A, our discussion will focus on this two-dimensional case.

Figs. 3G-L show the behavior of our system (Eq. (4)) with $N_a = 5$ regulators (Fig. 3G), exposed to an input
FIG. 3.
The regulatory architecture we consider successfully learns environment statistics, and outperforms simple end-product inhibition. Left column in one dimension, right column in two. (A) Regulation of a single metabolite $x$ with one activator $a_+$ and one repressor $a_-$. (B, C) The variance of $D$ is increased step-wise (by increasing $\Gamma$). (D) Regulator activities $a_\pm$ respond to the changing statistics of $\vec{D}$. For SEPI, the activity of its single regulator is unchanged. (E) Faced with larger fluctuations, our system becomes more responsive. (F) As fluctuations increase, SEPI performance drops, while the circuit of panel A retains its performance. Panels D-F show averages over 80 realizations. (G) In the 2d case, we consider a system with $N_a = 5$ regulators; visualization as in Fig. 2D. (H) Cartoon of correlated demands with a dominant fluctuation direction (angle $\alpha$). (I) We use $\alpha$ to change the fluctuation structure of the input. (J) Regulator activities respond to the changing statistics of $\vec{D}$. Colors as in panel G. (K) The direction of largest responsiveness (“learned angle”; see text) tracks the $\alpha$ of the input. (L) The system able to learn the dominant direction of fluctuations outperforms the SEPI architecture, even if the timescale $\tau_a$ of SEPI is adjusted to match the faster responsiveness of the $N_a = 5$ system (see SI section “Parameters used in figures”). Panels J-L show averages over 40 realizations. Panels B and H are cartoons.

structured as shown in Fig. 3H, where we vary $\alpha$ (Fig. 3I). In other words, we rotate the fluctuation structure matrix $M$ in Eq. (1), keeping its eigenvalues $\lambda_{1,2}$ fixed with $\sqrt{\lambda_1/\lambda_2} = 10$ (this fixes the ratio of major to minor semi-axis lengths).

With $N_a = 5$ regulators, matching the mean value of $\vec{D}$ would leave $N_a - N_x = 3$ degrees of freedom unconstrained. Examining the raw activities $a_\mu$, the pattern for how the stimulus parameters are mapped to the system's internal state is not immediately discernible. However, this pattern becomes clear when we consider the responsiveness matrix $R$.
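The “dominant direction” being learned here is a property of the input process itself: for symmetric $M$, the stationary covariance of Eq. (1) is $(\Gamma/2)M^{-1}$, so the principal axis of the sampled $\vec{D}$ cloud lies along the slow eigenvector of $M$. The sketch below (parameter values are illustrative) simulates Eq. (1) with a rotated anisotropic $M$ and recovers the angle $\alpha$ from the sample covariance, i.e. the quantity the circuit's “learned angle” is expected to track.

```python
import numpy as np

def principal_angle(alpha, lam=(0.1, 10.0), gamma=0.05,
                    dt=1e-3, T=300.0, seed=1):
    """Simulate Eq. (1) in 2d with the potential matrix M rotated by alpha,
    and return the principal-axis angle of the sampled distribution of D."""
    rng = np.random.default_rng(seed)
    c, s = np.cos(alpha), np.sin(alpha)
    R = np.array([[c, -s], [s, c]])
    M = R @ np.diag(lam) @ R.T          # slow eigenvector (lam[0]) lies along alpha
    n = int(T / dt)
    noise = np.sqrt(gamma * dt) * rng.standard_normal((n, 2))
    D = np.zeros(2)                     # start at the mean, D0 = 0
    samples = np.empty((n, 2))
    for k in range(n):
        D += -M @ D * dt + noise[k]     # Eq. (1), discretized
        samples[k] = D
    C = np.cov(samples.T)
    # orientation of the covariance ellipse, in (-pi/2, pi/2]
    return 0.5 * np.arctan2(2.0 * C[0, 1], C[0, 0] - C[1, 1])

alpha = np.pi / 4
print(principal_angle(alpha))   # should recover alpha ~ 0.785, up to sampling error
```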
Fig. 3K plots the “learned angle”, defined as the direction of the dominant eigenvector of $R$; we find that it tracks the stimulus angle. Finally, Fig. 3L demonstrates that our architecture is able to make use of this learning, outperforming the SEPI system, whose responsiveness is isotropic and fixed.

The performance is near-optimal
In the previous section we have shown by example (Fig. 3) that the proposed regulatory architecture can learn statistics of the environment, and that this learning improves performance. We now characterize systematically the conditions under which learning improves performance, and compare our system to the theoretical performance ceiling.

The fluctuation structure in our model is defined by $\Gamma$ and $M$. We first investigate the dependence of performance on $\Gamma$ (Fig. 4A), exposing our system to a two-dimensional input structured as in Fig. 3H with $\sqrt{\lambda_1/\lambda_2} = 10$ as before, $\alpha = \pi/$
4, and a changing $\Gamma$.

Although the input is two-dimensional, changing $\Gamma$ scales the overall magnitude of fluctuations, and the behavior is analogous to the simpler one-dimensional example shown in the first column of Fig. 3. At $\Gamma = 0$ (static input), and by extension, for $\Gamma$ finite but small, examining the steady state of Eq. (4) shows that only $N_x = 2$ out of $N_a$ regulators can be active. In this regime, our system is essentially identical to SEPI—the extra regulators, though available, are inactive—and in fact performs slightly worse. This is because at nonzero $\kappa$, the steady state of Eq. (4) is slightly offset from the ideal state $\langle x_i \rangle = 1$. (While this effect can be corrected, it is only relevant in the parameter regime where no learning occurs, so we chose to keep Eq. (4) as simple as possible; for additional discussion, see SI section “Performance penalty from the degradation term”.)

When $\Gamma$ becomes sufficiently large, the first term in Eq. (4c) (proportional to fluctuation size $\sqrt{\Gamma}$) for one of the inactive regulators finally exceeds, on average, the degradation term. At this point, the system enters the
FIG. 4. The ability to learn statistics is most useful when fluctuations are large and/or strongly correlated. (A) The performance of different circuits shown as a function of $\Gamma$, which scales the fluctuation magnitude (input is two-dimensional and correlated, angle $\alpha = \pi/4$, anisotropy $\sqrt{\lambda_1/\lambda_2} = 10$). Once the fluctuations become large enough to activate the learning mechanism, performance stabilizes; in contrast, the SEPI performance continues to decline. Arrows indicate the theoretical prediction for the threshold value of $\Gamma$; see SI section “The minimal $\Gamma$ needed to initiate adaptation”. Dashed lines indicate the theoretical performance ceiling (calculated at equivalent Control Input Power, see text). (B) Comparison of circuit performance for inputs of the same variance, but different correlation strengths. $N_a = 4$ regulators arranged as shown can learn the variance but not the correlation; the SEPI architecture is unable to adapt to either. Parameter $\Gamma$ is held constant at 0.05; the marked points are identical to those highlighted in panel A (and correspond to fluctuation anisotropy $\sqrt{\lambda_1/\lambda_2} = 10$).

regime where the number of active regulators exceeds $N_x$, and its performance deviates from the SEPI curve. Beyond this point, further changes to the stimulus no longer affect performance, as our system is able to adapt its responsiveness to the changing fluctuation magnitude (compare to Fig. 3F). The threshold value of $\Gamma$ satisfies $\sqrt{\Gamma} \propto \kappa$; the proportionality coefficient of order 1 depends on the specific arrangement of regulators but can be estimated analytically (see SI section “The minimal $\Gamma$ needed to initiate adaptation”). The theoretically predicted deviation points are indicated with arrows, and are in agreement with the simulation results. When a regulator in the system is particularly well-aligned with the dominant direction of fluctuations, the deviation occurs sooner, explaining the better performance of our system when the regulators are more numerous.

To better assess the performance of our system, we compare it to the theoretical optimum derived from control theory, which we represent with dashed lines in Fig. 4A. For given $M$ and $\Gamma$, the family of optimal behaviors is parameterized by Control Input Power (CIP), defined as $\int \|\dot{\vec{P}}\|^2\, dt$. If $\vec{P}$ could react infinitely fast, it would track $\vec{D}$ perfectly, but maintaining such a perfect regulatory system would presumably be costly; constraining the CIP is thus a proxy for specifying the maximum tolerable cost. In order to compare our system with the optimal family of solutions, we compute $\frac{1}{T}\int_0^T \|\dot{\vec{P}}\|^2\, dt$ of our system at each $\Gamma$ ($T$ is the simulation time), and compare to the performance of the optimally steered solution with a matched CIP; details of the calculation can be found in the SI section “Control theory calculation”. Fig.
4A demonstrates that the simple architecture we described not only benefits from matching its responsiveness to its input, but is in fact near-optimal when compared to any system of equivalent responsiveness.

Having investigated the effect of fluctuation variance (changing $\Gamma$), we turn to the effect of their correlation. Up to now, we subjected our system to a strongly correlated two-dimensional input with anisotropy $\sqrt{\lambda_1/\lambda_2} = 10$ (illustrated, to scale, in Fig. 1E). We will now consider a range of anisotropy values, down to anisotropy 1 (uncorrelated fluctuations, Fig. 1D), keeping the variances of $D_1$ and $D_2$ constant and $\alpha = \pi/4$ (Fig. 4B). For $N_a = 5$ or larger, our system is able to take advantage of the correlation, assuming it is strong enough to activate the learning mechanism. (In fact, its performance can reach values that exceed the theoretical ceiling achievable by any system that assumes the two dimensions of $\vec{D}$ to be independent, and thus it must be exploiting the correlation in its inputs; see SI section “The system makes use of correlations in the input” and Fig. S1.) For $N_a = 4$, the performance curve remains flat. This is because the four regulators are arranged as two independent copies of the system shown in Fig. 3A (one $\{a_+, a_-\}$ pair for each of the two inputs); this architecture can take advantage of the learned variance, but not the correlation. Finally, the SEPI architecture can adapt to neither variance nor correlation; its performance curve is also flat, but is lower. As expected, the advantage of our architecture manifests itself in environments with periods of large and/or strongly correlated fluctuations.

The behavior is generalizable
The model described above was a proof of principle, showing that simple regulatory circuits could learn the fluctuation structure of their inputs. Given the simplicity of our model, it is not to be expected that the exact dynamics of Eq. (4) are replicated in real cells. However, the benefit of this simplicity is that we can now trace this behavior to its key ingredients, which we expect to be more general than the model itself.

Specifically in the context of our model, we described our three ingredients as an excess of regulators, nonlinearity, and self-activation. The role of the first two is examined in detail in Fig. 5A (for the two-dimensional case); self-activation is discussed in the SI section “The role of self-activation”. In Fig. 5A, the parameter $d$ on the horizontal axis is the strength of nonlinearity (see Fig. 2E), from perfectly linear at $d = -\infty$ to strongly nonlinear at $d = 0$. The vertical axis corresponds to an increasing number of regulators $N_a$, which we label as in Fig. 2D. The point approximately corresponding to the SEPI architecture is highlighted (a linear system of $N_a = N_x = 2$ activators). We see that in the nonlinear regime, adding regulators improves performance; note, however, that even a single extra regulator ($N_a = 3$) already provides a significant benefit over the SEPI architecture. For completeness, we also include the simplest system with a single regulator co-activating both $x_1$ and $x_2$ (bottom row of Fig. 5A).

Fig. 5A shows that the reported behavior requires $N_a$ to exceed $N_x$, and $d$ to be sufficiently large. However, these ingredients are more general than the specific implementation in Eq. (4). In our model, additional regulators were required because they supplied the slow degrees of freedom to serve as memory; such degrees of freedom could be implemented in other ways, for example, as phosphorylation or methylation [17].
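The requirement that $d$ be sufficiently large reflects a simple statistical fact: a rectifying nonlinearity such as the $\max(0, \cdot)$ of Eq. (4c) converts zero-mean fluctuations of its input into a nonzero average drive that grows with the fluctuation magnitude (for a zero-mean Gaussian $s$ with standard deviation $\sigma$, $\langle \max(0, s) \rangle = \sigma/\sqrt{2\pi}$), whereas a linear element averages to zero regardless of $\sigma$. A quick numerical check of this point (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
for sigma in (0.1, 0.3, 1.0):
    s = sigma * rng.standard_normal(1_000_000)   # zero-mean fluctuating drive
    linear = s.mean()                            # linear element: averages to ~0
    rectified = np.maximum(0.0, s).mean()        # the max(0, .) nonlinearity
    # the rectified mean grows with sigma: roughly sigma / sqrt(2*pi)
    print(f"sigma={sigma}: linear={linear:+.4f}, rectified={rectified:.4f}")
```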
Similarly, while nonlinearity is essential (linear dynamics cannot couple to higher-order terms, such as fluctuation magnitude), its exact functional form may be changed while retaining the learning behavior (see SI section “Nonlinearity as a sensor of fluctuation variance”). Finally, the explicitly self-catalytic behavior of $a_\mu$ in our model is only one possible strategy for translating the stored memory into a faster response; as an example, in Fig. 3A the regulators $a_+$ and $a_-$ could activate each other rather than themselves.

To demonstrate the broader generality of these ingredients, we constructed two circuits with very different architectures, both reproducing the results of Fig. 3C–F. One of these, based on a pair of allosteric enzymes, and with the toy nonlinearity of Fig. 2E replaced by more realistic cooperative binding, implements dynamics similar to those considered above and is discussed in the SI (see Fig. S3).

The other circuit is different not only in appearance, but also in dynamics, and is shown in Fig. 5B. Here, instead of seeking to match $P$ to $D$, we seek to maintain the homeostasis of $x$ perturbed by external factors. (Previously, the two formulations were equivalent; for greater generality, in Fig. 5B we focus on the latter.) In this implementation, the production and degradation of $x$ are both catalyzed by a single bifunctional enzyme. Despite looking quite different from Fig. 3A, this architecture possesses the same key ingredients and can exhibit the same learning behavior (see Fig. S5). Briefly, the responsiveness of this circuit scales with the overall expression of the enzyme $E$, and larger fluctuations of $x$ lead to upregulation of $E$ due to the nonlinearity, as before. For more details, see SI section “Realistic biochemical implementations”.

[Footnote: The correspondence between the labeled point and the SEPI architecture is only approximate, because all models represented in Fig. 5A include self-activation and a small degradation term, while the SEPI architecture does not. However, with only 2 regulators, neither difference has a significant performance effect (see SI section “The role of self-activation”).]

FIG. 5. Two key ingredients enabling learning are nonlinearity and an excess of regulators. (A) System performance in the two-dimensional case $N_x = 2$, shown as a function of the number of regulators $N_a$ (vertical axis) and the strength of nonlinearity $d$ (horizontal axis; $d = -10$ is indistinguishable from a linear system with $d = -\infty$). The SEPI-like architecture (linear with $N_a = 2$; see text) is highlighted. Each data point is averaged over $\alpha$ (see Fig. 3H). (B) An alternative circuit capable of learning fluctuation variance to better maintain homeostasis of a quantity $x$. Synthesis and degradation of $x$ are catalyzed by the same bifunctional enzyme, whose production is regulated nonlinearly by $x$ itself. For more details see the SI section “Realistic biochemical implementations”. (C) Solid arrows: a common two-component architecture of bacterial sensory systems with a bifunctional histidine kinase (X) and its cognate response regulator (Y). Adding an extra regulatory link (nonlinear auto-amplification, dashed arrow) can endow this system with self-tuned reactivity learning the statistics of the input; see text.

Interestingly, the same logic can be implemented as a small modification of the standard two-component signaling architecture (Fig. 5C). In this architecture, the signal $s$ determines the concentration of the phosphorylated form $Y_P$ of the response regulator $Y$; the rapidity of the response is determined by the expression of the histidine kinase $X$, present at a much lower copy number. Thus, much like in Fig. 5B, a nonlinear activation of $X$ by $Y_P$ (known as autoregulation [21] or autoamplification [16], and shown as a dashed arrow in Fig. 5C) would endow this signaling system with self-tuned reactivity that learns the statistics of the input.

Discussion
In this paper we have studied a regulatory architecture which is able to infer higher-order statistics from fluctuating environments and use this information to inform behavior. For concreteness, we phrased the regulatory task as seeking to match the production $\vec{P}$ of one or two metabolites to a rapidly fluctuating demand $\vec{D}$. Alternatively, and perhaps more generally, the circuits we constructed can be seen as maintaining the homeostasis of a quantity $\vec{x}$ that is continually perturbed by external factors. We demonstrated that a simple architecture was capable of learning the statistics of fluctuations of its inputs and successfully using this information to optimize its performance. We considered one-dimensional and two-dimensional examples of such behavior.

In one dimension, learning the statistics of the input meant our circuit exhibited a self-tuned reactivity, learning to become more responsive during periods of larger fluctuations. Importantly, we have shown that this behavior can be achieved by circuits that are highly similar to known motifs, such as feedback inhibition (Fig. 2A, C) or two-component signaling (Fig. 5B, C). The latter connection is especially interesting: there are at least a few examples of two-component systems where autoamplification, a necessary ingredient for the learning behavior discussed here, has been reported [22, 23]. Moreover, in the case of the PhoR/PhoB two-component system in E. coli, such autoamplification has been experimentally observed to allow cells to retain memory of a previously experienced signal (phosphate limitation) [16], a behavior the authors described as learning-like.

[Footnote: Although the signaling architecture of Fig. 5C, at least in some parameter regimes, is known to be robust to the overall concentrations of X and Y [20], this robustness property applies only to the steady-state mapping from $s$ to $Y_P$, not to the kinetics.]
As reported, this behavior constitutes a response to the signal mean and is similar to other examples of simple phenotypic memory (e.g. [15]); however, our analysis demonstrates that a similar architecture may also be able to learn more complex features. Such a capability would be most useful in contexts where the timescale of sensing could plausibly be the performance bottleneck. Since transcriptional processes are generally slower than two-component kinetics, we expect our discussion to be more relevant for two-component systems with non-transcriptional readout, such as those involved in chemotaxis or efflux pump regulation.

In the two-dimensional case, our simple circuit was able to learn and unlearn transient correlation structures of its inputs. Our argument was a proof of principle that, e.g., gut bacteria could have the means to not only sense, but also predict nutrient availability based on correlations learned from the past, including correlations that change over faster-than-evolutionary timescales, such as the life cycle (or dietary preferences) of the host. Importantly, we showed that this ability could come cheaply, requiring only a few ingredients beyond simple end-product inhibition.

The mechanism described here could suggest new hypotheses for the functional role of systems with an excess of regulators, as well as new hypotheses for bacterial function in environments with changing structure.

Python scripts reproducing all figures are available upon request and will be published alongside the final version of the manuscript.
Acknowledgments
We thank M. Goulian, A. Murugan, and B. Weiner for helpful discussions. SL was supported by the German Science Foundation under project EN 278/10-1; CMH was supported by the National Science Foundation, through the Center for the Physics of Biological Function (PHY-1734030) and the Graduate Research Fellowship Program.

[1] D. Bell-Pedersen, V. M. Cassone, D. J. Earnest, S. S. Golden, P. E. Hardin, T. L. Thomas, and M. J. Zoran, Circadian rhythms from multiple oscillators: lessons from diverse organisms, Nature Reviews Genetics, 544 (2005).
[2] K. Husain, W. Pittayakanchit, G. Pattanayak, M. J. Rust, and A. Murugan, Kalman-like self-tuned sensitivity in biophysical sensing, Cell Systems, 459 (2019).
[3] M. A. Savageau, Escherichia coli habitats, cell types, and molecular mechanisms of gene control, The American Naturalist, 732 (1983).
[4] I. Tagkopoulos, Y.-C. Liu, and S. Tavazoie, Predictive behavior within microbial genetic networks, Science, 1313 (2008).
[5] A. Mitchell, G. H. Romano, B. Groisman, A. Yona, E. Dekel, M. Kupiec, O. Dahan, and Y. Pilpel, Adaptive prediction of environmental changes by microorganisms, Nature, 220 (2009).
[6] A. Mitchell and W. Lim, Cellular perception and misperception: Internal models for decision-making shaped by evolutionary experience, BioEssays, 845 (2016).
[7] E. Kussell and S. Leibler, Phenotypic diversity, population growth, and information in fluctuating environments, Science, 2075 (2005).
[8] J.-W. Veening, W. K. Smits, and O. P. Kuipers, Bistability, epigenetics, and bet-hedging in bacteria, Annu. Rev. Microbiol., 193 (2008).
[9] R. A. Watson and E. Szathmáry, How can evolution learn?, Trends in Ecology & Evolution, 147 (2016).
[10] A. Hjelmfelt, E. D. Weinberger, and J. Ross, Chemical implementation of neural networks and Turing machines, Proceedings of the National Academy of Sciences, 10983 (1991).
[11] T. J. Kobayashi, Implementation of dynamic Bayesian decision making by intracellular kinetics, Physical Review Letters, 228104 (2010).
[12] F. Fages, G. Le Guludec, O. Bournez, and A. Pouly, Strong Turing completeness of continuous chemical reaction networks and compilation of mixed analog-digital programs, in International Conference on Computational Methods in Systems Biology (Springer, 2017) pp. 108–127.
[13] Y. Katz and M. Springer, Probabilistic adaptation in changing microbial environments, PeerJ, e2716 (2016).
[14] Y. Katz, M. Springer, and W. Fontana, Embodying probabilistic inference in biochemical circuits, arXiv:1806.10161.
[15] G. Lambert and E. Kussell, Memory and fitness optimization of bacteria under fluctuating environments, PLoS Genet, e1004556 (2014).
[16] S. M. Hoffer, H. V. Westerhoff, K. J. Hellingwerf, P. W. Postma, and J. Tommassen, Autoamplification of a two-component regulatory system results in “learning” behavior, Journal of Bacteriology, 4914 (2001).
[17] N. Barkai and S. Leibler, Robustness in simple biochemical networks, Nature, 913 (1997).
[18] M. Sorek, N. Q. Balaban, and Y. Loewenstein, Stochasticity, bistability and the wisdom of crowds: a model for associative learning in genetic regulatory networks, PLoS Comput Biol, e1003179 (2013).
[19] Y. Hart, Y. E. Antebi, A. E. Mayo, N. Friedman, and U. Alon, Design principles of cell circuits with paradoxical components, Proceedings of the National Academy of Sciences, 8346 (2012).
[20] E. Batchelor and M. Goulian, Robustness and the cycle of phosphorylation and dephosphorylation in a two-component regulatory system, Proceedings of the National Academy of Sciences, 691 (2003).
[21] M. Goulian, Two-component signaling circuit structure and properties, Current Opinion in Microbiology, 184 (2010).
[22] D. Shin, E.-J. Lee, H. Huang, and E. A. Groisman, A positive feedback loop promotes transcription surge that jump-starts Salmonella virulence circuit, Science, 1607 (2006).
[23] C. L. Williams and P. A. Cotter, Autoregulation is essential for precise temporal and steady-state regulation by the Bordetella BvgAS phosphorelay, Journal of Bacteriology, 1974 (2007).

Supplementary material
S1 Simple end-product inhibition
As the starting point of our regulatory architecture we consider a basic form of end-product inhibition (SEPI). The environment is modeled by a time-dependent (externally induced) demand $D(t)$ for a metabolite $x$, which is produced at a rate $P$ (controlled by the system); both $D$ and $P$ are defined in units of metabolite concentration per unit time. The depletion of the metabolite depends on its concentration $x$ and the demand $D$. Assuming first-order kinetics (or, alternatively, expanding a general function to linear order in small fluctuations of $x$), the dynamics of $x$ read:

$$\dot{x} = P - Dx. \quad (S1)$$

Further, we consider the temporal evolution of the regulator activity $a$,

$$\dot{a} = h(x, a). \quad (S2)$$

By linearizing $h(x, a)$ around the stationary values $(x_0, a_0)$ we get

$$\dot{a} = -\lambda_x (x - x_0) - \lambda_a (a - a_0). \quad (S3)$$

To examine this equation we introduce the Fourier transforms $\tilde{a}(\omega) = \mathcal{F}[a(t) - a_0]$ and $\tilde{x}(\omega) = \mathcal{F}[x(t) - x_0]$ and get

$$i\omega \tilde{a} = -\lambda_x \tilde{x} - \lambda_a \tilde{a} \quad\Rightarrow\quad \tilde{a}(\omega) = -\frac{\lambda_x \tilde{x}(\omega)\,(\lambda_a - i\omega)}{\lambda_a^2 + \omega^2}. \quad (S4)$$

Equation (S4) shows that if the support of $\tilde{x}(\omega)$ is restricted to high frequencies, $\omega \gg \lambda_a$, then the degradation term $-\lambda_a(a - a_0)$ in Eq. (S3) is negligible. Including it would only add a restoring force, reducing the amplitude of fluctuations of $a$ and decreasing the performance of the system. Since we are interested in the regime of fast fluctuations of $x$, we choose to omit this term in the SEPI system. With $\lambda_x = 1/\lambda$ we arrive at the dynamics used in the main text:

$$\begin{aligned}
\dot{x} &= P - Dx && \text{(source-sink dynamics of the metabolite $x$)}\\
P &= a && \text{(definition of the regulator activity $a$)}\\
\dot{a} &= (x_0 - x)/\lambda && \text{(regulator activity inhibited by $x$)}
\end{aligned}$$

In the nonlinear system (Eq. (3) of the main text), however, fast fluctuations of $x$ can cause the growth of $a$ (as discussed in the section “Nonlinearity as a sensor of fluctuation variance”), thereby introducing slow frequency modes into its dynamics. Thus, in the nonlinear case, we cannot omit the degradation term.
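As a sanity check on these dynamics, a few lines of Python (our own illustration; the demand value and feedback timescale are arbitrary) confirm that under a static demand the SEPI feedback drives $x$ back to its target $x_0 = 1$ while the production $P = a$ relaxes to the demand $D$:

```python
def simulate_sepi(D=2.0, lam=5.0, x0=1.0, dt=0.01, T=200.0):
    """Euler-integrate the SEPI dynamics  x' = a - D*x,  a' = (x0 - x)/lam.

    Returns the final metabolite level x and regulator activity a."""
    x, a = x0, 1.0                     # start with production mismatched to demand
    for _ in range(int(T / dt)):
        x += dt * (a - D * x)          # source-sink dynamics of x
        a += dt * (x0 - x) / lam       # end-product inhibition of the regulator
    return x, a

x_final, a_final = simulate_sepi()
# At steady state the feedback restores x to x0 and raises P = a to the demand D.
print(x_final, a_final)
```

Adding the decay term $-\kappa a$ and the rectifier $\max(d,\cdot)$ to the regulator update turns this loop into (a one-regulator version of) the nonlinear model of Eq. (S7) below.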
S2 Performance penalty from the degradation term
The model considered in the main text modifies the SEPI architecture as follows:

$$x_i = P_i / D_i \quad (S5)$$

$$P_i = \sum_\mu \sigma_{\mu i}\, a_\mu \quad (S6)$$

$$\tau_a \dot{a}_\mu = a_\mu \max\!\left(d,\ \sum_i \sigma_{\mu i}(x_0 - x_i)\right) - \kappa a_\mu. \quad (S7)$$

Consider the case of a static input. We observe that if $x_0$ is set to 1, as in the main text, the presence of the degradation term causes the equilibrium point of these dynamics to be displaced away from $x_i = 1$. Therefore, for a static input, the performance of this system (the degree of mismatch between $P_i$ and $D_i$, or, equivalently, the deviation of $x_i$ from 1) is actually worse than the performance of the original SEPI.

While the case of a static input is irrelevant for the discussion in the main text, this slight offset leads to a performance penalty also for a fluctuating input. Indeed, time-averaging Eq. (S7) shows that for any active regulator, we must have

$$\left\langle f\!\left(\sum_i \sigma_{\mu i}(1 - x_i)\right)\right\rangle = \kappa, \quad (S8)$$

where $f$ is the nonlinearity in our equation, $f(\gamma) = \max(d, \gamma)$. Clearly, we will in general again have $\langle x_i \rangle \neq 1$; this is particularly obvious in the limit of small fluctuations when the nonlinearity is “not active”, such that $f(\gamma) = \gamma$.

This effect could be corrected by shifting $x_0$. In the interest of keeping our model as close to SEPI as possible, we chose not to do so: this penalty is only significant in the regime where no learning occurs, and is otherwise outweighed by the performance increase due to self-tuned responsiveness, with the additional benefit of simplifying the discussion in the main text. Even if optimizing $x_0$ could make the performance slightly closer to the optimal bound, this kind of fine-tuning seems irrelevant in a biological context.

S3 Defining the system's responsiveness
In the main text we use a measure of the “responsiveness” of our system to changes in the demand $D(t)$. In this section we define this measure in detail. The central aim of the regulatory model considered here is to match the time-dependent demand $\vec{D}$ with the regulated production $\vec{P}$. The temporal evolution of $\vec{P}$ is given by:

$$\dot{P}_i = \sum_\mu \sigma_{i\mu}\, \dot{a}_\mu, \quad (S9)$$

with

$$\tau_a \dot{a}_\mu = a_\mu \max\!\left(d,\ \sum_i \sigma_{\mu i}(1 - P_i/D_i)\right) - \kappa a_\mu. \quad (S10)$$

For a static demand $D_i = 1$ the production relaxes to a constant value $P_i \approx 1$ (up to the small offset set by the degradation term; see the previous section), and consequently $\dot{P}_i = 0$. A deviation $\delta D_j$ from the static demand will lead to a change of the production $P_i$; the larger $\dot{P}_i$, the faster the response to the changed demand. Therefore, we define the responsiveness of the production $P_i$ to the demand $D_j$ as $R_{ij} = d\dot{P}_i/dD_j$. Assuming small fluctuations, the explicit expression for the responsiveness is then:

$$\frac{d\dot{P}_i}{dD_j} = \sum_\mu \sigma_{i\mu} \frac{d\dot{a}_\mu}{dD_j} \approx \sum_\mu \sigma_{i\mu}\, a_\mu\, \sigma_{\mu j} \frac{P_j}{D_j^2} \approx \sum_\mu \sigma_{i\mu}\, a_\mu\, \sigma_{\mu j}. \quad (S11)$$

As an example, we consider the one-dimensional system studied in Fig. 3A-F of the main text, for which the responsiveness is

$$R = \frac{d\dot{P}}{dD} = \sigma_+ a_+ \sigma_+ + \sigma_- a_- \sigma_- = a_+ + a_-, \quad (S12)$$

where we used that $\sigma_+ = 1$ and $\sigma_- = -1$.

S4 Control theory calculation
The problem we set is minimizing $\langle (\vec{P} - \vec{D})^2 \rangle$, where $\vec{D}$ follows the dynamics of Eq. (1) in the main text,

$$\vec{D}(t + \Delta t) = \vec{D}(t) + M \cdot \left(\bar{\vec{D}} - \vec{D}(t)\right)\Delta t + \sqrt{2\Gamma\Delta t}\,\vec{\eta}. \quad (S13)$$

We then wish to calculate the optimal way of steering $\vec{P}$. For simplicity, we will set the mean $\bar{D} = 0$ (in the context of this abstract problem, this is equivalent to working with mean-shifted variables $\delta\vec{D} \equiv \vec{D} - \bar{\vec{D}}$ and similarly $\delta\vec{P} \equiv \vec{P} - \bar{\vec{D}}$). We can start by discretizing the above equation,

$$D_{t+1} = D_t - \tilde{M} D_t + \xi_t, \quad (S14)$$

where $\tilde{M} \equiv M\Delta t$, and $\xi$ has variance $2\Gamma\Delta t$. We seek to determine the optimal way to steer $P$; in other words, the function $\phi_t(P_t, D_t)$ (“control policy”) dictating how $P$ should be changed in a given time step:

$$P_{t+1} = P_t + u_t \quad (S15)$$

$$u_t = \phi_t(P_t, D_t). \quad (S16)$$

We can then define our cost function, which combines a cost for the magnitude of $u_t$ (how quickly we can change $\vec{P}$) and the difference between $\vec{P}$ and $\vec{D}$:

$$\text{Cost} = E\left(\rho \sum_{\tau=0}^{N-1} \|u_\tau\|^2 + \sum_{\tau=0}^{N} \|P_\tau - D_\tau\|^2\right). \quad (S17)$$

The $\phi(P_t, D_t)$ describing the optimal behavior is the one that minimizes this cost. In order to solve for $\phi(P_t, D_t)$, we follow standard control theory techniques and define the “cost-to-go” function,

$$V_t(p_t, d_t) = \min_{\phi_t \ldots \phi_{N-1}} E\left(\rho \sum_{\tau=t}^{N-1} \|u_\tau\|^2 + \sum_{\tau=t}^{N} \|P_\tau - D_\tau\|^2\right) \quad\text{subject to}\quad
\begin{cases}
P_t = p_t,\ D_t = d_t,\\
D_{\tau+1} = (\mathbb{1} - \tilde{M}) D_\tau + \xi_\tau,\\
P_{\tau+1} = P_\tau + u_\tau,\\
u_\tau = \phi_\tau(P_\tau, D_\tau).
\end{cases} \quad (S18)$$

This function defines the smallest cost of all remaining steps; in particular, the total cost that we are trying to minimize is $V_0(P_0, D_0)$. The cost-to-go satisfies the boundary condition

$$V_N(p, d) = \|p - d\|^2 \quad (S19)$$

and the following recursive relation:

$$V_t(p, d) = \|p - d\|^2 + \min_v \left\{\rho\|v\|^2 + E_\xi\, V_{t+1}\!\left(p + v,\ (\mathbb{1} - \tilde{M})d + \xi\right)\right\}. \quad (S20)$$

Since our system is Gaussian, this recursive relation can be solved by adopting a quadratic ansatz:

$$V_t(p, d) = p^\top A_t\, p - d^\top B_t\, p + d^\top C_t\, d + Q_t. \quad (S21)$$

Solving for the matrices $A_t$, $B_t$, $C_t$, and $Q_t$ gives us the following recursive relations:

$$\begin{aligned}
Q_t &= Q_{t+1} + 2\Gamma\Delta t\,\mathrm{tr}\, C_{t+1}\\
A_t &= \mathbb{1} + \rho A_{t+1}\,(\rho\mathbb{1} + A_{t+1})^{-1}\\
B_t &= 2\cdot\mathbb{1} + \rho\,(\mathbb{1} - \tilde{M})\, B_{t+1}\,(\rho\mathbb{1} + A_{t+1})^{-1}\\
C_t &= \mathbb{1} + (\mathbb{1} - \tilde{M})^\top \left[C_{t+1} - \tfrac{1}{4} B_{t+1}\,(\rho\mathbb{1} + A_{t+1})^{-1} B_{t+1}^\top\right](\mathbb{1} - \tilde{M})
\end{aligned} \quad (S22)$$

Since our problem is to minimize the cost at steady state (known in control theory as an “infinite horizon” problem, $N \to \infty$), we are interested in the fixed point of this mapping, specifically the matrices $A_{-\infty}$, $B_{-\infty}$ to which this mapping converges when iterated backwards from the terminal condition $A_N = \mathbb{1}$, $B_N = 2\cdot\mathbb{1}$, $C_N = \mathbb{1}$, $Q_N = 0$ (as defined by Eq. (S19)).

Since $A_N$ is proportional to the identity matrix, all $A_t$ are proportional to the identity matrix as well: $A_t = \alpha_t \mathbb{1}$, where

$$\alpha_t = 1 + \frac{\rho\,\alpha_{t+1}}{\rho + \alpha_{t+1}}.$$

The fixed point of this mapping is $\alpha = \frac{1}{2}\left(1 + \sqrt{1 + 4\rho}\right) \geq 1$. Similarly, the fixed point of $B_t$ is

$$B = 2\left[\mathbb{1} - \frac{\rho}{\rho + \alpha}(\mathbb{1} - \tilde{M})\right]^{-1}.$$

Expressing this in terms of $\alpha$ only:

$$B = 2\alpha\left[\mathbb{1} + (\alpha - 1)\tilde{M}\right]^{-1}.$$

With these expressions, the optimal “control policy” is defined by the value of $v$ that minimizes Eq. (S20). This defines the optimal way to change $\vec{P}$ for a given observed $\vec{D}$:

$$u = \frac{1}{\alpha}\left(\left[\mathbb{1} + (\alpha - 1)\tilde{M}\right]^{-1}(\mathbb{1} - \tilde{M})\,\vec{D} - \vec{P}\right), \quad (S23)$$

or, restoring the notations of the main text, including a non-zero $\bar{D}$:

$$u = \frac{1}{\alpha}\left(\left[\mathbb{1} + (\alpha - 1)M\Delta t\right]^{-1}(\mathbb{1} - M\Delta t)(\vec{D} - \bar{\vec{D}}) - (\vec{P} - \bar{\vec{D}})\right). \quad (S24)$$

This $u$ is the exact solution to the discrete version of the problem we considered. Since our simulations in this work use a discrete timestep, this is the form we use. Nevertheless, it is instructive to consider the small-$\Delta t$, large-CIP limit, such that $\Delta t$ and $(\alpha - 1)\Delta t$ are both small compared to inverse eigenvalues of $M$. In this case we have, to first order in $\Delta t$:

$$u = \frac{1}{\alpha}\left(\left[\mathbb{1} - \alpha M\Delta t\right](\vec{D} - \bar{\vec{D}}) - (\vec{P} - \bar{\vec{D}})\right).$$

This leads to the following, and very intuitive, form of the optimal control dynamics:

$$\begin{aligned}
\vec{D} &\mapsto \vec{D} - M\Delta t\,(\vec{D} - \bar{\vec{D}}) + \xi,\\
\vec{P} &\mapsto \vec{P} - M\Delta t\,(\vec{D} - \bar{\vec{D}}) + \frac{1}{\alpha}\left((\vec{D} - \bar{\vec{D}}) - (\vec{P} - \bar{\vec{D}})\right).
\end{aligned} \quad (S25)$$

In other words, at every step the change in $\vec{P}$ mirrors the average expected change in $\vec{D}$, with an extra term seeking to reduce their deviation. Note also that setting $\alpha = 1$ (infinite CIP) corresponds to steering $\vec{P}$ directly to the expected value of $\vec{D}$ at the next timestep, as expected.

S5 The system makes use of correlations in the input
Fig. 4B in the main text demonstrated that, as the fluctuating inputs become increasingly correlated, our architecture is able to outperform SEPI by an increasingly wide margin. The natural interpretation of this result is that the system is able to learn and exploit this correlation. Technically, however, one might note that this observation alone does not yet prove that our architecture is able to appropriately use the information it learned specifically about the correlation. For example, it could be that strongly correlated inputs are somehow inducing a stronger increase in reactivity, causing the system to be generally faster, but without benefiting specifically from the correlated nature of its inputs.

Rather than focusing on excluding this specific scenario (which could be done by comparing the CIP values along the curves shown in Fig. 4B), we will show that with a sufficient number of regulators, our architecture can perform better than the theoretical ceiling achievable by any strategy that assumes the inputs to be independent. This will formally prove that, at least for some parameters, our system's improved performance must necessarily make use of the correlation of its inputs. Although the argument is somewhat academic in nature (we will prove our point using $N_a = 25$ regulators), it is theoretically pleasing, and so we present it here.

Specifically, we consider a system subjected to inputs structured as shown in Fig. 3H, with angle $\alpha = \pi/4$; the performance of systems with different numbers of regulators $N_a$ is plotted in Fig. S1 as curves parameterized by the degradation rate $\kappa$. The degradation rate controls how large the $a_\mu$ can become: a high value of $\kappa$ leads to lower average steady-state values of the regulator activities, causing the system to be less responsive to changes in $D$. Thus, $\kappa$ can be used to set the CIP of the regulatory system, allowing us to plot these performance curves in the “performance vs. CIP” axes traditional for control theory.
FIG. S1. The adapting system can perform better than the best independence-assuming strategy. (Axes: performance $\langle(\vec{P} - \vec{D})^2\rangle$ vs. CIP; curves shown: best strategy, best independence-assuming strategy, and adapting systems with $N_a = 25$, $5$, and $4$.)

Consider a matrix $M$ corresponding to uncorrelated inputs: $M = \mathrm{diag}(\lambda_1, \lambda_2)$. Each such $M$ defines a family of control strategies (that would have been optimal if this $M$ were the true $M$ governing the dynamics of the input); this family is indexed by a parameter $\rho$ as described above. A system following an (independence-assuming) strategy $M = \mathrm{diag}(\lambda_1, \lambda_2)$ while faced with the actual (partially correlated) inputs will exhibit a certain performance $\mathcal{P}(\lambda_1, \lambda_2, \rho)$ and a certain CIP, which we denote $\mathrm{CIP}(\lambda_1, \lambda_2, \rho)$. With these notations, the “best independent” curve is defined as

$$\mathcal{P}(\mathrm{CIP} = \chi) = \max_{\lambda_1, \lambda_2} \left\{\mathcal{P}(\lambda_1, \lambda_2, \rho)\ \text{for}\ \rho\ \text{such that}\ \mathrm{CIP}(\lambda_1, \lambda_2, \rho) = \chi\right\}.$$

We note that the correlated-input CIP is different from the independent-input CIP that a given strategy would have incurred if faced by the input for which it is optimal. In particular, while the latter can be computed analytically, the former has to be evaluated in simulations. This makes the optimization procedure computationally costly; thankfully, the symmetry ensured by choosing $\alpha = \pi/4$ allows us to restrict the search to $M = \mathrm{diag}(\lambda, \lambda)$, reducing the problem dimensionality from three parameters $\{\lambda_1, \lambda_2, \rho\}$ to a more manageable two, $\{\lambda, \rho\}$.

The result is shown in Fig. S1 as a dashed line. As highlighted in the inset, with enough regulators, our architecture is indeed able to outperform the theoretical ceiling of the best independence-assuming strategy. Although $N_a = 25$ regulators is of course a regime irrelevant for biological applications, the aim of this argument was to formally prove a theoretical point, namely that the system as constructed must necessarily be making use of the correlation in the input signal, at least for some values of the parameters; by construction, the “best independent” curve is a high bar to clear.

S6 Nonlinearity as a sensor of fluctuation variance
In the main text we argue that the nonlinearity in the dynamics of the regulator concentrations acts as a sensor for the variance of the fluctuations. To see this, we consider the dynamics of one regulator that is controlling the production of one metabolite:

$$\tau_a \dot{a} = a \max(d,\ 1 - P/D) - \kappa a. \quad (S26)$$

To simplify notation we define $\gamma := 1 - P/D$. Since the dynamics of $a$ are slow compared to $D$, the fluctuations of $\gamma$ are on a faster timescale than the regulator dynamics. If the fluctuations of $\gamma$ are small, the nonlinearity in the max function is “not activated”: $\max(d, \gamma) = \gamma$. In this case, the temporal average of $\max(d, \gamma)$ is zero. In contrast, if the fluctuations are strong enough, the nonlinearity is activated (see Fig. S2). The temporal average is then positive, leading to an additional growth of $a$. Due to the resulting larger values of $a$, the response of the system becomes faster, improving the match between $P$ and $D$ and thus serving to decrease the variance of $\gamma$. As a result, the final average steady-state regulator concentration is reached once the system has decreased the variance of $\gamma$ sufficiently by increasing the rapidity of its response.

FIG. S2. The nonlinearity in the regulatory architecture. If the fluctuations of the input $\gamma = \sum_i \sigma_{\mu i}(1 - x_i)$ are large enough, the average over the nonlinearity is positive, causing additional growth of the regulator concentration $a$.

This argument makes it clear why the specific choice of nonlinearity is not particularly important. If $\dot{a} = f(x)$, then the static steady state satisfies $f(x_0) = 0$. For a fast-fluctuating input this becomes

$$\dot{a} = \langle f(x) \rangle = f(x_0) + \frac{1}{2}\left\langle (x - x_0)^2 \right\rangle f''(x_0) + \ldots$$

For any $f$, as long as $f(x_0) = 0$, the displacement of the original steady state will be determined by higher-order statistics of the input.
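The averaging argument can be checked directly. The sketch below (our illustration with an arbitrary threshold value, not taken from the paper's scripts) estimates $\langle \max(d, \gamma) \rangle$ by Monte Carlo for zero-mean Gaussian $\gamma$: for $\sigma \ll |d|$ the average essentially vanishes, while for $\sigma \gg |d|$ it approaches the standard Gaussian half-mean $E[\max(0, \gamma)] = \sigma/\sqrt{2\pi}$.

```python
import math
import random

def avg_nonlinearity(d, sigma, n=200_000, seed=0):
    """Monte Carlo estimate of <max(d, gamma)> for gamma ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    return sum(max(d, rng.gauss(0.0, sigma)) for _ in range(n)) / n

d = -0.08                                  # illustrative rectification threshold
small = avg_nonlinearity(d, sigma=0.01)    # nonlinearity "not activated": average ~ 0
large = avg_nonlinearity(d, sigma=1.0)     # strongly activated: average is positive
print(small, large, 1.0 / math.sqrt(2 * math.pi))  # -> sigma/sqrt(2*pi) for sigma >> |d|
```

The positive average in the large-fluctuation regime is exactly the extra growth term that drives the regulator activity, and hence the responsiveness, upward.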
In particular, the rectified-linear nonlinearity in our equations can be replaced, for example, by a Hill function. The system will retain all the relevant behavior as long as the new nonlinearity is $\cup$-convex in a sufficiently large vicinity of its static fixed point; see section “A pair of allosteric enzymes” for an explicit example.

The assumption $f(x_0) = 0$ is not innocuous; in general, of course, the value of $\langle f(x) \rangle$ is sensitive not only to the variance of $x$ (or other higher-order terms), but also to its mean, and building a system that is sensitive specifically to the variance requires adapting to the mean first. In our model, this is automatically accomplished by the underlying end-product inhibition architecture, which adapts the mean production $P$ to the mean demand $\bar{D}$, after which $x$ fluctuates around 1, no matter the value of $\bar{D}$.

S7 The minimal Γ needed to initiate adaptation

Fig. 4A in the main text includes arrows indicating theoretically derived threshold values of $\Gamma$ above which our system (with a given $\sigma_{\mu i}$) will begin to adapt its timescale of response, deviating from SEPI in its behavior. Here, we show how this threshold value of $\Gamma$ can be determined.

As discussed in the main text, for a static input ($\Gamma = 0$) only $N_x$ out of the $N_a$ regulators can be active. Consider the regulators that remain inactive in the static case, and imagine gradually increasing the fluctuation magnitude. Recall the equation for the regulator activity dynamics:

$$\tau_a \dot{a}_\mu = a_\mu \max\!\left(d,\ \sum_i \sigma_{\mu i}(1 - P_i/D_i)\right) - \kappa a_\mu. \quad (S27)$$

After introducing $\gamma_\mu = \sum_i \sigma_{\mu i}(1 - P_i/D_i)$ we can write

$$\tau_a \dot{a}_\mu = a_\mu \left(\max(d, \gamma_\mu) - \kappa\right) \equiv a_\mu \Delta_\mu. \quad (S28)$$

If we choose $a_\mu$ as one of the regulators that remained inactive in the static case, we have $\Delta_\mu < 0$, and as the fluctuation magnitude is gradually increased, $\Delta_\mu$ will at first remain negative. The threshold $\Gamma$ we seek is the one at which some $\Delta_\mu$ crosses into positive values. It is clear that if the fluctuations of $\gamma_\mu$ are so small that $\max(d, \gamma_\mu) = \gamma_\mu$ at all times, the system does not adapt.
On the other hand, if the fluctuations are large enough and fast compared to the response of the system, they generate an effective additional growth of $a_\mu$. To first order, this additional growth term is proportional to the standard deviation $\sqrt{\omega_\mu}$ of $\gamma_\mu$. Therefore, we expect the fluctuations to cause a growth of $a_\mu$ if the additional growth term is large compared to $\kappa$, i.e. $\sqrt{\omega_\mu} \gtrsim c \cdot \kappa$, with $c$ a constant of order 1.

The approximate value of $c$ can be determined using the following argument. With $d = 0$, and assuming that $\gamma_\mu$ is, first, fluctuating on a fast timescale compared to $\tau_a$ and, second, Gaussian with mean $\bar{\gamma}_\mu$ and variance $\omega_\mu$, we can average over the fluctuations in Eq. (S28):

$$\langle \Delta_\mu \rangle = \sqrt{\frac{\omega_\mu}{2\pi}} \exp\!\left(-\frac{\bar{\gamma}_\mu^2}{2\omega_\mu}\right) + \bar{\gamma}_\mu\, \Phi\!\left(\frac{\bar{\gamma}_\mu}{\sqrt{\omega_\mu}}\right) - \kappa, \quad (S29)$$

where $\Phi$ is the standard normal cumulative distribution function. The system is in a stable steady state if either $\langle \Delta_\mu \rangle = 0$ and $a_\mu \geq 0$, or $\langle \Delta_\mu \rangle < 0$ and $a_\mu = 0$. In the non-trivial first case we get the condition $\bar{\gamma}_\mu \leq \kappa$. Approximating $\bar{\gamma}_\mu \approx 0$, $\langle \Delta_\mu \rangle$ is positive if $\sqrt{\omega_\mu} > \sqrt{2\pi}\,\kappa$, so that $c = \sqrt{2\pi}$. If this condition is satisfied, $a_\mu$ continues its growth until the separation of timescales between $\gamma_\mu$ and $\tau_a$ becomes invalid and $\omega_\mu$ decreases; this is the mechanism by which the system adapts to fast fluctuations.

The variance $\omega_\mu$ can be derived from the statistical properties of $D$. If the fluctuations of the demand $\vec{D}$ are small, it holds that $\omega_\mu \approx \vec{\sigma}_\mu^{\,\top}\, \widehat{\delta D}\, \vec{\sigma}_\mu$, where $\widehat{\delta D}$ is the covariance matrix of the stationary probability distribution of the fluctuations $\delta\vec{D}$, with

$$\langle \delta D_i^2 \rangle = \Gamma\left(\frac{\cos^2\alpha}{\lambda_1} + \frac{\sin^2\alpha}{\lambda_2}\right) \quad\text{and}\quad \langle \delta D_1\, \delta D_2 \rangle = \Gamma \cos\alpha \sin\alpha \left(\frac{\lambda_2 - \lambda_1}{\lambda_1 \lambda_2}\right).$$

The minimal value of $\Gamma$ is then set by the condition $\sqrt{\omega_\mu} = \sqrt{2\pi}\,\kappa$ applied to the largest $\omega_\mu$ of the considered system.

S8 Realistic biochemical implementations
In the main text we proposed a simple model of metabolic regulation which highlighted the properties necessary for learning environment statistics, namely an excess of regulators $a_\mu$, self-activation, and a nonlinear regulation of $a_\mu$ by the metabolite concentrations $x_i$. To show how these properties can enable more realistic systems to learn the statistics of a fluctuating environment, here we present two biochemical implementations. The first of these implements dynamics nearly identical to those described in the main text, and the second, illustrated in Fig. 5B, bears resemblance to two-component systems. As described in the main text, we do not necessarily expect either of these networks to be implemented in real biological systems “as is”. Instead, we use them to illustrate the diversity of systems that could use the logic described in this paper to learn the statistics of their environment. For simplicity, we consider the one-dimensional case (first column of Fig. 3 in the main text).

S8.1 A pair of allosteric enzymes

FIG. S3. Implementation of the regulatory mechanism based on a pair of self-activating enzymes which can be in an active ($E^*$) or inactive ($E$) state. Gray shading indicates catalysts of reactions.

Here, we discuss a realization of our regulatory logic which was not described in the text. This realization instantiates dynamics very similar to those in the main text, with the nonlinearity taking the more realistic form of a Hill function derived from cooperative binding.

This network is shown in Fig. S3. The enzymes $E_+$ and $E_-$ can be in an active or inactive state: the active form of $E_+$, which we denote $E_+^*$, catalyzes the production of $x$; similarly, $E_-^*$ catalyzes the degradation of $x$. In addition, we postulate that active enzymes can bind to molecules of the metabolite $x$, which controls self-catalytic activity (see diagram).
The total concentration of E₊*, bound and unbound, plays the role of the activating regulator a₊ from the main text (a₊ = [E₊*] + [xE₊*]), while E₋* plays the role of the inhibitor a₋ (a₋ = [E₋*] + [xE₋*]).

The same regulatory structure could be realized with transcription factor regulation, with the role of the active enzymes (E₊ and E₋) played by two transcription factors. In this version, activation/deactivation of enzymes is replaced by the simpler process of transcription factor synthesis/degradation. For concreteness, here we focus on the enzymatic case, largely because we expect architectures like ours to be more relevant in fast-responding circuits, which tend to be non-transcriptional. However, except for the difference of timescales, the dynamics of the two versions would otherwise be identical; in particular, both implementations would "learn" in the same way.

For this discussion, it is convenient to have the timescales of the dynamics of a_µ and x_i encoded as explicit parameters. Assuming first-order kinetics, the dynamics of the network can then be described by:

\[ \tau_x \dot{x} = \gamma_+ a_+ - \gamma_- x a_- - x D(t), \qquad \tau_a \dot{a}_+ = a_+ \frac{c_+^n}{c_+^n + x^n} - a_+ \kappa_+, \qquad \tau_a \dot{a}_- = a_- \frac{x^m}{c_-^m + x^m} - a_- \kappa_-. \tag{S30} \]

Here, we assume that the metabolite x is much more abundant than the active enzymes E₊* and E₋*, meaning that the relative amount of bound x is very small. This allows us to neglect, in the dynamics of x, the source and sink terms due to binding and unbinding of x to the enzymes. We also assume that this binding and unbinding occurs on a much faster timescale than all other processes.

Figure S4 shows an example of simulation results for these dynamics (for the full list of parameters used, see section "Parameters used in figures"). We see that the system reacts to an increasing variance of environmental fluctuations (A) by increasing regulator activities (B).
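As a minimal illustration of how Eq. (S30) can be simulated, the sketch below integrates the dynamics with a forward-Euler scheme, driving the system with an Ornstein-Uhlenbeck demand. All parameter values (timescales, Hill constants, noise amplitude) are illustrative placeholders rather than the values used in the figures; the decay rates κ± are chosen here so that x = 1 is a fixed point, in the spirit of the choice described in section "Parameters used in figures".

```python
import numpy as np

# Forward-Euler sketch of the allosteric-enzyme dynamics, Eq. (S30).
# All parameter values are illustrative placeholders, not the ones
# used in the paper's figures.
rng = np.random.default_rng(1)

tau_x, tau_a = 0.1, 10.0      # x relaxes much faster than the regulators
gp = gm = 1.0                 # gamma_+, gamma_-
n = m = 2                     # Hill coefficients
cp, cm = 0.5, 2.0             # Hill constants c_+, c_-
# Decay rates chosen here so that x = 1 is a fixed point of both regulators
kp = cp**n / (cp**n + 1.0)
km = 1.0 / (cm**m + 1.0)

dt, steps = 2e-3, 100_000
x, ap, am, D = 1.0, 1.0, 1.0, 1.0
sigma, tau_D = 0.2, 0.05      # amplitude / correlation time of demand noise

for _ in range(steps):
    # Ornstein-Uhlenbeck demand fluctuating around D_bar = 1, clipped at 0.01
    D += -(D - 1.0) / tau_D * dt + sigma * np.sqrt(2 * dt / tau_D) * rng.normal()
    D = max(D, 0.01)
    # Eq. (S30): fast metabolite, slow self-activating regulators
    dx = (gp * ap - gm * x * am - x * D) / tau_x
    dap = (ap * cp**n / (cp**n + x**n) - ap * kp) / tau_a
    dam = (am * x**m / (cm**m + x**m) - am * km) / tau_a
    x, ap, am = x + dx * dt, ap + dap * dt, am + dam * dt

print(f"x = {x:.2f}, a+ = {ap:.2f}, a- = {am:.2f}")
```

Stepping sigma up during such a run plays the role of panel A of Fig. S4; the slow drift of a₊ and a₋ is the analog of panel B.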
The figure also shows the behavior of a SEPI system which consists of only one a₊ regulator described by the dynamics in Eq. (S30). Fig. S4C shows that the response strength, defined as discussed in the SI section "Defining the system's responsiveness,"

\[ R = \frac{d\dot{P}}{dD} \approx \gamma_+ a_+ \frac{n\, c_+^n}{(c_+^n + 1)^2} + \gamma_- a_- \frac{m\, c_-^m}{(c_-^m + 1)^2}, \tag{S31} \]

increases due to the changed regulator activities. Finally, Fig. S4D compares the performance of the system Eq. (S30) with the corresponding SEPI system (which, again, we define by the same equations as Eq. (S30), except without the a₋ regulator). Similar to Figure 3F in the main text, the performance of the adapting system does not change as the variance of the stimulus increases, while the SEPI system becomes worse.

FIG. S4. Adaptation of responsiveness to an increasing variance of environmental fluctuations. A: Step-wise increase of the variance of D. B: Time series of regulator concentrations, where a₊ and a₋ correspond to the total concentrations of t₊ and t₋, respectively. C: The responsiveness of the system as defined in Eq. (S31). D: The deviation of the metabolite concentration x from its target value. The panels show averages over 40 realizations.

S8.2 An architecture based on a bifunctional enzyme

FIG. S5. Regulation by allosteric forms of one enzyme E. The unbound form E activates the production of x, while the bound form xE promotes its degradation. The synthesis of E is regulated nonlinearly by the metabolite concentration x.

In the main text, we presented a network (Fig. 5B) that, despite a lack of obvious similarity to our proposed architecture, contains the same key ingredients and is also able to learn. We described this network's relationship to two-component systems in the main text; here, we will use it to produce an analog of Fig.
3C-F in the main text. For the reader's convenience, we reproduce this circuit in Fig. S5 (identical to Fig. 5B in the main text). As described in the main text, for greater generality, we will here rephrase the task: instead of matching production to demand, we will think of maintaining the homeostasis of a quantity x perturbed by external factors. For example, instead of being a metabolite concentration, x could be an osmolarity mismatch, and our circuit a hypothetical architecture for active control of osmotic pressure. In this interpretation, the enzyme E might be a mechanosensor triggered by tension in the membrane or cell wall, while "production" and "degradation" of x could be activities of opposing pumps, or regulators of glycerol export or metabolism.

FIG. S6. A: The variance of D is increased step-wise. B: Change of regulator activities. The regulator activities a₊ and a₋ overlap strongly and cannot be distinguished in this panel. C: The response strength of the system. D: The mismatch of the metabolite concentration x from its target value. All results show an average over 6 realizations.

To simplify our language when describing the terms in the specific equations we simulate, we will still talk of a metabolite x being produced and degraded. However, to better accommodate alternative interpretations, the regulator activities will now be defined so that a₊ and a₋ would be equal on average (for example, the activities of pumps in opposite directions should, on average, balance). This homeostasis-maintaining formulation is in contrast to Fig. 3D in the main text, where regulators satisfied the constraint ⟨a₊ − a₋⟩ = D̄ = 1.

The production and degradation of x are catalyzed by a bifunctional enzyme that changes its activity when bound to x, forming the compound xE.
The concentration of the unbound form E corresponds to the activating regulator, a₊ = [E], and increases the production P of x, while xE plays the role of the inhibiting regulator, a₋ = [xE], and promotes the degradation of x.

As above, we assume first-order kinetics for the production and degradation of x, and that the binding kinetics are fast compared to the other timescales in the problem. Defining A = a₊ + a₋ = [E] + [xE] as the total concentration of the enzyme E in both its bound and unbound states, the bound and unbound fractions are described by Hill equations:

\[ a_+ = A\, \frac{c^m}{x^m + c^m}, \qquad a_- = A - a_+. \tag{S32} \]

The dynamics of our system are then:

\[ \tau_x \dot{x} = P + \gamma_+ a_+ - \gamma_- x a_- - x D(t), \qquad \tau_A \dot{A} = -A\kappa + f(x), \tag{S33} \]

where we assumed that modifying the enzyme E does not significantly affect the quantity x. (In the metabolic formulation, this corresponds to the assumption that x is much more abundant than E, so that the sequestration of x by E has a negligible effect on the free concentration of x; in the osmotic formulation, triggering mechanosensors has a negligible effect on the pressure itself.) In the second equation, the synthesis of the enzyme E depends nonlinearly on the metabolite concentration x. The specific form of the nonlinearity does not significantly affect the results, as long as it is sufficiently ∪-convex in the vicinity of the operating point: as described in section "Nonlinearity as a sensor of fluctuation variance," we can think of the nonlinearity f(x) as a "sensor" for the variance of environmental fluctuations. Whenever fluctuations in D(t) increase such that the current responsiveness of the system fails to maintain the homeostasis of x within the previous bounds of fluctuation magnitude, the fluctuations of x will lead to growth of A, increasing the responsiveness until it is again able to reduce the fluctuations of x. In our simulations we
In our simulations wechose a Hill function with cooperativity 4 (see section “Parameters used in figures”).Figure S6 shows simulation results for this system. As in the first column of figure 3, the variance of D is increasedand the response of the system to this change is monitored. We see that the regulator concentrations correspondinglyincrease, causing a larger response strength | d ˙ xdx | ≈ γEc m m (1+ c m ) . The increase in response strength is able to compensatefor most of the performance loss, which shows that the system successfully adapts its timescale of response. Thisis in contrast to the ‘SEPI-like’ system with a fixed value A = 1, which cannot adapt its responsiveness and whoseperformance drops with every increase in fluctuation variance. S9 Adaptation to static demand
In the main text we argue that the production P⃗ of the proposed system Eq. (3) adapts to any static demand D⃗. The full dynamics of the system is

\[ \tau_a \dot{a}_\mu = a_\mu \max\!\left(d,\; \sum_i \sigma_{\mu i}\, (1 - P_i/D_i)\right) - \kappa a_\mu. \tag{S34} \]

With a static demand, Eq. (S34) possesses the same fixed points as the simplified dynamics:

\[ \tau_a \dot{a}_\mu = a_\mu \left(\sum_i \sigma_{\mu i}\, (1 - P_i/D_i) - \kappa\right). \tag{S35} \]

These dynamics have a Lyapunov function

\[ F(\{a_\mu\}) = -\sum_i \frac{(P_i - D_i)^2}{2 D_i} - \kappa \sum_\mu a_\mu. \tag{S36} \]

This can be verified by considering the temporal change of F:

\[ \frac{dF}{dt} = \sum_\mu \frac{\partial F}{\partial a_\mu} \frac{da_\mu}{dt} = \frac{1}{\tau_a} \sum_\mu a_\mu \Delta_\mu^2 \geq 0, \tag{S37} \]

with Δ_µ = Σ_i σ_µi (1 − P_i/D_i) − κ. Thus, F always increases and is obviously bounded from above. For small κ, the maximum of F is reached for P⃗ ≈ D⃗, showing that the system adapts to any static demand.

S10 The role of self-activation
In the main text we argue that our proposed system requires three ingredients to adapt to fluctuating environments: an excess of regulators with cross-talk, self-activation of regulators, and nonlinear activation/repression of the regulators by the metabolite concentrations. Figure 5A shows that excess and nonlinear activation/repression are needed. Here, in Fig. S7, we show that self-activation is also necessary. The plot shows results for the same simulations as in Figure 5A, with the difference that this time we omit the prefactor a_µ in front of the max function in the dynamics of the system. In Fig. 5A (with self-activation), for a sufficient number of regulators the performance improves with increasing d. In contrast, in Fig. S7 (without self-activation) it becomes worse, showing that the system is not adapting to the fluctuations.

Note that for the SEPI-like case with N_a = N_x = 2, the activities of regulators remain approximately constant at a_µ = 1; therefore, in this case, including or omitting the a_µ prefactor results in identical performance. This is why we included a "SEPI" label on Fig. 5A, even though all models included in that parameter sweep had self-activation (and our definition of SEPI does not).

One may note that, unlike the primary architecture we considered in the main text (Fig. 2C), the circuit described in Section S8.2 does not include any explicitly self-catalytic interaction. However, recalling the definitions of {a₊, a₋}, we can write \(a_+ = (a_+ + a_-)\, \frac{c^m}{x^m + c^m}\) and \(a_- = (a_+ + a_-)\, \frac{x^m}{x^m + c^m}\). Thus, a small change in the concentration x leads to a change in a₊ and a₋ which is proportional to the sum of a₊ and a₋. In this way, our bifunctional-enzyme-based circuit also includes an element of self-activation.

FIG. S7. Reproduction of Figure 5A in the main text without self-activation.
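This effective self-activation can be checked numerically: since a₊ = (a₊ + a₋) cᵐ/(xᵐ + cᵐ), the shift of a₊ caused by a small change in x should scale linearly with the total enzyme concentration A = a₊ + a₋. The sketch below verifies this with a finite difference; the values of c, m and the operating point x₀ are illustrative placeholders.

```python
# Numerical check that the bifunctional circuit contains an effective
# self-activation: the response of a_+ to a small change in x is
# proportional to the total enzyme concentration A = a_+ + a_-.
# The values of c, m and the operating point x0 are illustrative.
c, m, x0, dx = 2.0, 4.0, 1.0, 1e-6

def a_plus(A, x):
    # Bound/unbound partitioning, Eq. (S32): a_+ = A c^m / (x^m + c^m)
    return A * c**m / (x**m + c**m)

def response(A):
    # Finite-difference estimate of d a_+ / d x at the operating point
    return (a_plus(A, x0 + dx) - a_plus(A, x0)) / dx

# Doubling the total enzyme concentration doubles the response of a_+,
# exactly as the self-activation argument predicts.
ratio = response(2.0) / response(1.0)
print(ratio)  # → 2.0
```

Since a₊ is exactly linear in A, the ratio is 2 to within floating-point accuracy; the same scaling holds for a₋.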
S11 Parameters used in figures
If not stated otherwise, we use the following parameters: D̄ = 1, Γ = 0.…, d = 0, κ = 0.…, τ_a = 3, α = 45°, λ₁ = 8.…, λ₂ = 875, dt = 1/λ₂. Since the demand D_i is modelled by a stochastic process which is, in principle, unbounded, there is a non-zero probability that the demand D_i becomes negative (this probability is very small for the chosen parameters). To prevent this behavior in the simulations, we set D_i = 0.01 if D_i < 0.01.

Figure 3C-F:
Fluctuations: in 1D the matrix M only has one element, which we set to M = 7.5; Γ = [0.…, 0.…, 0.…, 0.…]; κ = 0.…; τ_a = τ_a^SEPI = 1.
Average over 80 realizations. Figure 3F shows a running average over 5 steps.

Figure 3I-L:
Fluctuations: α = [−…, …, −…, …]; N_a = 5; τ_a = 1/λ; κ = 0.….
SEPI: as the N_a = 5 adapting system (measured in an environment with an isotropic M with the same determinant as used in Fig. 3J-L): τ_a^SEPI = τ_a/…; a_µ = 0.….

Figure 4A:
Fluctuations: Γ from 0 to 0.1 in 40 equidistant steps.
Timescale SEPI: τ_a^SEPI = τ_a = 3.
Simulation length: 5·10^… steps.

Figure 4B:
Fluctuations: Γ = 0.…; λ₁ and λ₂ are chosen as λ₁ = A r_A, λ₂ = A²/λ₁, with r = 1/….75 + 1/….
τ_a^SEPI = τ_a = 3.
Simulation length: 5·10^… steps.

Figure 5A:
Fluctuations: results averaged over activation angles α = [45°, …]; κ = 0.…; τ_a = 1/λ.
Simulation length: 10^… steps.

Figure S1:
System: κ from 0.01 to 0.025 in steps of size 1.…·10^−….
Simulation length: 5·10^… steps.
For each simulation, the performance was averaged over the last 10 time steps.

Figure S1 inset:
All system parameters as in Figure S1 except: κ from 0.013 to 0.014 in steps of size 2.…·10^−….
Simulation length: 10^… steps.
For each simulation, the performance was averaged over the last 2·10^… steps. The results are binned in 20 equal-width bins.

Figure S4:
The parameters in the simulation were chosen so as to ensure that, first, τ_x ≪ τ_a, and second, that the steady-state x stays at 1 (this is analogous to setting x = 1 in the main text). Specifically, we used: τ_x = 0.…, τ_a = 1, γ₊ = γ₋ = 1, n = 2, m = 2, c₊ⁿ = 0.…, c₋ᵐ = 2, κ₊ = 1.…/(c₊ⁿ + 1), κ₋ = 1.… c₋ᵐ/(c₋ᵐ + 1). The parameters describing the fluctuations of D are chosen as: D̄ = 1, M = 1, Γ = [0.…, 0.…, 0.…, 0.…].
A brief explanation: while the parameter list is long, there is a simple reasoning which sets most of these choices, and which explains how the parameters of this model need to be related to each other in order for the adaptation of responsiveness to occur. First of all, we assume that the concentration x changes on a much faster timescale than the regulator concentrations a; here we choose τ_a = 1 and τ_x = 0.