Spatial partitioning improves the reliability of biochemical signaling
SSpatial partitioning improves the reliability of biochemical signaling
Andrew Mugler , Filipe Tostevin , and Pieter Rein ten Wolde FOM Institute AMOLF, Science Park 104, 1098 XG Amsterdam, The Netherlands
Abstract
Spatial heterogeneity is a hallmark of living systems, even at the molecular scale in individual cells. Akey example is the partitioning of membrane-bound proteins via lipid domain formation or cytoskeleton-induced corralling. Yet the impact of this spatial heterogeneity on biochemical signaling processes ispoorly understood. Here we demonstrate that partitioning improves the reliability of biochemical sig-naling. We exactly solve a stochastic model describing a ubiquitous motif in membrane signaling. Thesolution reveals that partitioning improves signaling reliability via two effects: it moderates the non-linearity of the switching response, and it reduces noise in the response by suppressing correlationsbetween molecules. An optimal partition size arises from a trade-off between minimizing the numberof proteins per partition to improve signaling reliability and ensuring sufficient proteins per partition tomaintain signal propagation. The predicted optimal partition size agrees quantitatively with experimen-tally observed systems. These results persist in spatial simulations with explicit diffusion barriers. Ourfindings suggest that molecular partitioning is not merely a consequence of the complexity of cellularsubstructures, but also plays an important functional role in cell signaling.
The cell membrane is a nexus of information processing. Once regarded as a simple barrier between acell and its surroundings, it is now clear that the membrane is a hotspot of molecular activity, wheresignals are integrated and modulated even before being relayed to the inside of the cell [1]. Moreover,the membrane itself is structurally complex. Regions enriched in glycosphingolipids, cholesterol, and othermembrane components, often called lipid rafts, transiently assemble and float within the surrounding bilayer[2], providing platforms for molecular interaction [3]. Additionally, interaction of the membrane with theunderlying actin cytoskeleton forms compartments in which molecules are transiently trapped [4, 5]. Thesemembrane sub-domains create a highly heterogeneous environment in which molecules are far from wellmixed, and it is currently unclear what effect this heterogeneity has on cell signaling.Membrane sub-domains are thought to play a dominant role in the observed aggregation of signalingmolecules into clusters [6]. Interestingly, these clusters have a characteristic size of only a few molecules.For example, the GPI-anchored receptor CD59 is observed to form clusters of three to nine molecules uponinteraction with the cytoskeleton and lipid rafts [7, 8]. Similarly, the well-studied membrane-bound GTPaseRas forms clusters of six to eight molecules which also depend on interactions with the cytoskeleton andrafts [9, 10]. Despite the important findings that aggregation of proteins induced by sub-domains can affectreaction kinetics [11], enhance oligomerization [1], modulate downstream responses [12, 13] and enhance sig-nal fidelity [13,14], the origin of this characteristic size remains unknown. While it is quite possible that thesedomains owe their size to a thermodynamic or structural origin, we here address the question of whetherthis size can be optimized for signaling performance. We find that the partitioning imposed by sub-domainsgives rise to a trade-off in cell signaling, from which an optimal size of a few molecules emerges naturally,suggesting that reliable signaling is intimately tied to the spatial structure of the membrane.We study via stochastic analysis and spatial simulation a model that is directly motivated by both CD59and Ras signaling at the membrane. Stimulated CD59 receptors induce the switching of several Src-family1 a r X i v : . [ q - b i o . M N ] F e b utside the cellcytoplasm A BC well mixed partitionedvs X X ∗ α Y Y ∗ γβ m γ Figure 1: Schematic depiction of the model system. A We consider a model representative of signal detectionby receptors and signal transmission at the cell membrane. B The model consists of two molecular species( X and Y ) which can each exist in active ( X ∗ , Y ∗ ) or inactive ( X , Y ) states. Molecules in the X state areactivated by the external signal of strength α , and active X ∗ molecules subsequently activate Y molecules. C We consider these reactions taking place in a single domain with all components well mixed, or in a domainconsisting of smaller compartments which are each individually well mixed but between which no interactionis possible. The total system volumes in the two scenarios are equal and assumed to scale with the numberof X molecules.kinases from an unphosphorylated to a phosphorylated state [7, 8]. Similarly, stimulated EGF receptorsinduce the switching of Ras proteins from an inactive GDP-loaded state to an active GTP-loaded state [13].We therefore study the simple and ubiquitous motif of coupled switching reactions, in which the activationof one species (the receptor) triggers the activation of a second species (the downstream effector).We exactly solve this stochastic model of coupled switching reactions, and we use the solution to comparesignaling reliability in a spatially-partitioned system to that in a well-mixed system. We demonstrate thatpartitioning can improve signaling performance by generating a more graded input-output relation and byreducing the noise in the signaling response. This latter effect comes about because partitioning reduces thecorrelations between the states of the different output molecules. On the other hand, the stochastic exchangeof proteins between partitions can generate configurations which isolate molecules and exclude them fromthe signaling process, thereby reducing the dynamic range of the response and increasing the output noise.The trade-off between these two effects results in an optimal partition size that agrees well with cluster sizesof signaling proteins that are observed experimentally [7–10], suggesting that cluster sizes are tuned so as tomaximize information transmission. We model two coupled molecular species at the membrane, as depicted in Fig. 1A. A membrane-boundreceptor (e.g. CD59 or EGF receptor) is activated via ligand stimulation, and the active receptor in turnactivates a membrane-bound effector (e.g. a Src-family kinase or Ras). A reaction scheme representing theseprocesses is shown in Fig. 1B, and consists of two protein species: the receptor X and the downstream effector Y . The switching of X molecules from the X to the X ∗ state is driven by an external signal of strength α .Active X ∗ molecules act on inactive Y molecules and promote switching to the Y ∗ state. Deactivation ofboth active protein species occurs spontaneously and independently.We will be concerned with how the network response, the number of active Y ∗ molecules as a function ofthe input signal α , is affected by the spatial structure of the system. In particular we ask how partitioning2f the reaction system into non-interacting sub-domains affects the reliability of signal transmission, whichis determined by two principal factors: the input-output response and the output noise; together theseproperties determine to what extent different input signals can be reliably resolved from the network response.We focus on two system configurations, shown in Fig. 1C. In the first case we assume that all molecules arepresent in a single well-mixed reaction compartment. In the second case, we consider a system partitionedinto π compartments between which no interactions are permitted; here we take the output of the systemto be the total number of active Y ∗ molecules in all compartments. This choice of output corresponds to areadout of the Y ∗ signal by, e.g., a cytosolic component whose diffusion is much faster than the diffusion andsignaling of X and Y on the membrane. In the partitioned system, we will for simplicity first assume thatthe molecules are uniformly and statically distributed among compartments. However, recognizing that thisscenario will not generally be realized inside cells, we will later relax this assumption and consider exchangeof molecules among partitions.We model the dynamics of the well-mixed system, as well as each compartment within the partitioned system,using a stochastic equation of the same form. We denote the total numbers of X and Y molecules by M and N , respectively, and the numbers of active X ∗ and Y ∗ molecules by m and n , respectively. To parameterizethe system, we scale units of time by the deactivation rate of X ∗ , such that the effective deactivation rate is1. Then α denotes the rescaled activation rate of X ; γ is the rate of deactivation of Y ∗ relative to that of X ∗ ;and γβ m is the activation rate of a given Y molecule for a particular concentration of X ∗ molecules. Theparameter α incorporates the effective strength of the input signal and determines the mean X ∗ activity viathe occupancy q ≡ (cid:104) m (cid:105) /M = α/ ( α + 1). The precise m -dependence of the coupling function β m will dependon the exact nature of the interactions between X ∗ and Y molecules. We take β m ∝ m/v , with v the volumeof the compartment in which the reactions are taking place. However, our conclusions are unaffected if weinstead take a Michaelis-Menten form β m ∝ m/ ( m + vK ) (Appendix C: Fig. 7). The total system volume V is assumed to scale with the total number of X molecules, such that M/V is constant. The coupling functionin partition i ∈ { , . . . , π } is then determined by m i , the number of X ∗ molecules in partition i , accordingto β ( i ) m ∝ m i / ( V /π ) = βπm i /M for constant β .The probability of having m proteins in the X ∗ state and n proteins in the Y ∗ state evolves according tothe chemical master equation (CME),˙ p mn = − [ L m ( α, M ) + γ L n ( β m , N )] p mn , (1)subject to suitable boundary conditions. The nature of the particular set of reactions in our model (Fig. 1B)means that the operators L m and L n have the same form, L m ( α, M ) = α (cid:2) − E − m (cid:3) ( M − m ) + (cid:2) − E +1 m (cid:3) m, (2)where E im f ( m ) = f ( m + i ) defines the step operator. Despite the appearance of terms containing theproduct mn in the operator L n ( β m , N ), which make the direct calculation of moments of p mn from theCME impossible, an exact solution to (1) can be found for arbitrary β m using the method of spectralexpansion [15, 16] as described in Appendix A.1. We begin by analyzing the behavior of a minimal system with M = N = 2. In the well-mixed system, allmolecules are contained within π = 1 domain of volume V . In the partitioned system, π = 2 subdomainswith volume V / X and one Y molecule.We first focus on the mean response (cid:104) n (cid:105) . In the limits of small or large α the mean response is the same inboth the partitioned and mixed systems, (cid:104) n (cid:105) /N → (cid:104) n (cid:105) /N → β/ ( β + 1) respectively. However, at allintermediate values of α , the mean response of the well-mixed system is larger than that of the partitioned3 ell mixed perfectly partitioned partitioned with exchange output mean,
332 bits I = 0 .
463 bits I = 0 .
213 bits mean input, q =
N > π , and compare thewell-mixed system to a system with uniform partitioning of the X and Y molecules into the π compartments.In the well-mixed case all Y molecules respond to the same signal m ( t ), and hence are correlated with allother Y molecules in the system. By contrast, in the partitioned case the N/π > Y molecules within eachpartition are correlated, and indeed since the fluctuations in m i ( t ) will be larger than m ( t ) for the mixedsystem, such correlations will be stronger; yet, the Y molecules in different partitions are uncorrelated. Thislatter effect is sufficient to overcome the increase in correlations within each partition, such that the totalnoise is reduced. 5o see the noise reduction explicitly, we again consider the expression for the variance. Since the dynamicsof different partitions is independent, assuming that both M and N are multiples of π , the variance can bewritten as σ n N = (cid:104) n (cid:105) N (cid:18) − (cid:104) n (cid:105) N (cid:19) + π ∆( ˜ M , ˜ N ) N , (5)where ˜ M ≡ M/π and ˜ N ≡ N/π are the numbers of X and Y molecules per compartment, respectively.Here, as before, ∆( ˜ M , ˜ N ) represents the additional fluctuations due to correlations between the states of Y molecules within each compartment. The N -dependence of ∆( ˜ M , ˜ N ), which reflects the number of correlatedpairs of Y molecules, can be straightforwardly factored out as ∆( ˜ M , ˜ N ) = ˜ N ( ˜ N −
1) ˜∆( ˜ M ), where ˜∆( ˜ M )describes how strongly correlated are Y molecules within each compartment. The exact form for ˜∆( ˜ M ),while straightforward to calculate for a given ˜ M , is difficult to generalize for all ˜ M ; nonetheless, inspectionof numerical and analytic results for specific combinations of ˜ M and ˜ N reveals in all cases that increasing π leads to an overall reduction in σ n . Additionally, if the switching of Y molecules is much slower than thatof X molecules, γ (cid:28)
1, then ˜∆( ˜ M ) takes the form˜∆( ˜ M ) ≈ αβ γ ˜ M (1 + α + αβ ) (6)Inserting this expression into (5) with ˜ M = M/π and ˜ N = N/π , one can straightforwardly see that thevariance is a decreasing function of π for π < N , indicating that the noise is reduced as the system is morefinely partitioned. We have seen that partitioning has two beneficial effects on signal propagation: the input-output responsebecomes more graded, and the output noise at a given response level is reduced. Together, these effectsmean that a larger number of distinct input signals can be encoded in the network response. To quantifythe ability of the network to transmit signals we calculate the mutual information I [ α, m ] [18] between theinput and the number of active Y ∗ molecules, as described in Appendix A.2. We find that indeed, in thecase of M = N = 2 (Fig. 2), I [ α, m ] is significantly larger for the partitioned system ( I = 0 .
463 bits) thanfor the well-mixed system ( I = 0 .
332 bits), confirming that signal transmission is dramatically improved bypartitioning.
Thus far we have considered only the perfectly uniform and stationary partitioning of molecules. In reality,physical transport processes such as diffusion will also give rise to a variety of configurations with differentnumbers of proteins in each compartment, as depicted in Fig. 4. Each of these configurations will havedifferent properties for the transmission of the signal from α to n . It is therefore important to considerwhether the benefits of partitioning described above persist once these additional configurations are takeninto account.Single-molecule tracking experiments have revealed that the timescale of diffusive mixing within a com-partment ( ∼ µ s) is two orders of magnitude faster than the timescale of molecular exchange betweencompartments ( ∼
10 ms) [19]. This observation allows us to treat each configuration as static on the timescaleof mixing, then compute the total response by averaging over all configurations. Inherent in this treatmentis the assumption that the timescale of signaling is also faster than that of exchange between compartments.We later relax this assumption using spatially resolved simulations and nonetheless find similar results.6 information (bits) ! ! ! ! ! Figure 4: Exchange between partitions leads to different configurations of the system with a range of signalingperformance. Multiplicities listed above each configuration are due to symmetry. Parameters are as in Fig.2.The total response is computed by first enumerating the possible configurations of M X molecules and N Y molecules distributed amongst π partitions. For each such configuration c we then solve for theoutput distribution p n | c and combine these distributions, weighted by the probability p c of each configurationoccurring if molecules are randomly assigned to different partitions with uniform and independent probability,to give the overall response distribution p n = (cid:80) c p n | c p c .Figure 2B (dot-dashed curve) shows that the exchange of molecules between compartments increases thenoise relative to the perfectly partitioned system considered previously when M = N = 2. This is becausemany of the alternative configurations generated by exchange lead to significant correlations between thestates of the different Y molecules. Nevertheless, we see that the noise remains lower than that of the well-mixed system, because of the existence of some configurations in which the Y molecules are independent.However, the appearance of alternate configurations also affects the mean response (Fig. 2A); in particular,the appearance of configurations in which X and Y molecules do not occupy the same partitions, and henceno signal can be propagated, means that the maximal output level is reduced. Given this simultaneous changein both the input-output function and the noise, it is not immediately clear whether signaling reliability isimproved relative to the well-mixed system. Computing the mutual information, we see that the informationtransmitted by the system with exchange ( I = 0 .
213 bits) is significantly lower than that for the well-mixedsystem ( I = 0 .
332 bits), showing that the reduction of the output range compromises signal transmission toan extent which cannot be overcome by the corresponding reduction in noise.The decrease in information transmission upon incorporating molecule exchange in the system with M = N = 2 is the result of the appearance of suboptimal protein configurations, for which signal propagationis compromised (or even impossible). However, the number and performance of such configurations will ingeneral depend on the relative values of M , N and π (which need not equal M or N ). While moleculeexchange may make partitioning unfavorable in the extreme case of M = N = 2, for systems with higherprotein numbers it can be beneficial to partition the system into π > To study the performance of systems with higher protein numbers and different partition sizes, we comparethe information transmission, including molecule exchange, for different partition numbers π as the numberof proteins in the system is varied while holding M = N . Figure 5A shows that for M = N > π > M = N isincreased the optimal partition number also increases such that the optimal number of proteins per partition M/π ∗ = N/π ∗ ≈ β and γ : changing7 particle number, M = N i n f o r m a t i on ( b i t s ) A (cid:47) = 1 (cid:47) = 2 (cid:47) = 3 (cid:47) = 4 0 1 2 3 4 5 6 7 8 9 10012345 particle number, M = N op t i m a l pa r t i t i on nu m be r , (cid:47) * M/ (cid:47) * = N/ (cid:47) * (cid:53) B Figure 5: An optimal partition size. A For M = N > π > π = 1). B As M = N is increased the optimalpartition number also increases such that the optimal number of proteins per partition M/π ∗ = N/π ∗ ≈ M/π ∗ = N/π ∗ ∼ − M = N is also not crucial for this result. In fact, we findthat the value of M/π ∗ has only a weak dependence on N (Appendix C: Fig. 10).The optimal partition size arises from a trade-off between the reliability and efficiency of signaling. Increasingthe number of partitions decreases the typical number of proteins per partition, which leads to the beneficialeffects of a more graded response and reduced noise, increasing signaling reliability. On the other hand, dueto molecule exchange, reducing the number of molecules per partition also increases the probability that anypartition contains proteins of only one species that are therefore excluded from the signaling process, whichleads to a reduced maximal response, reducing signaling efficiency.The optimal size revealed by our study of ∼ −
10 molecules per species per partition shows good quantitativeagreement with the observed aggregation of CD59 receptors (3 − − Lastly, we confirm that the effects observed in these minimal model systems, where the contents of eachcompartment are well-mixed and exchange can occur between any pair of compartments, persist in a morerealistic model in which the diffusion of molecules in space is included explicitly. We simulate the diffusionand reaction of X and Y molecules on a two-dimensional lattice, as described in Appendix A.3. The systemis partitioned into a number of subdomains by the introduction of diffusion barriers, which are crossed witha reduced probability p hop relative to regular diffusion steps on the lattice. Results of such simulations areshown in Fig. 6.Figure 6A and B reveal that as the strength of the diffusion barriers is increased, the mean response becomesmore graded, and the variance of Y ∗ activity is reduced, analogous to the two effects observed in theminimal model system (Fig. 2). When p hop = 0, one molecule of each species is permanently confined to a8ompartment, producing the graded response predicted for the perfectly partitioned system (Fig. 6A) andthe associated minimal, binomial noise (Fig. 6B). Low but finite p hop allows exchange of molecules betweenneighboring compartments but preserves a separation of timescales between intra- and inter-compartmentmixing. This results in a graded mean response whose maximal level is reduced (Fig. 6A) and reducednoise (Fig. 6B), precisely the features observed in the minimal model of partitioning with exchange (Fig. 2).When p hop = 1, there are no barriers, and the system approaches the well-mixed limit (CME). Interestingly,however, the response remains more graded and the noise remains lower than the predictions of the CMEdue to the finite speed of diffusion (Fig. 6A and B), with agreement only reached when the ratio of diffusionto reaction propensities is much greater than one. This observation reveals that finite diffusion imposesan effective partitioning even when no actual partitions exist: molecules remain correlated with reactionpartners within a typical distance set by diffusion, but uncorrelated with partners beyond this distance.As such, in the context of coupled reversible modification, we find that slower diffusion can linearize theresponse and reduce the noise, thereby improving information transmission. It is important to emphasize,however, that the extent of this effect is much smaller than for actual partitioning: Fig. 6B shows that finitediffusion reduces the maximal noise by (1 . − / .
25 = 20%, while strong partitioning ( p hop = 0 . . − . / . ≈ β and γ , spanning the rangeof ∼ −
10 molecules per partition (Appendix C: Fig. 9C and D). Fig. 6C also provides a measure of thescale of information transmitted by this motif. In absolute terms, the optimal information (1 .
35 bits) isconsistent with values recently measured for signaling via the TNF-NF- κ B pathway ( ∼ . − . Drosophila embryo (1 . ± .
15 bits) [23]. In relative terms, we see that partitioningincreases information over the unpartitioned system by (1 . − . / . ≈
30% (Fig. 6C) and decreasesthe maximal noise by (1 − . / We have seen that the partitioning of a biochemical signaling system into a number of non-interacting subsys-tems improves the reliability of signaling via two effects. First, the non-linear response of the network meansthat a reduction in the number of input molecules translates into a more graded input-output response.Second, partitioning significantly reduces the noise in the response by eliminating correlations between thestates of the different output molecules, an effect which, remarkably, overcomes the increase in noise associ-ated with fewer input molecules in each subsystem. On the other hand, we have seen that the introductionof diffusion or exchange of molecules between partitions enhances the variance and reduces the range of theresponse, thereby reducing signaling performance. This result is due to the presence of configurations in Interestingly, this result is in marked contrast to the case of boundary establishment in embryonic development, wherefaster diffusion reduces noise within each nucleus by washing out bursts of gene expression in the input signal [21]. Whilein the present system faster diffusion will similarly reduce any super-Poissonian component of the noise within each partitionindividually, this averaging does not reduce the noise in the total output across all partitions. In fact, the latter noise isenhanced with faster diffusion by virtue of increased correlations between partitions. mean input, q =
SIAppendix is freely available at http://partitioning.sourceforge.net . This work is part of the research program of the “Stichting voor Fundamenteel Onderzoek der Materie(FOM)”, which is financially supported by the “Nederlandse organisatie voor Wetenschappelijk Onderzoek(NWO)”. We thank Philippe Nghe for a critical reading of the manuscript.
A Detailed methods
A.1 Spectral solution of the master equation
The chemical master equation (CME) is solved using the method of spectral expansion [15, 16], describedin detail in Appendix B. Briefly, the structure of of the CME, in which the dynamics can be separatedinto two operators that act only on m or n but not both, allows for its solution to be written in the form p mn ( t ) = (cid:80) Mj =0 (cid:80) Nk =0 G jk ( t ; ¯ β ) φ jm ( α ) φ kn ( ¯ β ), where φ jm ( α ) is the j th eigenvector of the operator L m ( α ) andsimilarly for φ kn ( ¯ β ), and ¯ β is an expansion parameter on which p mn does not ultimately depend. Theexpansion coefficients G jk ( t ; ¯ β ) can be calculated straightforwardly, as shown in Appendix B. Importantly,this spectral expansion dramatically decreases the computational complexity of calculating p mn : rather thansolving the ( M + 1)( N + 1) × ( M + 1)( N + 1) system of the original CME, it is only necessary to solve N linear systems of size ( M + 1) × ( M + 1) for the vectors of coefficients (cid:126)G k . We emphasize that sincethe system has a finite state-space, no approximations are made in using the spectral expansion, and thesolution remains exact. Furthermore, the moments of the steady-state distribution p mn can be convenientlyexpressed in terms of the expansion coefficients G jk ; in particular, (cid:104) n (cid:105) = G and (cid:104) n (cid:105) = 2 G + G . A.2 Mutual information
The mutual information between network input and response is given by the standard expression [18] I [ α, m ] = (cid:104) log { p ( α, n ) / [ p ( α ) p ( n )] }(cid:105) , where the average is taken over the joint distribution p ( α, n ) = p ( n | α ) p ( α ),and p ( n | α ) = (cid:80) Mm =0 p ( m, n | α ) is given by the steady state of the CME. The calculation of the mutual in-formation requires specification of the distribution of input signals p ( α ). We choose N α values of α suchthat q = α/ ( α + 1) = (cid:104) m (cid:105) /M is uniformly-spaced over the range 0 ≤ q ≤
1; then p ( n ) = (cid:80) N α i =1 p ( n | α i ) p ( α i )11nd p ( α i ) = N − α . However, our conclusions are unaffected if we instead take a input distribution that isunimodal or bimodal (Fig. 13). We take N α >
30, for which I [ α, m ] converges to within 1% of its large- N α limit (Fig. 14). A.3 Spatial simulations
The diffusion and reactions of M X molecules and N Y molecules are simulated on a two-dimensional squarelattice of side length λ using a fixed-time-step integration scheme. During each step of duration δt , eachparticle is moved to a random neighboring lattice site with probability p D = ( D/(cid:96) ) δt , where D is thediffusion constant, and (cid:96) is the lattice spacing. Molecules have steric interactions on the lattice, such thatonly one molecule can be present at each lattice site at any time. Attempted moves to an occupied site arerejected, with the particle remaining at its original position. If a molecule in the X ∗ state is adjacent to amolecule in the Y state, the latter is converted to the Y ∗ state with probability p r = γ ( βλ /M ) δt . To make π partitions, linear diffusion barriers are placed at iλ/ √ π in each direction, where i ∈ { , , . . . , √ π − } . Adiffusion step which crosses such a barrier is accepted with probability reduced by a factor p hop . The timestep δt is chosen sufficiently small that no probability exceeds one. B Solution of the master equation by spectral expansion
This section describes the solution via the method of spectral expansion, or the ‘spectral method’, of the CMEintroduced in the main text. The spectral method has been used fruitfully in the context of gene regulationto solve CMEs describing cascades [15], bursts [16], and oscillations [31], and a pedagogical treatment isavailable in [32]. Here we apply the spectral method to coupled reversible switching.From Eqns. 1-2 of the main text, the stochastic dynamics of the system under study are given by the CME˙ p mn = − [ L m ( α, M ) + γ L n ( β m , N )] p mn , (7)where both operators L m and L n have the form L m ( α, M ) = α (cid:2) − E − m (cid:3) ( M − m ) + (cid:2) − E +1 m (cid:3) m, (8)with E im f ( m ) = f ( m + i ) defining the step operator. The CME describes the evolution of the probability ofhaving m X proteins in the active state and n Y proteins in the active state, with β m the coupling functionby which X drives the activation of Y . B.1 The moments do not close
We first demonstrate that direct computation of the moments from the CME is not possible because themoments do not close. The reason is that a nonlinearity is present in the first term of Eqn. 8 in the form ofthe factor β m n . As a result, the first moment depends on a higher moment, which in turn depends on aneven higher moment, and so on.To see explicitly that the moments do not close, we consider computing the dynamics of the first moment ofthe driven species, the mean (cid:104) n (cid:105) , by summing the CME over m and n against n . We obtain1 γ ∂ t (cid:104) n (cid:105) = −(cid:104) n (cid:105) + N (cid:104) β m (cid:105) − (cid:104) β m n (cid:105) , (9)12here averages are taken over p mn . We see that indeed the final term carries the nonlinearity. Even for thesimplest coupling function, i.e. linear coupling β m = cm , one finds a hierarchy of moment dependencies thatdoes not close: ∂ t (cid:104) n (cid:105) = − γ (cid:104) n (cid:105) + γcN (cid:104) m (cid:105) − γc (cid:104) mn (cid:105) , (10) ∂ t (cid:104) mn (cid:105) = αM (cid:104) n (cid:105) − ( α + γ + 1) (cid:104) mn (cid:105) + γcN (cid:104) m (cid:105) − γc (cid:104) m n (cid:105) , (11) ∂ t (cid:104) m n (cid:105) = . . . (12)That is, the dynamics of (cid:104) n (cid:105) depend on (cid:104) mn (cid:105) , whose dynamics depend on (cid:104) m n (cid:105) , and so on.The fact that the moments cannot be computed—indeed, not even the mean output (cid:104) n (cid:105) —makes it particu-larly important to actually solve the CME in order to learn about the statistical properties of this system. B.2 The spectrum of the switch operator
The CME is a linear equation. Even when the rates are nonlinear functions of the molecule numbers, theCME is still linear in its degree of freedom, the joint probability. The most straightforward way to solve alinear equation is to write its solution as an expansion in the eigenfunctions of the linear operator. Although itis difficult to derive the eigenfunctions of the coupled operator L m ( α, M ) + γ L n ( β m , N ), it is straightforwardto derive the eigenfunctions of the uncoupled operator L m ( α, M ), which we call the switch operator. Indeed,we will see that expanding the joint probability in eigenfunctions of the uncoupled operator greatly simplifiesthe form of the CME, yielding an exact solution in terms of matrix algebra.The switch operator governs the CME for the first species X ; explicitly,˙ p m = −L p m = α [ M − ( m − p m − + ( m + 1) p m +1 − [ α ( M − m ) + m ] p m , (13)where for notational simplicity we have taken L m ( α, M ) → L . Its eigenvalue relation is written L φ jm = λ j φ jm , (14)for eigenvalues λ j and eigenvectors φ jm . B.2.1 Eigenvalues
The matrix form of the operator L can be read directly from Eqn. 13: L = M α − − M α ( M − α + 1 − − ( M − α ( M − α + 2 −
3. . . . . . . . . − α α + ( M − − ( M − − α α + ( M − − M − α M . (15)The tridiagonal structure follows from the fact that molecule numbers only increase or decrease by oneat a time. Practically speaking, the eigenvalues can be obtained using the fact that the determinant of atridiagonal matrix can be computed recursively. Performing the computation for M = 0 , , , . . . reveals thepattern λ j = ( α + 1) j, j ∈ { , , , . . . , M } . (16)13owever, Eqn. 16 can be derived more rigorously by making use of a generating function. We presentthis derivation next, since the generating function formalism will also prove quite useful in deriving theeigenvectors and solving the CME.The generating function is an expansion in any complete basis for which the probability distribution providesthe expansion coefficients [33]. Choosing as our basis the set of polynomials in some continuous variable x ,the generating function is defined G ( x ) = M (cid:88) m =0 p m x m . (17)The probability distribution is recovered via the inverse transform p m = 1 m ! ∂ mx [ G ( x )] x =0 . (18)A key utility of the generating function is turning the CME, which is a set of ordinary differential equations(ODEs), into a single partial differential equation. Indeed, summing Eqn. 13 against x m yields˙ G = − ( x − αx + 1) ∂ x − αM ] G, (19)where the appearances of x and ∂ x arise from the shifts m − m + 1, respectively. Eqn. 19 directly givesthe form of the operator in x space: L = ( x − αx + 1) ∂ x − αM ]. The eigenfunctions are then obtainedfrom the relation L φ j ( x ) = λ j φ j ( x ) by separating variables and integrating: φ j ( x ) = ( α + 1) − M ( x − λ j / ( α +1) ( αx + 1) M − λ j / ( α +1) . (20)Here, the constant factor ( α + 1) − M is determined by application of the normalization condition G (1) = 1to the steady state solution, which is obtained by setting λ j = 0: G ( x ) = (cid:18) αx + 1 α + 1 (cid:19) M . (21)We will solve Eqn. 19 in two ways: by the method of characteristics and by expansion in the eigenfunctions;together these solutions will reveal the eigenvalues.First, the method of characteristics [34] posits that the dependence of G on x and t occurs via some parametricvariable s , i.e. G ( x, t ) = G [ x ( s ) , t ( s )]. The chain rule then gives dG/ds = ( ∂G/∂x )( dx/ds ) + ( ∂G/∂t )( dt/ds ),which when compared term by term with Eqn. 19 yields three ordinary differential equations: dtds = 1 , dxds = ( x − αx + 1) , dGds = αM ( x − . (22)The first identifies s = t , with which the second is solved by z = x − αx + 1 e − ( α +1) t , (23)where z is a constant of integration. The crux of the method is that Eqn. 23 defines a characteristic curveon which G must depend, i.e. G ( x, t ) = f [ z ( x, t )] g ( x, t ), where f and g are unknown functions, and z hasbeen promoted to a characteristic function of x and t . The function g is identified by realizing that steadystate is reached as t → ∞ , for which f ( z ) → f (0) no longer depends on x or t . Therefore, g must be thesteady state function given in Eqn. 21: G ( x, t ) = (cid:18) αx + 1 α + 1 (cid:19) M f ( z ) . (24)Although we still do not know f , we may Taylor expand it around the point z = 0, yielding G ( x, t ) = (cid:18) αx + 1 α + 1 (cid:19) M ∞ (cid:88) j =0 c j z j = (cid:18) αx + 1 α + 1 (cid:19) M ∞ (cid:88) j =0 c j (cid:18) x − αx + 1 (cid:19) j e − ( α +1) jt , (25)14here c j ≡ ∂ jz [ f ( z )] z =0 /j !.Second, because Eqn. 19 is linear, we may also write down its solution as an expansion in the eigenfunctionsof its linear operator: G ( x, t ) = (cid:88) j C j ( t ) φ j ( x ) . (26)Under the assumption that the eigenfunctions are orthogonal (which will be shown in the next section),inserting Eqn. 26 into Eqn. 19 yields an independent ODE for each C j , ˙ C j = − λ j C j , which is solved by C j ( t ) = c j e − λ j t for initial conditions c j . Inserting this functional form and that for φ j ( t ) (Eqn. 20) into Eqn.26 yields G ( x, t ) = (cid:18) αx + 1 α + 1 (cid:19) M (cid:88) j c j (cid:18) x − αx + 1 (cid:19) λ j / ( α +1) e − λ j t . (27)Comparison of Eqns. 25 and 27 reveals both the expression for the eigenvalues, λ j = ( α + 1) j , and a limiton their domain, the nonnegative integers j ∈ { , , , . . . , ∞} . Of course, the domain can be a subset ofthe nonnegative integers; then some c j in Eqn. 27 would be zero. Indeed, since L is a finite matrix of size M + 1 by M + 1 (Eqn. 15), it is spanned by M + 1 linearly independent eigenvectors, meaning we expect only M + 1 eigenvalues. In fact, the only set of M + 1 nonnegative integers that satisfies the requirement thatthe trace of L , (cid:80) Mm =0 [( M − m ) α + m ] = ( α + 1) M ( M + 1) /
2, equals the sum of the eigenvalues, (cid:80) j ( α + 1) j ,is j ∈ { , , , . . . , M } . Thus, we arrive at the result λ j = ( α + 1) j, j ∈ { , , , . . . , M } , (28)as proposed by inspection in Eqn. 16. B.2.2 State space notation
The linear algebraic manipulations we have done thus far can be cast in the more abstract notation of statespaces, commonly used in quantum mechanics [35]. We will find this notation useful in later sections, forexample in transforming between the molecule number basis and the eigenbasis. Specifically, we introducea state | p (cid:105) that can be projected into (cid:104) m | space to give the probability distribution, or into (cid:104) x | space to givethe generating function: (cid:104) m | p (cid:105) = p m , (cid:104) x | p (cid:105) = G ( x ) . (29)In the same way, the j th eigenstate | j (cid:105) is projected into (cid:104) m | space to give the j th eigenvector, or into (cid:104) x | space to give the j th eigenfunction: (cid:104) m | j (cid:105) = φ jm , (cid:104) x | j (cid:105) = φ j ( x ) . (30)This notation offers new insight into our definition of the generating function. For example, Eqn. 17 cannow be written (cid:104) x | p (cid:105) = M (cid:88) m =0 (cid:104) x | m (cid:105)(cid:104) m | p (cid:105) , (31)where we have recognized (cid:104) x | m (cid:105) = x m (32)as the projection of the state | m (cid:105) into (cid:104) x | space. Eqn. 31 has a clear interpretation: we have inserted acomplete set of | m (cid:105) states. Similarly, Eqn. 18 can now be written (cid:104) m | p (cid:105) = (cid:73) ¯ dx G ( x ) x m +1 = (cid:73) ¯ dx (cid:104) m | x (cid:105)(cid:104) x | p (cid:105) . (33)15n the first step, we have rewritten Eqn. 18 using Cauchy’s theorem, where ¯ dx ≡ dx/ πi , and the contoursurrounds the pole at x = 0. In the second step, we have recognized (cid:104) m | x (cid:105) = 1 x m +1 (34)as the conjugate to (cid:104) x | m (cid:105) . Eqn. 33 has the clear interpretation of inserting a complete set of | x (cid:105) states, underan inner product defined by the complex integration. The choice of inner product and of conjugate stateare made such that orthonormality is preserved, a fact which we may confirm by again employing Cauchy’stheorem: (cid:104) m | m (cid:48) (cid:105) = (cid:73) ¯ dx (cid:104) m | x (cid:105)(cid:104) x | m (cid:48) (cid:105) = (cid:73) ¯ dx x m (cid:48) x m +1 = 1 m ! ∂ mx (cid:104) x m (cid:48) (cid:105) x =0 θ ( m >
0) = δ mm (cid:48) . (35)Finally, the dynamics in Eqn. 19 can be written in state space as | ˙ p (cid:105) = − ˆ L| p (cid:105) = − (ˆ a + − α ˆ a + + 1)ˆ a − − αM ] | p (cid:105) , (36)where we have defined the operators ˆ a + and ˆ a − whose projections in x space are (cid:104) x | ˆ a + = x and (cid:104) x | ˆ a − = ∂ x .These are analogous to the raising and lowering operators in the well known treatment of the quantumharmonic oscillator. This operator formalism for the generating function was first developed in the 1970s;for a review see [36]. B.2.3 Eigenvectors
The state space notation facilitates a derivation of the functional form of the eigenvectors: φ jm = (cid:104) m | j (cid:105) = (cid:73) ¯ dx (cid:104) m | x (cid:105)(cid:104) x | j (cid:105) = (cid:73) ¯ dx x m +1 ( x − j ( αx + 1) M − j ( α + 1) M . (37)Here we have inserted the eigenfunctions φ j ( x ) = (cid:104) x | j (cid:105) = ( x − j ( αx + 1) M − j ( α + 1) M (38)from Eqn. 20, with eigenvalues given by Eqn. 28. We use Cauchy’s theorem to perform the integration andrecognize that derivatives of a product follow a binomial expansion: φ jm = 1( α + 1) M m ! ∂ mx (cid:2) ( αx + 1) M − j ( x − j (cid:3) x =0 (39)= 1( α + 1) M m ! m (cid:88) (cid:96) =0 (cid:18) m(cid:96) (cid:19) ∂ (cid:96)x (cid:2) ( αx + 1) M − j (cid:3) x =0 ∂ m − (cid:96)x (cid:2) ( x − j (cid:3) x =0 (40)= 1( α + 1) M m ! m (cid:88) (cid:96) =0 m !( m − (cid:96) )! (cid:96) ! (cid:20) ( M − j )! α (cid:96) ( M − j − (cid:96) )! θ ( (cid:96) ≤ M − j ) (cid:21) (cid:20) j !( − j − m + (cid:96) ( j − m + (cid:96) )! θ ( m − (cid:96) ≤ j ) (cid:21) (41)= ( − j − m ( α + 1) M (cid:88) (cid:96) ∈ Ω (cid:18) M − j(cid:96) (cid:19)(cid:18) jm − (cid:96) (cid:19) ( − α ) (cid:96) . (42)Here the domain Ω results from the derivatives and is defined by max(0 , m − j ) ≤ (cid:96) ≤ min( m, M − j ). Eqn.42 gives the expression for the eigenvectors. For j = 0 the expression reduces to the binomial distributionin terms of the occupancy q = α/ ( α + 1), as it must, since this is the steady state of the uncoupled process: φ m = (cid:18) Mm (cid:19) α m ( α + 1) M = (cid:18) Mm (cid:19) q m (1 − q ) M − m . (43)This function has one maximum, and in general the j th eigenvector has j +1 extrema, making the eigenvectorsqualitatively similar to Fourier modes or eigenfunctions of the quantum harmonic oscillator.16he switch operator ˆ L is not Hermitian. A consequence is that its conjugate eigenvectors ψ jm = (cid:104) j | m (cid:105) (row vectors) are not complex conjugates of its eigenvectors φ jm = (cid:104) m | j (cid:105) (column vectors). Rather, theyare distinct functions that must be constructed to obey an orthonormality relation in order to constitute acomplete basis. The orthonormality relation can be used to derive their form in x space, ψ j ( x ) = (cid:104) j | x (cid:105) : δ jj (cid:48) = (cid:104) j | j (cid:48) (cid:105) = (cid:73) ¯ dx (cid:104) j | x (cid:105)(cid:104) x | j (cid:48) (cid:105) = (cid:73) ¯ dx ψ j ( x ) ( x − j (cid:48) ( αx + 1) M − j (cid:48) ( α + 1) M = (cid:73) ¯ dz z j (cid:48) f j ( z ) . (44)Here we have defined z ≡ ( x − / ( αx + 1) and f j ( z ) ≡ ψ j ( x )( αx + 1) M +2 / ( α + 1) M +1 in order to draw anequivalence between Eqn. 44 and Eqn. 35, which then implies f j ( z ) = 1 /z j +10 = ( αx + 1) j +1 / ( x − j +1 , or ψ j ( x ) = ( α + 1) M +1 ( αx + 1) M − j +1 ( x − j +1 . (45)Eqn. 45 gives the form of the conjugate eigenfunctions in x space, which can be used to derive the expressionfor the conjugate eigenvectors as in Eqns. 37-42: ψ jm = (cid:104) j | m (cid:105) = (cid:73) ¯ dx (cid:104) j | x (cid:105)(cid:104) x | m (cid:105) = (cid:73) ¯ dx ( α + 1) M +1 ( αx + 1) M − j +1 ( x − j +1 x m (46)= (cid:88) (cid:96) ∈ Ω (cid:18) M − j + (cid:96)(cid:96) (cid:19)(cid:18) mj − (cid:96) (cid:19) ( − α ) (cid:96) ( α + 1) j − (cid:96) . (47)Here Ω is defined by max(0 , j − m ) ≤ (cid:96) ≤ j . Eqn. 47 gives the expression for the conjugate eigenvectors.They are j th order polynomials in m . B.3 Expanding the coupled problem in uncoupled eigenfunctions
We now solve the CME by expanding the solution in the eigenfunctions of the uncoupled operator. Thisprocedure is most easily done in state space, in which the CME reads | ˙ p (cid:105) = − [ ˆ L x ( α ) + γ ˆ L xy ] | p (cid:105) (48)where ˆ L x ( α ) = (ˆ a + x − α ˆ a + x + 1)ˆ a − x − αM ] , (49)ˆ L xy = (ˆ a + y − β x ˆ a + y + 1)ˆ a − y − ˆ β x N ] , (50)as in Eqn. 36, and we have introduced the operator ˆ β x whose action on the state | m (cid:105) yields the couplingfunction, ˆ β x | m (cid:105) = β m | m (cid:105) . The first step is to write the full operator as two uncoupled operators plusa correction term. Introducing the constant ¯ β to parameterize the second uncoupled operator, the CMEbecomes | ˙ p (cid:105) = − [ ˆ L x ( α ) + γ ˆ L y ( ¯ β ) + γ ˆΓ x ˆ∆ y ] | p (cid:105) (51)where we have explicitly denoted the fact that the correction term ˆ L xy − ˆ L y ( ¯ β ) factorizes into two operatorsthat act on each of the x and y sectors alone:ˆΓ x ≡ ˆ β x − ¯ β, (52)ˆ∆ y ≡ (ˆ a + y − a + y ˆ a − y − N ) . (53)The second step is to expand the solution in the eigenfunctions of the two uncoupled operators. Introducing k as the mode index for the eigenstates of ˆ L y ( ¯ β ), we write | p (cid:105) = M (cid:88) j =0 N (cid:88) k =0 G jk | j, k (cid:105) . (54)17nserting this form into the CME, projecting with the conjugate state (cid:104) j (cid:48) , k (cid:48) | , and summing over j and k yields the dynamics for the expansion coefficients G jk :˙ G jk = − [( α + 1) j + γ ( ¯ β + 1) k ] G jk − γ M (cid:88) j (cid:48) =0 Γ jj (cid:48) N (cid:88) k (cid:48) =0 ∆ kk (cid:48) G j (cid:48) k (cid:48) . (55)Here the first term is diagonal and reflects the actions of the uncoupled operators on their eigenstates. Thesecond term contains the corrections Γ jj (cid:48) = (cid:104) j | ˆΓ x | j (cid:48) (cid:105) and ∆ kk (cid:48) = (cid:104) k | ˆ∆ y | k (cid:48) (cid:105) . The first correction is directlyevaluated by inserting a complete set of m states:Γ jj (cid:48) = M (cid:88) m =0 (cid:104) j | ( ˆ β x − ¯ β ) | m (cid:105)(cid:104) m | j (cid:48) (cid:105) = M (cid:88) m =0 (cid:104) j | m (cid:105) ( β m − ¯ β ) (cid:104) m | j (cid:48) (cid:105) (56)= M (cid:88) m =0 ψ jm ( β m − ¯ β ) φ j (cid:48) m . (57)We see that Γ jj (cid:48) is the simply the difference between the coupling function and the constant parameter,rotated into eigenspace. Notably, for linear coupling, Γ jj (cid:48) is tridiagonal (see Sec. B.5). The second correctionis most easily evaluated by inserting a complete set of y states; the result, derived in Sec. B.5, is∆ kk (cid:48) = kδ kk (cid:48) − ( N − k + 1) δ k − ,k (cid:48) . (58)We see that ∆ kk (cid:48) is subdiagonal in k , which simplifies the dynamics of G jk to˙ G jk = − M (cid:88) j =0 Λ kjj (cid:48) G j (cid:48) k + γ ( N − k + 1) M (cid:88) j =0 Γ jj (cid:48) G j (cid:48) ,k − , (59)where we define the matrix acting on the diagonal part asΛ kjj (cid:48) ≡ [( α + 1) j + γ ( ¯ β + 1) k ] δ jj (cid:48) + γk Γ jj (cid:48) . (60)The subdiagonality allows one to write the steady state of Eqn. 59 as an iterative scheme, by which the k thcolumn of G jk is computed from the ( k − (cid:126)G k = γ ( N − k + 1) Λ − k Γ (cid:126)G k − . (61)The scheme is initialized with (cid:126)G = δ j (62)(see Sec. B.5), and the joint distribution is recovered via p mn = M (cid:88) j =0 N (cid:88) k =0 G jk φ jm φ kn , (63)which is the projection of Eqn. 54 into (cid:104) m, n | space.Eqn. 63 constitutes an exact steady state solution to the CME, with G jk computed iteratively via Eqns.61 and 62, auxiliary matrices defined in Eqns. 57 and 60, and the eigenvectors given by Eqns. 42 and 47.Importantly, the computational complexity of the solution has been dramatically reduced: rather than solvingthe original CME (Eqn. 7), which requires inverting its operator of size ( M + 1)( N + 1) × ( M + 1)( N + 1),Eqn. 61 makes clear that it is only necessary to invert N smaller matrices of size ( M + 1) × ( M + 1), i.e. thematrices Λ k for k ∈ { , , . . . , N } . 18 .4 Exact expressions for moments Now that we have an exact solution to the CME in terms of a spectral expansion, moments take an exactform in terms of the expansion coefficients. We thus circumvent the problem of moment closure, insteadarriving at compact expressions that require only the inversion and multiplication of finite matrices via Eqn.61.Moments are most easily computed from the generating function, G ( x, y ). For example, the ν th moment ofthe output is (cid:104) n ν (cid:105) = [( y∂ y ) ν G ( x = 1 , y )] y =1 . (64)In terms of the expansion, the generating function is G ( x, y ) = (cid:104) x, y | p (cid:105) = (cid:80) Mj =0 (cid:80) Nk =0 G jk (cid:104) x | j (cid:105)(cid:104) y | k (cid:105) , andusing the fact that (cid:104) x = 1 | j (cid:105) = δ j (Eqn. 38), we have (cid:104) n ν (cid:105) = N (cid:88) k =0 G k [( y∂ y ) ν (cid:104) y | k (cid:105) ] y =1 . (65)Inserting the expression for (cid:104) y | k (cid:105) (Eqn. 38) and defining w ≡ log y , we obtain (cid:104) n ν (cid:105) = N (cid:88) k =0 G k ∂ νw (cid:20) ( e w − k ( ¯ βe w + 1) N − k ( ¯ β + 1) N (cid:21) w =0 . (66)At this point we recall that ¯ β is a constant we introduce to parameterize the expansion. The expressionfor the moments therefore cannot depend on ¯ β : if we change ¯ β , the expression in brackets changes, but theexpansion coefficients G k also change, such that Eqn. 66 evaluates to the same ¯ β -independent form. Weare therefore free to set ¯ β to any value, and the choice ¯ β = 0 makes the derivative easiest to evaluate. Thuswe have (cid:104) n ν (cid:105) = N (cid:88) k =0 G k ∂ νw (cid:2) ( e w − k (cid:3) w =0 , (67)where it is now understood that G k is computed with ¯ β = 0. Evaluating the derivative yields (cid:104) n ν (cid:105) = N (cid:88) k =0 G k min( k,ν ) (cid:88) (cid:96) =1 (cid:26) ν(cid:96) (cid:27) k !( k − (cid:96) )! e (cid:96)w ( e w − k − (cid:96) w =0 (68)= N (cid:88) k =0 G k min( k,ν ) (cid:88) (cid:96) =1 (cid:26) ν(cid:96) (cid:27) k !( k − (cid:96) )! δ k(cid:96) (69)= min( ν,N ) (cid:88) k =1 G k (cid:26) νk (cid:27) k ! (70)in terms of the Stirling numbers of the second kind, (cid:26) νk (cid:27) = 1 k ! k (cid:88) (cid:96) =0 ( − k − (cid:96) (cid:18) k(cid:96) (cid:19) (cid:96) ν . (71)For example, the first moment, second moment, and variance are (cid:104) n (cid:105) = G , (72) (cid:104) n (cid:105) = G + 2 G , (73) σ n = (cid:104) n (cid:105) − (cid:104) n (cid:105) = G + 2 G − G . (74)19hese are exact expressions for the moments in terms of the expansion coefficients G k , which are obtainedby matrix inversion and multiplication via Eqn. 61, e.g. in Mathematica.An informative special case is immediately revealed when N = 1, for which G does not exist, i.e. (cid:104) n (cid:105) = G and σ n = G − G , or σ n = (cid:104) n (cid:105) (1 − (cid:104) n (cid:105) ) ( N = 1) . (75)Here there is only one output molecule. The relationship between its mean activation and the associatednoise must therefore obey the known result for a single binary switch, Eqn. 75. B.5 Auxiliary calculations
Here we show that Γ jj (cid:48) is tridiagonal for linear β m = cm :Γ jj (cid:48) = (cid:104) j | ˆΓ x | j (cid:48) (cid:105) (76)= (cid:104) j | ( c ˆ a + x ˆ a − x − ¯ β ) | j (cid:48) (cid:105) (77)= − ¯ βδ jj (cid:48) + c (cid:73) ¯ dx (cid:104) j | x (cid:105)(cid:104) x | ˆ a + x ˆ a − x | j (cid:48) (cid:105) (78)= − ¯ βδ jj (cid:48) + c (cid:73) ¯ dx (cid:104) j | x (cid:105) x∂ x (cid:104) x | j (cid:48) (cid:105) (79)= − ¯ βδ jj (cid:48) + c (cid:73) ¯ dx (cid:104) j | x (cid:105) x∂ x ( x − j (cid:48) ( αx + 1) M − j (cid:48) ( α + 1) M (80)= − ¯ βδ jj (cid:48) + c (cid:73) ¯ dx (cid:104) j | x (cid:105) x ( α + 1) M (cid:104) j (cid:48) ( x − j (cid:48) − ( αx + 1) M − j (cid:48) +( x − j (cid:48) ( M − j (cid:48) )( αx + 1) M − j (cid:48) − α (cid:105) (81)= − ¯ βδ jj (cid:48) + c (cid:73) ¯ dx (cid:104) j | x (cid:105) x ( x − j (cid:48) − ( αx + 1) M − j (cid:48) − ( α + 1) M [ j (cid:48) ( αx + 1) + ( x − M − j (cid:48) ) α ] (82)= − ¯ βδ jj (cid:48) + cα + 1 (cid:73) ¯ dx (cid:104) j | x (cid:105) x ( x − j (cid:48) − ( αx + 1) M − j (cid:48) − ( α + 1) M (cid:8) j (cid:48) ( αx + 1) +[ α ( M − j (cid:48) ) + j (cid:48) ]( αx + 1)( x − α ( M − j (cid:48) )( x − (cid:9) (83)= − ¯ βδ jj (cid:48) + cα + 1 (cid:73) ¯ dx (cid:104) j | x (cid:105) (cid:40) ( x − j (cid:48) − ( αx + 1) M − j (cid:48) +1 ( α + 1) M [ j (cid:48) ]+ ( x − j (cid:48) ( αx + 1) M − j (cid:48) ( α + 1) M [ α ( M − j (cid:48) ) + j (cid:48) ]+ ( x − j (cid:48) − ( αx + 1) M − j (cid:48) +1 ( α + 1) M [ α ( M − j (cid:48) )] (cid:41) (84)= − ¯ βδ jj (cid:48) + cα + 1 (cid:73) ¯ dx (cid:104) j | x (cid:105) {(cid:104) x | j (cid:48) − (cid:105) j (cid:48) + (cid:104) x | j (cid:48) (cid:105) [ α ( M − j (cid:48) ) + j (cid:48) ] + (cid:104) x | j (cid:48) + 1 (cid:105) α ( M − j (cid:48) ) } (85)= − ¯ βδ jj (cid:48) + cα + 1 {(cid:104) j | j (cid:48) − (cid:105) j (cid:48) + (cid:104) j | j (cid:48) (cid:105) [ α ( M − j (cid:48) ) + j (cid:48) ] + (cid:104) j | j (cid:48) + 1 (cid:105) α ( M − j (cid:48) ) } (86)= cj (cid:48) α + 1 δ j,j (cid:48) − + (cid:26) c [ α ( M − j (cid:48) ) + j (cid:48) ] α + 1 − ¯ β (cid:27) δ jj (cid:48) + cα ( M − j (cid:48) ) α + 1 δ j,j (cid:48) +1 . (87)Eqn. 77 recognizes that ˆ β x = c ˆ a + x ˆ a − x is the operator representation of β m (since ˆ a + ˆ a − is the number operator,i.e. ˆ a + x ˆ a − x | m (cid:105) = m | m (cid:105) ), and Eqn. 83 uses the algebraic fact that x [ j (cid:48) ( αx + 1) + ( x − M − j (cid:48) ) α ]( α + 1) = j (cid:48) ( αx + 1) + [ α ( M − j (cid:48) ) + j (cid:48) ]( αx + 1)( x −
1) + α ( M − j (cid:48) )( x − , which is straightforward to verify.20ere we derive Eqn. 58:∆ kk (cid:48) = (cid:104) k | ˆ∆ y | k (cid:48) (cid:105) (88)= (cid:104) k | (ˆ a + y − a + y ˆ a − y − N ) | k (cid:48) (cid:105) (89)= (cid:73) ¯ dy (cid:104) k | y (cid:105)(cid:104) y | (ˆ a + y − a + y ˆ a − y − N ) | k (cid:48) (cid:105) (90)= (cid:73) ¯ dy (cid:104) k | y (cid:105) ( y − y∂ y − N ) (cid:104) y | k (cid:48) (cid:105) (91)= (cid:73) ¯ dy (cid:104) k | y (cid:105) ( y − y∂ y − N ) ( y − k (cid:48) ( ¯ βy + 1) N − k (cid:48) ( ¯ β + 1) N (92)= (cid:73) ¯ dy (cid:104) k | y (cid:105) ( y − β + 1) N (cid:104) yk (cid:48) ( y − k (cid:48) − ( ¯ βy + 1) N − k (cid:48) + y ( y − k (cid:48) ( N − k (cid:48) )( ¯ βy + 1) N − k (cid:48) − ¯ β − N ( y − k (cid:48) ( ¯ βy + 1) N − k (cid:48) (cid:105) (93)= (cid:73) ¯ dy (cid:104) k | y (cid:105) ( y − k (cid:48) ( ¯ βy + 1) N − k (cid:48) − ( ¯ β + 1) N (cid:2) yk (cid:48) ( ¯ βy + 1) + y ( y − N − k (cid:48) ) ¯ β − N ( y − βy + 1) (cid:3) (94)= (cid:73) ¯ dy (cid:104) k | y (cid:105) ( y − k (cid:48) ( ¯ βy + 1) N − k (cid:48) − ( ¯ β + 1) N (cid:2) k (cid:48) ( ¯ βy + 1) − ( y − N − k (cid:48) ) (cid:3) (95)= (cid:73) ¯ dy (cid:104) k | y (cid:105) (cid:34) k (cid:48) ( y − k (cid:48) ( ¯ βy + 1) N − k (cid:48) ( ¯ β + 1) N − ( N − k (cid:48) ) ( y − k (cid:48) +1 ( ¯ βy + 1) N − ( k (cid:48) +1) ( ¯ β + 1) N (cid:35) (96)= (cid:73) ¯ dy (cid:104) k | y (cid:105) [ k (cid:48) (cid:104) y | k (cid:48) (cid:105) − ( N − k (cid:48) ) (cid:104) y | k (cid:48) + 1 (cid:105) ] (97)= k (cid:48) (cid:104) k | k (cid:48) (cid:105) − ( N − k (cid:48) ) (cid:104) k | k (cid:48) + 1 (cid:105) (98)= k (cid:48) δ kk (cid:48) − ( N − k (cid:48) ) δ k,k (cid:48) +1 (99)= kδ kk (cid:48) − ( N − k + 1) δ k − ,k (cid:48) . (100)Here we derive Eqn. 62: (cid:126)G = G j (101)= (cid:104) j, k = 0 | p (cid:105) (102)= M (cid:88) m =0 N (cid:88) n =0 (cid:104) j | m (cid:105)(cid:104) k = 0 | n (cid:105)(cid:104) m, n | p (cid:105) (103)= M (cid:88) m =0 N (cid:88) n =0 (cid:104) j | m (cid:105) p mn (104)= M (cid:88) m =0 (cid:104) j | m (cid:105) p m (105)= M (cid:88) m =0 (cid:104) j | m (cid:105)(cid:104) m | j = 0 (cid:105) (106)= (cid:104) j | j = 0 (cid:105) (107)= δ j . (108)Eqn. 104 uses Eqn. 47 to obtain (cid:104) k = 0 | n (cid:105) = 1, and Eqn. 106 recognizes that p m is the steady state of theuncoupled operator, p m = φ m = (cid:104) m | j = 0 (cid:105) . 21 Supplementary figures well mixed perfectly partitioned partitioned with exchange output mean,
437 bits I = 0 .
207 bits I = 0 .
318 bits mean input, q =
2, and γ = 1. A, B
As in Fig. 2 of the main text, with M = N = 2, perfect partitioning linearizes the input-output relationand reduces the noise, transmitting more information than the well-mixed system; further, allowing exchangeamong partitions compresses the response and increases the noise compared to the perfectly partitionedsystem, transmitting less information than the well-mixed system. C, D
As in Fig. 5 of the main text, an information-optimal partition size, here
M/π ∗ = N/π ∗ ≈
2, emergesdue to the trade-off between optimizing signaling reliability and avoiding unfavorable configurations.22 input, q = (cid:95) /( (cid:95) +1) =
Figure 8: Reducing the number of input molecules linearizes the input-output response and increases thenoise in the output. Here π = 1, β = 20, and γ = 1. A The output (the mean activity of N = 2 Y molecules) vs. the input (the mean activity of M X molecules)for several values of M . As M is reduced the response becomes more linear, deviating more strongly fromthe mean-field response (cid:104) n (cid:105) /N = βq/ ( βq + 1). Symbols show 20 uniformly spaced values of q to highlightthe effect of saturation on the state space. B The noise vs. the mean for the output, shown for the same values of M . As M is reduced the noiseincreases for all values of the mean. 23 β = 3: M/π ∗ ≈ β = 8: M/π ∗ ≈ β = 200: M/π ∗ ≈ particle number, M = N op t i m a l pa r t i t i on nu m be r , (cid:47) * A γ = 0 . M/π ∗ ≈ γ = 0 . M/π ∗ ≈ γ = 10 . M/π ∗ ≈ particle number, M = N op t i m a l pa r t i t i on nu m be r , (cid:47) * B partition size, M/ (cid:47) = N/ (cid:47) i n f o r m a t i on ( b i t s ) C (cid:96) = 20 (cid:96) = 8 (cid:96) = 4 10 partition size, M/ (cid:47) = N/ (cid:47) i n f o r m a t i on ( b i t s ) D (cid:97) = 1 (cid:97) = 0.2 (cid:97) = 0.05 Figure 9: The emergence of an optimal partition size is robust to parameter variations.
A, B
Results from the minimal system, described by the chemical master equation, as in Fig. 5B of the maintext. The information-optimal partition number π ∗ is plotted as a function of molecule number M = N for various values of β (A) and γ (B). Linear fits provide estimates of the optimal partition size M/π ∗ , asindicated in the legends. In A, γ = 1; in B, β = 20. C, D
Results from the lattice simulation, in which space is accounted for explicitly, as in Fig. 6C of themain text. The information is plotted as a function of the partition size, directly revealing an optimum, forvarious values of β (C) and γ (D). Parameters are as in Fig. 6C: M = N = 49, p hop = 0 . λ = 70, and p D /p r = 1. In C, γ = 1; in D, β = 20.As discussed in the main text, the optimum arises due to a tradeoff between two key effects of partitioning:on the one hand, partitioning removes correlations in the states of Y molecules, reducing noise; on the otherhand, partitioning isolates molecules, reducing the maximal response. The first effect favors few moleculesper partition, while the second effect favors many molecules per partition.As seen here in both the minimal system (A, B) and the simulated system (C, D), lowering β or γ increasesthe optimal number of molecules per partition. This result has an intuitive explanation in terms of the abovetradeoff: lowering either β or γ slows the rate of switching from the Y to the Y ∗ state, with respect to thetimescale of X switching. As a result, Y molecules are less sensitive to individual fluctuations in the stateof X molecules. The states of the Y molecules therefore exhibit weaker correlations, which in turn weakensthe benefit that partitioning imparts in terms of the removal of these correlations. The opposing effect ofmolecular isolation thus begins to dominate, pushing the optimum toward a larger number of molecules perpartition. 24 number of X molecules, M nu m be r o f Y m o l e c u l e s , N (cid:47) * Figure 10: The optimal partition size has only a weak dependence on the number of output molecules. Theinformation-optimal partition number π ∗ is plotted as a function of the number of X molecules M and thenumber of Y molecules N . The dependence of π ∗ on N is weak, such that the partition size M/π ∗ ≈ − N values. 25
100 200 300 400 500012345 time i npu t, (cid:95) A probability density (cid:239) partition size, M/ (cid:47) = N/ (cid:47) i n f o r m a t i on ( b i t s ) B c = 0c = 0.1c = 1c = 10 Figure 11: The effects of partitioning are robust to extrinsic noise. A Simulations are performed with extrinsic noise introduced to the input parameter α . To keep α ≥
0, thequantity z ≡ log α is described by the simple mean-reverting Ornstein-Uhlenbeck process dz = r ( µ − z ) dt + η √ rdtξ , where ξ is a Gaussian random variable with mean 0 and variance 1; this results in a log-normaldistribution for α . The quantity 1 /r is the autocorrelation time, and the choices µ = log[¯ α / (¯ α + c )] / η = (cid:112) c/ ¯ α ) ensure that the mean of α is ¯ α and that the variance of α scales with the mean via σ α = c ¯ α . B As the magnitude of the extrinsic noise (set by c ) increases, the information I [¯ α, n ] decreases for allpartition sizes, while the presence of an information-optimal partition size persists.Here M = N = 25, β = 20, γ = 1, p hop = 0 . λ = 50 lattice spacings squared, the ratio ofdiffusion to reaction propensities is p D /p r = 1, and r = 1 in units of the X ∗ → X reaction rate (which setsthe timescale of switching). In A, ¯ α = c = 1 and time is scaled by 1 /r . In B, when c = 0, the informationtransmission is lower than that in Fig. 6C of the main text because M = N is lower.26 input, q = (cid:95) /( (cid:95) +1) m ean ou t pu t, < n > / N A mean output,
As in Fig. 6A of the main text, as the probability of crossing a diffusion barrier p hop is decreased,the maximal value of the mean response decreases. In A, the response also becomes more linear, butto less of a degree than in Fig. 6A of the main text. Note that due to both finite diffusion and finitemolecule number, even the unpartitioned response ( p hop = 1) deviates from the mean-field response (blacksolid line), which is given by (cid:104) n (cid:105) /N = βf / (1 + βf ), where f is the fraction of X molecules in the dimerstate; in A, f = (cid:15)g with g ≡ (cid:104) m (cid:105) /M = ( (cid:112) (cid:15)q − / (4 (cid:15)q ), while in B, f = (cid:15)g (1 − g ) / (2 (cid:15)g + 1) with g ≡ (cid:104) m (cid:105) /M = [ (cid:112) (cid:15)q (1 − q ) − / [4 (cid:15) (1 − q )]. Here π = 25. Legends in middle panels apply to left panelsas well. Middle
As in Fig. 6B of the main text, as the probability of crossing a diffusion barrier p hop is decreased,the output noise decreases. Black dashed line shows the binomial noise limit σ n /N = ( (cid:104) n (cid:105) /N )(1 − (cid:104) n (cid:105) /N ).In B, lines connecting data points are provided to reveal that, as there are two values of q that give the samemean (cid:104) n (cid:105) /N (left), the noise is higher for the smaller value of q . Here π = 25. Right
As in Fig. 6C of the main text, the tradeoff between reliable signaling (reduced noise) and efficientsignaling (maintaining a high maximal response) leads to an information-optimal partition size. Here p hop =0 . M = N islower and additionally, in B, because of the non-monotonic mean response.27 input, q =
10 20 30 40 50 60 70 80 90 10010 (cid:239) (cid:239) (cid:239) (cid:239) N (cid:95) r e l a t i v e e rr o r well mixed, M = N = 2perfectly partitioned, M = N = (cid:47) = 2partitioned with exchange, M = N = (cid:47) = 2well mixed, M = N = 10perfectly partitioned, M = N = (cid:47) = 10 Figure 14: Computation of the mutual information converges as the input is more finely discretized. Therelative error | I − I | /I , where I is the information at N α = 100, is plotted against the number N α ofvalues of α [uniformly spaced in q = α/ ( α + 1)] used in the computation. Five conditions are tested, asindicated in the legend. It is seen that the relative error falls below ∼
1% in all conditions for N α (cid:38) eferences [1] Hern´an E Grecco, Malte Schmick, and Philippe I H Bastiaens. Signaling from the living plasma mem-brane. Cell , 144(6):897–909, Mar 2011.[2] Christian Eggeling, Christian Ringemann, Rebecca Medda, G¨unter Schwarzmann, Konrad Sandhoff,Svetlana Polyakova, Vladimir N Belov, Birka Hein, Claas von Middendorff, Andreas Sch¨onle, and Ste-fan W Hell. Direct observation of the nanoscale dynamics of membrane lipids in a living cell.
Nature ,457(7233):1159–62, Feb 2009.[3] Daniel Lingwood and Kai Simons. Lipid rafts as a membrane-organizing principle.
Science ,327(5961):46–50, Jan 2010.[4] A Kusumi and Y Sako. Cell surface organization by the membrane skeleton.
Current Opinion in CellBiology , 8(4):566–74, Aug 1996.[5] Akihiro Kusumi, Yuki M Shirai, Ikuko Koyama-Honda, Kenichi G N Suzuki, and Takahiro K Fuji-wara. Hierarchical organization of the plasma membrane: investigations by single-molecule tracking vs.fluorescence correlation spectroscopy.
FEBS Letters , 584(9):1814–23, May 2010.[6] Boris N Kholodenko, John F Hancock, and Walter Kolch. Signalling ballet in space and time.
Nat RevMol Cell Biol , 11(6):414–26, Jun 2010.[7] Kenichi G N Suzuki, Takahiro K Fujiwara, Fumiyuki Sanematsu, Ryota Iino, Michael Edidin, andAkihiro Kusumi. GPI-anchored receptor clusters transiently recruit Lyn and G alpha for temporarycluster immobilization and Lyn activation: single-molecule tracking study 1.
The Journal of Cell Biology ,177(4):717–30, May 2007.[8] Kenichi G N Suzuki, Takahiro K Fujiwara, Michael Edidin, and Akihiro Kusumi. Dynamic recruitmentof phospholipase C gamma at transiently immobilized GPI-anchored receptor clusters induces IP3-Ca2+signaling: single-molecule tracking study 2.
The Journal of Cell Biology , 177(4):731–42, May 2007.[9] Ian A Prior, Cornelia Muncke, Robert G Parton, and John F Hancock. Direct visualization of Rasproteins in spatially distinct cell surface microdomains.
The Journal of Cell Biology , 160(2):165–70,Jan 2003.[10] S. Plowman, C. Muncke, R. G. Parton, and J. F. Hancock. H-ras, K-ras, and inner plasma membraneraft proteins operate in nanoclusters with differential dependence on the actin cytoskeleton.
Proc. NatlAcad. Sci. USA , 102:15500–15005, 2005.[11] Ziya Kalay, Takahiro K Fujiwara, and Akihiro Kusumi. Confining domains lead to reaction bursts:reaction kinetics in the plasma membrane.
PLoS ONE , 7(3):e32948, Jan 2012.[12] Andrew Mugler, Aimee Gotway Bailey, Koichi Takahashi, and Pieter Rein ten Wolde. Membraneclustering and the role of rebinding in biochemical signaling.
Biophysical Journal , 102(5):1069–78, Mar2012.[13] T. Tian, A. Harding, K. Inder, S. Plowman, R. G. Parton, and J. F. Hancock. Plasma membranenanoswitches generate high-fidelity Ras signal transduction.
Nat. Cell Biol. , 9:905–914, 2007.[14] Thomas Gurry, Ozan Kahramano˘gullari, and Robert G Endres. Biophysical mechanism for Ras-nanocluster formation and signaling in plasma membrane.
PLoS ONE , 4(7):e6148, Jan 2009.[15] A M Walczak, A Mugler, and C H Wiggins. A stochastic spectral analysis of transcriptional regulatorycascades.
Proc Natl Acad Sci USA , 106:6529–6534, 2009.[16] A Mugler, A M Walczak, and C H Wiggins. Spectral solutions to stochastic models of gene expressionwith bursts and regulation.
Phys Rev E , 80:041921, 2009.3017] J. L. W. V. Jensen. Sur les fonctions convexes et les in´egalit´es entre les valeurs moyennes.
ActaMathematica , 30(1):175–193, 1906.[18] C. E. Shannon. A Mathematical Theory of Communication.
Bell Syst Tech J , 27:379–423, 623–656,1948.[19] Kotono Murase, Takahiro Fujiwara, Yasuhiro Umemura, Kenichi Suzuki, Ryota Iino, Hidetoshi Ya-mashita, Mihoko Saito, Hideji Murakoshi, Ken Ritchie, and Akihiro Kusumi. Ultrafine membranecompartments for molecular diffusion as revealed by single molecule techniques.
Biophysical Journal ,86(6):4075–93, Jun 2004.[20] Boryana N Manz, Bryan L Jackson, Rebecca S Petit, Michael L Dustin, and Jay Groves. T-cell triggeringthresholds are modulated by the number of antigen within individual T-cell receptor clusters.
Proc NatlAcad Sci USA , 108(22):9089–94, May 2011.[21] Thorsten Erdmann, Martin Howard, and Pieter Rein ten Wolde. Role of spatial averaging in theprecision of gene expression patterns.
Phys. Rev. Lett. , 103:258101, Dec 2009.[22] Raymond Cheong, Alex Rhee, Chiaochun Joanne Wang, Ilya Nemenman, and Andre Levchenko. Infor-mation transduction capacity of noisy biochemical signaling networks.
Science , 334(6054):354–8, Oct2011.[23] Gasper Tkacik, Curtis G Callan, and William Bialek. Information flow and optimization in transcrip-tional regulation.
Proceedings of the National Academy of Sciences , 105(34):12265–70, Aug 2008.[24] Gerardo Aquino, Diana Clausznitzer, Sylvain Tollis, and Robert G Endres. Optimal receptor-clustersize determined by intrinsic and extrinsic noise.
Physical Review E , 83:21914, Feb 2011.[25] Monica Skoge, Yigal Meir, and Ned S Wingreen. Dynamics of cooperativity in chemical sensing amongcell-surface receptors.
Physical Review Letters , 107:178101, Oct 2011.[26] Koichi Takahashi, Sorin Tanase-Nicola, and Pieter Rein ten Wolde. Spatio-temporal correlations candrastically change the response of a MAPK pathway.
Proc Natl Acad Sci USA , 107(6):2473–8, Feb2010.[27] Jeroen S van Zon, David K Lubensky, Pim R H Altena, and Pieter Rein ten Wolde. An allosteric modelof circadian KaiC phosphorylation.
Proc Natl Acad Sci USA , 104(18):7420–5, May 2007.[28] Paul Miller, Anatol M Zhabotinsky, John E Lisman, and Xiao-Jing Wang. The stability of a stochasticCaMKII switch: dependence on the number of enzyme molecules and protein turnover.
Plos Biol ,3(4):e107, Apr 2005.[29] P Nash, X Tang, S Orlicky, Q Chen, F B Gertler, M D Mendenhall, F Sicheri, T Pawson, and M Tyers.Multisite phosphorylation of a CDK inhibitor sets a threshold for the onset of DNA replication.
Nature ,414(6863):514–21, Nov 2001.[30] A. Zeke, M. Luk´acs, W. A. Lim, and A. Rem´enyi. Scaffolds: interaction platforms for cellular signallingcircuits.
Trends Cell Biol. , 19:364–374, 2009.[31] A Mugler, A M Walczak, and C H Wiggins. Information-optimal transcriptional response to oscillatorydriving.
Phys Rev Lett , 105:058101, 2010.[32] A M Walczak, A Mugler, and C H Wiggins. Analytic methods for modeling stochastic regulatorynetworks. In X Liu and M Betterton, editors,
Methods in Molecular Biology, Vol. 880: ComputationalModeling of Signaling Networks . Humana Press, 2012.[33] N G van Kampen.
Stochastic processes in physics and chemistry . Elsevier Science, 2nd edition, 1992.[34] S J Farlow.
Partial differential equations for scientists and engineers . Dover Publications, 1993.[35] J S Townsend.
A modern approach to quantum mechanics . University Science Books, 2000.3136] D C Mattis and M L Glasser. The uses of quantum field theory in diffusion-limited reactions.
Rev ModPhys , 70:979–1001, 1998.[37] M A Lemmon and J Schlessinger. Cell signaling by receptor tyrosine kinases.
Cell , 141:1117–34, 2010.[38] S Schlee, P Carmillo, and A Whitty. Quantitative analysis of the activation mechanism of the multi-component growth-factor receptor Ret.