[PDF] Spatial partitioning improves the reliability of biochemical signaling

Abstract

Spatial heterogeneity is a hallmark of living systems, even at the molecular scale in individual cells. A key example is the partitioning of membrane-bound proteins via lipid domain formation or cytoskeleton-induced corralling. Yet the impact of this spatial heterogeneity on biochemical signaling processes is poorly understood. Here we demonstrate that partitioning improves the reliability of biochemical signaling. We exactly solve a stochastic model describing a ubiquitous motif in membrane signaling. The solution reveals that partitioning improves signaling reliability via two effects: it moderates the non-linearity of the switching response, and it reduces noise in the response by suppressing correlations between molecules. An optimal partition size arises from a trade-off between minimizing the number of proteins per partition to improve signaling reliability and ensuring sufficient proteins per partition to maintain signal propagation. The predicted optimal partition size agrees quantitatively with experimentally observed systems. These results persist in spatial simulations with explicit diffusion barriers. Our findings suggest that molecular partitioning is not merely a consequence of the complexity of cellular substructures, but also plays an important functional role in cell signaling.

Full PDF

SSpatial partitioning improves the reliability of biochemical signaling

Andrew Mugler , Filipe Tostevin , and Pieter Rein ten Wolde FOM Institute AMOLF, Science Park 104, 1098 XG Amsterdam, The Netherlands

Abstract

Spatial heterogeneity is a hallmark of living systems, even at the molecular scale in individual cells. Akey example is the partitioning of membrane-bound proteins via lipid domain formation or cytoskeleton-induced corralling. Yet the impact of this spatial heterogeneity on biochemical signaling processes ispoorly understood. Here we demonstrate that partitioning improves the reliability of biochemical sig-naling. We exactly solve a stochastic model describing a ubiquitous motif in membrane signaling. Thesolution reveals that partitioning improves signaling reliability via two eﬀects: it moderates the non-linearity of the switching response, and it reduces noise in the response by suppressing correlationsbetween molecules. An optimal partition size arises from a trade-oﬀ between minimizing the numberof proteins per partition to improve signaling reliability and ensuring suﬃcient proteins per partition tomaintain signal propagation. The predicted optimal partition size agrees quantitatively with experimen-tally observed systems. These results persist in spatial simulations with explicit diﬀusion barriers. Ourﬁndings suggest that molecular partitioning is not merely a consequence of the complexity of cellularsubstructures, but also plays an important functional role in cell signaling.

The cell membrane is a nexus of information processing. Once regarded as a simple barrier between acell and its surroundings, it is now clear that the membrane is a hotspot of molecular activity, wheresignals are integrated and modulated even before being relayed to the inside of the cell [1]. Moreover,the membrane itself is structurally complex. Regions enriched in glycosphingolipids, cholesterol, and othermembrane components, often called lipid rafts, transiently assemble and ﬂoat within the surrounding bilayer[2], providing platforms for molecular interaction [3]. Additionally, interaction of the membrane with theunderlying actin cytoskeleton forms compartments in which molecules are transiently trapped [4, 5]. Thesemembrane sub-domains create a highly heterogeneous environment in which molecules are far from wellmixed, and it is currently unclear what eﬀect this heterogeneity has on cell signaling.Membrane sub-domains are thought to play a dominant role in the observed aggregation of signalingmolecules into clusters [6]. Interestingly, these clusters have a characteristic size of only a few molecules.For example, the GPI-anchored receptor CD59 is observed to form clusters of three to nine molecules uponinteraction with the cytoskeleton and lipid rafts [7, 8]. Similarly, the well-studied membrane-bound GTPaseRas forms clusters of six to eight molecules which also depend on interactions with the cytoskeleton andrafts [9, 10]. Despite the important ﬁndings that aggregation of proteins induced by sub-domains can aﬀectreaction kinetics [11], enhance oligomerization [1], modulate downstream responses [12, 13] and enhance sig-nal ﬁdelity [13,14], the origin of this characteristic size remains unknown. While it is quite possible that thesedomains owe their size to a thermodynamic or structural origin, we here address the question of whetherthis size can be optimized for signaling performance. We ﬁnd that the partitioning imposed by sub-domainsgives rise to a trade-oﬀ in cell signaling, from which an optimal size of a few molecules emerges naturally,suggesting that reliable signaling is intimately tied to the spatial structure of the membrane.We study via stochastic analysis and spatial simulation a model that is directly motivated by both CD59and Ras signaling at the membrane. Stimulated CD59 receptors induce the switching of several Src-family1 a r X i v : . [ q - b i o . M N ] F e b utside the cellcytoplasm A BC well mixed partitionedvs X X ∗ α Y Y ∗ γβ m γ Figure 1: Schematic depiction of the model system. A We consider a model representative of signal detectionby receptors and signal transmission at the cell membrane. B The model consists of two molecular species( X and Y ) which can each exist in active ( X ∗ , Y ∗ ) or inactive ( X , Y ) states. Molecules in the X state areactivated by the external signal of strength α , and active X ∗ molecules subsequently activate Y molecules. C We consider these reactions taking place in a single domain with all components well mixed, or in a domainconsisting of smaller compartments which are each individually well mixed but between which no interactionis possible. The total system volumes in the two scenarios are equal and assumed to scale with the numberof X molecules.kinases from an unphosphorylated to a phosphorylated state [7, 8]. Similarly, stimulated EGF receptorsinduce the switching of Ras proteins from an inactive GDP-loaded state to an active GTP-loaded state [13].We therefore study the simple and ubiquitous motif of coupled switching reactions, in which the activationof one species (the receptor) triggers the activation of a second species (the downstream eﬀector).We exactly solve this stochastic model of coupled switching reactions, and we use the solution to comparesignaling reliability in a spatially-partitioned system to that in a well-mixed system. We demonstrate thatpartitioning can improve signaling performance by generating a more graded input-output relation and byreducing the noise in the signaling response. This latter eﬀect comes about because partitioning reduces thecorrelations between the states of the diﬀerent output molecules. On the other hand, the stochastic exchangeof proteins between partitions can generate conﬁgurations which isolate molecules and exclude them fromthe signaling process, thereby reducing the dynamic range of the response and increasing the output noise.The trade-oﬀ between these two eﬀects results in an optimal partition size that agrees well with cluster sizesof signaling proteins that are observed experimentally [7–10], suggesting that cluster sizes are tuned so as tomaximize information transmission. We model two coupled molecular species at the membrane, as depicted in Fig. 1A. A membrane-boundreceptor (e.g. CD59 or EGF receptor) is activated via ligand stimulation, and the active receptor in turnactivates a membrane-bound eﬀector (e.g. a Src-family kinase or Ras). A reaction scheme representing theseprocesses is shown in Fig. 1B, and consists of two protein species: the receptor X and the downstream eﬀector Y . The switching of X molecules from the X to the X ∗ state is driven by an external signal of strength α .Active X ∗ molecules act on inactive Y molecules and promote switching to the Y ∗ state. Deactivation ofboth active protein species occurs spontaneously and independently.We will be concerned with how the network response, the number of active Y ∗ molecules as a function ofthe input signal α , is aﬀected by the spatial structure of the system. In particular we ask how partitioning2f the reaction system into non-interacting sub-domains aﬀects the reliability of signal transmission, whichis determined by two principal factors: the input-output response and the output noise; together theseproperties determine to what extent diﬀerent input signals can be reliably resolved from the network response.We focus on two system conﬁgurations, shown in Fig. 1C. In the ﬁrst case we assume that all molecules arepresent in a single well-mixed reaction compartment. In the second case, we consider a system partitionedinto π compartments between which no interactions are permitted; here we take the output of the systemto be the total number of active Y ∗ molecules in all compartments. This choice of output corresponds to areadout of the Y ∗ signal by, e.g., a cytosolic component whose diﬀusion is much faster than the diﬀusion andsignaling of X and Y on the membrane. In the partitioned system, we will for simplicity ﬁrst assume thatthe molecules are uniformly and statically distributed among compartments. However, recognizing that thisscenario will not generally be realized inside cells, we will later relax this assumption and consider exchangeof molecules among partitions.We model the dynamics of the well-mixed system, as well as each compartment within the partitioned system,using a stochastic equation of the same form. We denote the total numbers of X and Y molecules by M and N , respectively, and the numbers of active X ∗ and Y ∗ molecules by m and n , respectively. To parameterizethe system, we scale units of time by the deactivation rate of X ∗ , such that the eﬀective deactivation rate is1. Then α denotes the rescaled activation rate of X ; γ is the rate of deactivation of Y ∗ relative to that of X ∗ ;and γβ m is the activation rate of a given Y molecule for a particular concentration of X ∗ molecules. Theparameter α incorporates the eﬀective strength of the input signal and determines the mean X ∗ activity viathe occupancy q ≡ (cid:104) m (cid:105) /M = α/ ( α + 1). The precise m -dependence of the coupling function β m will dependon the exact nature of the interactions between X ∗ and Y molecules. We take β m ∝ m/v , with v the volumeof the compartment in which the reactions are taking place. However, our conclusions are unaﬀected if weinstead take a Michaelis-Menten form β m ∝ m/ ( m + vK ) (Appendix C: Fig. 7). The total system volume V is assumed to scale with the total number of X molecules, such that M/V is constant. The coupling functionin partition i ∈ { , . . . , π } is then determined by m i , the number of X ∗ molecules in partition i , accordingto β ( i ) m ∝ m i / ( V /π ) = βπm i /M for constant β .The probability of having m proteins in the X ∗ state and n proteins in the Y ∗ state evolves according tothe chemical master equation (CME),˙ p mn = − [ L m ( α, M ) + γ L n ( β m , N )] p mn , (1)subject to suitable boundary conditions. The nature of the particular set of reactions in our model (Fig. 1B)means that the operators L m and L n have the same form, L m ( α, M ) = α (cid:2) − E − m (cid:3) ( M − m ) + (cid:2) − E +1 m (cid:3) m, (2)where E im f ( m ) = f ( m + i ) deﬁnes the step operator. Despite the appearance of terms containing theproduct mn in the operator L n ( β m , N ), which make the direct calculation of moments of p mn from theCME impossible, an exact solution to (1) can be found for arbitrary β m using the method of spectralexpansion [15, 16] as described in Appendix A.1. We begin by analyzing the behavior of a minimal system with M = N = 2. In the well-mixed system, allmolecules are contained within π = 1 domain of volume V . In the partitioned system, π = 2 subdomainswith volume V / X and one Y molecule.We ﬁrst focus on the mean response (cid:104) n (cid:105) . In the limits of small or large α the mean response is the same inboth the partitioned and mixed systems, (cid:104) n (cid:105) /N → (cid:104) n (cid:105) /N → β/ ( β + 1) respectively. However, at allintermediate values of α , the mean response of the well-mixed system is larger than that of the partitioned3 ell mixed perfectly partitioned partitioned with exchange output mean, /M ou t pu t no i s e , (cid:109) m / M well (cid:239) mixedpartitionedpartitioned with exchange output mean, /M ou t pu t no i s e , (cid:109) m / M well (cid:239) mixedperfectly partitionedpartitioned with exchange0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 100.050.10.150.20.250.30.350.4 output mean, /M ou t pu t no i s e , (cid:109) m / M well (cid:239) mixedperfectly partitionedpartitioned with exchange I = 0 .

332 bits I = 0 .

463 bits I = 0 .

213 bits mean input, q = /M = (cid:95) /( (cid:95) +1) m ean ou t pu t, < n > / N A mean output, /N no i s e i n ou t pu t, (cid:109) n2 / N B Figure 2: Spatial partitioning improves signaling performance. A The mean response (cid:104) n (cid:105) /N as a functionof the mean X ∗ activity q = (cid:104) m (cid:105) /M = α/ ( α + 1), and B the output variance σ n as a function of the meanresponse, plotted for a well-mixed system with M = N = 2 (thick solid) and a partitioned system of π = 2compartments, each containing one X and one Y molecule (thick dashed). Partitioning linearizes the outputresponse and reduces noise across the full range of responses, leading to a higher transmitted information.The thin solid curves show the mean ﬁeld response (cid:104) n (cid:105) /N = βq/ ( βq + 1) in A and the binomial noiselimit (3) in B. Allowing exchange of molecules between compartments (thick dot-dashed) compresses theoutput response and increases the noise compared to the perfectly partitioned system, dramatically reducinginformation transmission. Here β = 20 and γ = 1.system; equivalently, the partitioned system exhibits a more graded response than the well-mixed systemto changes in the input signal (see Fig. 2A, thick solid and dashed curves). The more graded response isdue to higher ﬂuctuations in X ∗ activity. When α → α → ∞ , all X molecules are inactive or active,respectively; however at intermediate values of α , the number of active X ∗ molecules ﬂuctuates. Partitioningreduces the number of X molecules per reaction compartment, increasing the relative size of these ﬂuctuationsaccording to σ m / ( M/π ) = πq (1 − q ) /M . These ﬂuctuations are passed through the concave dependenceof n on m , resulting in a smaller mean (via Jensen’s inequality [17]), and therefore a more linear responsecurve (see also Appendix C: Fig. 8A).A more graded input-output relation can potentially enhance signaling by expanding the range of inputsignals which the network is able to transmit without saturating the response. However, in order to determinewhether this larger input range can be resolved in the network it is crucial to examine how the noise in theresponse is aﬀected. Figure 2B shows the variance of the output σ n as a function of the mean response (cid:104) n (cid:105) for the systemwith M = N = 2, as the input signal strength α is varied. We see that the output noise is reducedin the partitioned system relative to the well-mixed system across the full range of response levels. Thenoise reduction is surprising: one might expect that the increased ﬂuctuations in X ∗ activity that come withpartitioning would propagate to ﬂuctuations in Y ∗ activity. Indeed, this is the case: in a single compartment,as the number of X molecules is reduced, the noise in the output increases (Appendix C: Fig. 8B). However,4 perfectly partitioned B well mixed m m n n m n n σ n N = n N − n N + ∆ Nσ n N = n N − n N Figure 3: Partitioning reduces correlations between output modules. A In the partitioned system, each Y molecule receives an independent signal m i ( t ). The variance is simply that of independent two-state switches. B In the well-mixed system, each Y molecule reacts to the same m ( t ), which leads to correlations betweenin the states of diﬀerent Y molecules and an increase in the variance σ n . Sample trajectories are generatedusing parameters as in Fig. 2, with α = 1.this eﬀect is overcome by a second eﬀect: partitioning reduces correlations among output molecules.To see the eﬀect of partitioning on correlations, we consider the expressions for the variance. In the parti-tioned case, since the two Y molecules switch independently, the variance of n is simply that of a pair ofindependent binomial switches with activation probability (cid:104) n (cid:105) /N , σ n N = (cid:104) n (cid:105) N (cid:18) − (cid:104) n (cid:105) N (cid:19) . (3)In contrast, in the well-mixed case the two Y molecules are not independent. Since both are driven by thesame set of X molecules, ﬂuctuations in β m lead to correlations between the states of the two Y moleculesas their switching becomes more synchronized (see Fig. 3). This in turn leads to an increase in the variance,which can be written as σ n N = (cid:104) n (cid:105) N (cid:18) − (cid:104) n (cid:105) N (cid:19) + ∆ N , (4)where ∆ is a correction term accounting for the correlation between Y molecules, which are due to “extrinsic”ﬂuctuations in the input m ( t ). The functional form of ∆ for any M and N follows directly from the spectralsolution of the CME (Appendix B: Eqn. 74); for M = N = 2 one ﬁnds by inspection that ∆ is manifestlypositive, meaning that correlations increase the noise across all values of the mean. Importantly, this eﬀectis independent of the parameters of the switching reactions.The reduction of noise upon partitioning extends beyond the case of one Y molecule per partition. Indeed thesame phenomenon is observed if we consider larger molecule numbers M > π and

N > π , and compare thewell-mixed system to a system with uniform partitioning of the X and Y molecules into the π compartments.In the well-mixed case all Y molecules respond to the same signal m ( t ), and hence are correlated with allother Y molecules in the system. By contrast, in the partitioned case the N/π > Y molecules within eachpartition are correlated, and indeed since the ﬂuctuations in m i ( t ) will be larger than m ( t ) for the mixedsystem, such correlations will be stronger; yet, the Y molecules in diﬀerent partitions are uncorrelated. Thislatter eﬀect is suﬃcient to overcome the increase in correlations within each partition, such that the totalnoise is reduced. 5o see the noise reduction explicitly, we again consider the expression for the variance. Since the dynamicsof diﬀerent partitions is independent, assuming that both M and N are multiples of π , the variance can bewritten as σ n N = (cid:104) n (cid:105) N (cid:18) − (cid:104) n (cid:105) N (cid:19) + π ∆( ˜ M , ˜ N ) N , (5)where ˜ M ≡ M/π and ˜ N ≡ N/π are the numbers of X and Y molecules per compartment, respectively.Here, as before, ∆( ˜ M , ˜ N ) represents the additional ﬂuctuations due to correlations between the states of Y molecules within each compartment. The N -dependence of ∆( ˜ M , ˜ N ), which reﬂects the number of correlatedpairs of Y molecules, can be straightforwardly factored out as ∆( ˜ M , ˜ N ) = ˜ N ( ˜ N −

1) ˜∆( ˜ M ), where ˜∆( ˜ M )describes how strongly correlated are Y molecules within each compartment. The exact form for ˜∆( ˜ M ),while straightforward to calculate for a given ˜ M , is diﬃcult to generalize for all ˜ M ; nonetheless, inspectionof numerical and analytic results for speciﬁc combinations of ˜ M and ˜ N reveals in all cases that increasing π leads to an overall reduction in σ n . Additionally, if the switching of Y molecules is much slower than thatof X molecules, γ (cid:28)

1, then ˜∆( ˜ M ) takes the form˜∆( ˜ M ) ≈ αβ γ ˜ M (1 + α + αβ ) (6)Inserting this expression into (5) with ˜ M = M/π and ˜ N = N/π , one can straightforwardly see that thevariance is a decreasing function of π for π < N , indicating that the noise is reduced as the system is moreﬁnely partitioned. We have seen that partitioning has two beneﬁcial eﬀects on signal propagation: the input-output responsebecomes more graded, and the output noise at a given response level is reduced. Together, these eﬀectsmean that a larger number of distinct input signals can be encoded in the network response. To quantifythe ability of the network to transmit signals we calculate the mutual information I [ α, m ] [18] between theinput and the number of active Y ∗ molecules, as described in Appendix A.2. We ﬁnd that indeed, in thecase of M = N = 2 (Fig. 2), I [ α, m ] is signiﬁcantly larger for the partitioned system ( I = 0 .

463 bits) thanfor the well-mixed system ( I = 0 .

332 bits), conﬁrming that signal transmission is dramatically improved bypartitioning.

Thus far we have considered only the perfectly uniform and stationary partitioning of molecules. In reality,physical transport processes such as diﬀusion will also give rise to a variety of conﬁgurations with diﬀerentnumbers of proteins in each compartment, as depicted in Fig. 4. Each of these conﬁgurations will havediﬀerent properties for the transmission of the signal from α to n . It is therefore important to considerwhether the beneﬁts of partitioning described above persist once these additional conﬁgurations are takeninto account.Single-molecule tracking experiments have revealed that the timescale of diﬀusive mixing within a com-partment ( ∼ µ s) is two orders of magnitude faster than the timescale of molecular exchange betweencompartments ( ∼

10 ms) [19]. This observation allows us to treat each conﬁguration as static on the timescaleof mixing, then compute the total response by averaging over all conﬁgurations. Inherent in this treatmentis the assumption that the timescale of signaling is also faster than that of exchange between compartments.We later relax this assumption using spatially resolved simulations and nonetheless ﬁnd similar results.6 information (bits) ! ! ! ! ! Figure 4: Exchange between partitions leads to diﬀerent conﬁgurations of the system with a range of signalingperformance. Multiplicities listed above each conﬁguration are due to symmetry. Parameters are as in Fig.2.The total response is computed by ﬁrst enumerating the possible conﬁgurations of M X molecules and N Y molecules distributed amongst π partitions. For each such conﬁguration c we then solve for theoutput distribution p n | c and combine these distributions, weighted by the probability p c of each conﬁgurationoccurring if molecules are randomly assigned to diﬀerent partitions with uniform and independent probability,to give the overall response distribution p n = (cid:80) c p n | c p c .Figure 2B (dot-dashed curve) shows that the exchange of molecules between compartments increases thenoise relative to the perfectly partitioned system considered previously when M = N = 2. This is becausemany of the alternative conﬁgurations generated by exchange lead to signiﬁcant correlations between thestates of the diﬀerent Y molecules. Nevertheless, we see that the noise remains lower than that of the well-mixed system, because of the existence of some conﬁgurations in which the Y molecules are independent.However, the appearance of alternate conﬁgurations also aﬀects the mean response (Fig. 2A); in particular,the appearance of conﬁgurations in which X and Y molecules do not occupy the same partitions, and henceno signal can be propagated, means that the maximal output level is reduced. Given this simultaneous changein both the input-output function and the noise, it is not immediately clear whether signaling reliability isimproved relative to the well-mixed system. Computing the mutual information, we see that the informationtransmitted by the system with exchange ( I = 0 .

213 bits) is signiﬁcantly lower than that for the well-mixedsystem ( I = 0 .

332 bits), showing that the reduction of the output range compromises signal transmission toan extent which cannot be overcome by the corresponding reduction in noise.The decrease in information transmission upon incorporating molecule exchange in the system with M = N = 2 is the result of the appearance of suboptimal protein conﬁgurations, for which signal propagationis compromised (or even impossible). However, the number and performance of such conﬁgurations will ingeneral depend on the relative values of M , N and π (which need not equal M or N ). While moleculeexchange may make partitioning unfavorable in the extreme case of M = N = 2, for systems with higherprotein numbers it can be beneﬁcial to partition the system into π > To study the performance of systems with higher protein numbers and diﬀerent partition sizes, we comparethe information transmission, including molecule exchange, for diﬀerent partition numbers π as the numberof proteins in the system is varied while holding M = N . Figure 5A shows that for M = N > π > M = N isincreased the optimal partition number also increases such that the optimal number of proteins per partition M/π ∗ = N/π ∗ ≈ β and γ : changing7 particle number, M = N i n f o r m a t i on ( b i t s ) A (cid:47) = 1 (cid:47) = 2 (cid:47) = 3 (cid:47) = 4 0 1 2 3 4 5 6 7 8 9 10012345 particle number, M = N op t i m a l pa r t i t i on nu m be r , (cid:47) * M/ (cid:47) * = N/ (cid:47) * (cid:53) B Figure 5: An optimal partition size. A For M = N > π > π = 1). B As M = N is increased the optimalpartition number also increases such that the optimal number of proteins per partition M/π ∗ = N/π ∗ ≈ M/π ∗ = N/π ∗ ∼ − M = N is also not crucial for this result. In fact, we ﬁndthat the value of M/π ∗ has only a weak dependence on N (Appendix C: Fig. 10).The optimal partition size arises from a trade-oﬀ between the reliability and eﬃciency of signaling. Increasingthe number of partitions decreases the typical number of proteins per partition, which leads to the beneﬁcialeﬀects of a more graded response and reduced noise, increasing signaling reliability. On the other hand, dueto molecule exchange, reducing the number of molecules per partition also increases the probability that anypartition contains proteins of only one species that are therefore excluded from the signaling process, whichleads to a reduced maximal response, reducing signaling eﬃciency.The optimal size revealed by our study of ∼ −

10 molecules per species per partition shows good quantitativeagreement with the observed aggregation of CD59 receptors (3 − − Lastly, we conﬁrm that the eﬀects observed in these minimal model systems, where the contents of eachcompartment are well-mixed and exchange can occur between any pair of compartments, persist in a morerealistic model in which the diﬀusion of molecules in space is included explicitly. We simulate the diﬀusionand reaction of X and Y molecules on a two-dimensional lattice, as described in Appendix A.3. The systemis partitioned into a number of subdomains by the introduction of diﬀusion barriers, which are crossed witha reduced probability p hop relative to regular diﬀusion steps on the lattice. Results of such simulations areshown in Fig. 6.Figure 6A and B reveal that as the strength of the diﬀusion barriers is increased, the mean response becomesmore graded, and the variance of Y ∗ activity is reduced, analogous to the two eﬀects observed in theminimal model system (Fig. 2). When p hop = 0, one molecule of each species is permanently conﬁned to a8ompartment, producing the graded response predicted for the perfectly partitioned system (Fig. 6A) andthe associated minimal, binomial noise (Fig. 6B). Low but ﬁnite p hop allows exchange of molecules betweenneighboring compartments but preserves a separation of timescales between intra- and inter-compartmentmixing. This results in a graded mean response whose maximal level is reduced (Fig. 6A) and reducednoise (Fig. 6B), precisely the features observed in the minimal model of partitioning with exchange (Fig. 2).When p hop = 1, there are no barriers, and the system approaches the well-mixed limit (CME). Interestingly,however, the response remains more graded and the noise remains lower than the predictions of the CMEdue to the ﬁnite speed of diﬀusion (Fig. 6A and B), with agreement only reached when the ratio of diﬀusionto reaction propensities is much greater than one. This observation reveals that ﬁnite diﬀusion imposesan eﬀective partitioning even when no actual partitions exist: molecules remain correlated with reactionpartners within a typical distance set by diﬀusion, but uncorrelated with partners beyond this distance.As such, in the context of coupled reversible modiﬁcation, we ﬁnd that slower diﬀusion can linearize theresponse and reduce the noise, thereby improving information transmission. It is important to emphasize,however, that the extent of this eﬀect is much smaller than for actual partitioning: Fig. 6B shows that ﬁnitediﬀusion reduces the maximal noise by (1 . − / .

25 = 20%, while strong partitioning ( p hop = 0 . . − . / . ≈ β and γ , spanning the rangeof ∼ −

10 molecules per partition (Appendix C: Fig. 9C and D). Fig. 6C also provides a measure of thescale of information transmitted by this motif. In absolute terms, the optimal information (1 .

35 bits) isconsistent with values recently measured for signaling via the TNF-NF- κ B pathway ( ∼ . − . Drosophila embryo (1 . ± .

15 bits) [23]. In relative terms, we see that partitioningincreases information over the unpartitioned system by (1 . − . / . ≈

30% (Fig. 6C) and decreasesthe maximal noise by (1 − . / We have seen that the partitioning of a biochemical signaling system into a number of non-interacting subsys-tems improves the reliability of signaling via two eﬀects. First, the non-linear response of the network meansthat a reduction in the number of input molecules translates into a more graded input-output response.Second, partitioning signiﬁcantly reduces the noise in the response by eliminating correlations between thestates of the diﬀerent output molecules, an eﬀect which, remarkably, overcomes the increase in noise associ-ated with fewer input molecules in each subsystem. On the other hand, we have seen that the introductionof diﬀusion or exchange of molecules between partitions enhances the variance and reduces the range of theresponse, thereby reducing signaling performance. This result is due to the presence of conﬁgurations in Interestingly, this result is in marked contrast to the case of boundary establishment in embryonic development, wherefaster diﬀusion reduces noise within each nucleus by washing out bursts of gene expression in the input signal [21]. Whilein the present system faster diﬀusion will similarly reduce any super-Poissonian component of the noise within each partitionindividually, this averaging does not reduce the noise in the total output across all partitions. In fact, the latter noise isenhanced with faster diﬀusion by virtue of increased correlations between partitions. mean input, q = /M = (cid:95) /( (cid:95) +1) m ean ou t pu t, < n > / N A mean output, /N no i s e i n ou t pu t, (cid:109) n2 / N B (cid:239) partition size, M/ (cid:47) = N/ (cid:47) i n f o r m a t i on ( b i t s ) C well mixed (CME)p hop = 1p hop = 0.1p hop = 0.01p hop = 0.001p hop = 0perfectly partitioned (CME) Figure 6: The eﬀects of partitioning persist in simulations with explicit diﬀusion. As the probability ofcrossing a diﬀusion barrier p hop is decreased, A the mean response becomes more graded, and B the outputnoise decreases. C The information transmission has a maximum as a function of the partition size. Here M = N = 49, β = 20, γ = 1, the system is λ = 70 lattice spacings squared, and the ratio of diﬀusion toreaction propensities is p D /p r = 1. In A, π = 49; in B, p hop = 0 . √ π from 25 to 1.which the two species are isolated from one another, compromising or even arresting signal transmission incertain partitions. The interplay between these two eﬀects leads to a partition size that optimizes informa-tion transmission, corresponding to a few molecules per partition on average, in quantitative agreement withexperiments. These eﬀects are generic, and hence the emergence of an optimal partition size is robust to thespeciﬁc parameters of the model. Notably, the underlying mechanism revealed here, namely the removal ofcorrelations, diﬀers fundamentally from that based on cooperativity in protein activation, which has beenargued to underlie optimal cluster size in sensory systems [24, 25].Reversible modiﬁcation reactions are ubiquitous in cell signaling, and interactions with the cytoskeleton andlipids provide general mechanisms for the formation of subdomains. We therefore expect the results revealedby our study to be applicable to a wide class of signaling systems at the membrane. We have focused in thispaper on coupled single-site modiﬁcation reactions because this motif governs pathways speciﬁcally knownto be aﬀected by the formation of membrane sub-domains. However, the eﬀects we uncover also pertainto multi-site modiﬁcation reactions, which are very common in cell signaling [26–29]. Moreover, we havefocused on systems where the reactant species are conﬁned by a boundary which limits diﬀusion. However,similar eﬀects could be observed in systems where proteins are localized to raft domains, or even scaﬀolds orlarge macromolecular complexes. In the latter case, each complex would eﬀectively provide an independentreaction “compartment,” and the exchange between compartments would be the result of rare dissociationevents, after which proteins could diﬀuse rapidly through the cytoplasm to a diﬀerent complex. Even if thesignal within each complex was not mediated via diﬀusive encounters, but rather via cooperative or allostericinteractions, the fundamental mechanism that we reveal here – that partitioning into subsystems removes10orrelations between subsystems – remains at play. The presence of scaﬀolds and macromolecular complexesat early stages of signaling pathways is extremely common [30], suggesting that the eﬀects discussed hereare of wide biological relevance. The CME (1) is solved using the method of spectral expansion [15,16]. Details of this method, the computa-tion of mutual information, and the spatial simulations are described in Appendix A. Source code, writtenin MATLAB, C++, and Mathematica, used to generate all results and ﬁgures in the main text and the

SIAppendix is freely available at http://partitioning.sourceforge.net . This work is part of the research program of the “Stichting voor Fundamenteel Onderzoek der Materie(FOM)”, which is ﬁnancially supported by the “Nederlandse organisatie voor Wetenschappelijk Onderzoek(NWO)”. We thank Philippe Nghe for a critical reading of the manuscript.

A Detailed methods

A.1 Spectral solution of the master equation

The chemical master equation (CME) is solved using the method of spectral expansion [15, 16], describedin detail in Appendix B. Brieﬂy, the structure of of the CME, in which the dynamics can be separatedinto two operators that act only on m or n but not both, allows for its solution to be written in the form p mn ( t ) = (cid:80) Mj =0 (cid:80) Nk =0 G jk ( t ; ¯ β ) φ jm ( α ) φ kn ( ¯ β ), where φ jm ( α ) is the j th eigenvector of the operator L m ( α ) andsimilarly for φ kn ( ¯ β ), and ¯ β is an expansion parameter on which p mn does not ultimately depend. Theexpansion coeﬃcients G jk ( t ; ¯ β ) can be calculated straightforwardly, as shown in Appendix B. Importantly,this spectral expansion dramatically decreases the computational complexity of calculating p mn : rather thansolving the ( M + 1)( N + 1) × ( M + 1)( N + 1) system of the original CME, it is only necessary to solve N linear systems of size ( M + 1) × ( M + 1) for the vectors of coeﬃcients (cid:126)G k . We emphasize that sincethe system has a ﬁnite state-space, no approximations are made in using the spectral expansion, and thesolution remains exact. Furthermore, the moments of the steady-state distribution p mn can be convenientlyexpressed in terms of the expansion coeﬃcients G jk ; in particular, (cid:104) n (cid:105) = G and (cid:104) n (cid:105) = 2 G + G . A.2 Mutual information

The mutual information between network input and response is given by the standard expression [18] I [ α, m ] = (cid:104) log { p ( α, n ) / [ p ( α ) p ( n )] }(cid:105) , where the average is taken over the joint distribution p ( α, n ) = p ( n | α ) p ( α ),and p ( n | α ) = (cid:80) Mm =0 p ( m, n | α ) is given by the steady state of the CME. The calculation of the mutual in-formation requires speciﬁcation of the distribution of input signals p ( α ). We choose N α values of α suchthat q = α/ ( α + 1) = (cid:104) m (cid:105) /M is uniformly-spaced over the range 0 ≤ q ≤

1; then p ( n ) = (cid:80) N α i =1 p ( n | α i ) p ( α i )11nd p ( α i ) = N − α . However, our conclusions are unaﬀected if we instead take a input distribution that isunimodal or bimodal (Fig. 13). We take N α >

30, for which I [ α, m ] converges to within 1% of its large- N α limit (Fig. 14). A.3 Spatial simulations

The diﬀusion and reactions of M X molecules and N Y molecules are simulated on a two-dimensional squarelattice of side length λ using a ﬁxed-time-step integration scheme. During each step of duration δt , eachparticle is moved to a random neighboring lattice site with probability p D = ( D/(cid:96) ) δt , where D is thediﬀusion constant, and (cid:96) is the lattice spacing. Molecules have steric interactions on the lattice, such thatonly one molecule can be present at each lattice site at any time. Attempted moves to an occupied site arerejected, with the particle remaining at its original position. If a molecule in the X ∗ state is adjacent to amolecule in the Y state, the latter is converted to the Y ∗ state with probability p r = γ ( βλ /M ) δt . To make π partitions, linear diﬀusion barriers are placed at iλ/ √ π in each direction, where i ∈ { , , . . . , √ π − } . Adiﬀusion step which crosses such a barrier is accepted with probability reduced by a factor p hop . The timestep δt is chosen suﬃciently small that no probability exceeds one. B Solution of the master equation by spectral expansion

This section describes the solution via the method of spectral expansion, or the ‘spectral method’, of the CMEintroduced in the main text. The spectral method has been used fruitfully in the context of gene regulationto solve CMEs describing cascades [15], bursts [16], and oscillations [31], and a pedagogical treatment isavailable in [32]. Here we apply the spectral method to coupled reversible switching.From Eqns. 1-2 of the main text, the stochastic dynamics of the system under study are given by the CME˙ p mn = − [ L m ( α, M ) + γ L n ( β m , N )] p mn , (7)where both operators L m and L n have the form L m ( α, M ) = α (cid:2) − E − m (cid:3) ( M − m ) + (cid:2) − E +1 m (cid:3) m, (8)with E im f ( m ) = f ( m + i ) deﬁning the step operator. The CME describes the evolution of the probability ofhaving m X proteins in the active state and n Y proteins in the active state, with β m the coupling functionby which X drives the activation of Y . B.1 The moments do not close

We ﬁrst demonstrate that direct computation of the moments from the CME is not possible because themoments do not close. The reason is that a nonlinearity is present in the ﬁrst term of Eqn. 8 in the form ofthe factor β m n . As a result, the ﬁrst moment depends on a higher moment, which in turn depends on aneven higher moment, and so on.To see explicitly that the moments do not close, we consider computing the dynamics of the ﬁrst moment ofthe driven species, the mean (cid:104) n (cid:105) , by summing the CME over m and n against n . We obtain1 γ ∂ t (cid:104) n (cid:105) = −(cid:104) n (cid:105) + N (cid:104) β m (cid:105) − (cid:104) β m n (cid:105) , (9)12here averages are taken over p mn . We see that indeed the ﬁnal term carries the nonlinearity. Even for thesimplest coupling function, i.e. linear coupling β m = cm , one ﬁnds a hierarchy of moment dependencies thatdoes not close: ∂ t (cid:104) n (cid:105) = − γ (cid:104) n (cid:105) + γcN (cid:104) m (cid:105) − γc (cid:104) mn (cid:105) , (10) ∂ t (cid:104) mn (cid:105) = αM (cid:104) n (cid:105) − ( α + γ + 1) (cid:104) mn (cid:105) + γcN (cid:104) m (cid:105) − γc (cid:104) m n (cid:105) , (11) ∂ t (cid:104) m n (cid:105) = . . . (12)That is, the dynamics of (cid:104) n (cid:105) depend on (cid:104) mn (cid:105) , whose dynamics depend on (cid:104) m n (cid:105) , and so on.The fact that the moments cannot be computed—indeed, not even the mean output (cid:104) n (cid:105) —makes it particu-larly important to actually solve the CME in order to learn about the statistical properties of this system. B.2 The spectrum of the switch operator

The CME is a linear equation. Even when the rates are nonlinear functions of the molecule numbers, theCME is still linear in its degree of freedom, the joint probability. The most straightforward way to solve alinear equation is to write its solution as an expansion in the eigenfunctions of the linear operator. Although itis diﬃcult to derive the eigenfunctions of the coupled operator L m ( α, M ) + γ L n ( β m , N ), it is straightforwardto derive the eigenfunctions of the uncoupled operator L m ( α, M ), which we call the switch operator. Indeed,we will see that expanding the joint probability in eigenfunctions of the uncoupled operator greatly simpliﬁesthe form of the CME, yielding an exact solution in terms of matrix algebra.The switch operator governs the CME for the ﬁrst species X ; explicitly,˙ p m = −L p m = α [ M − ( m − p m − + ( m + 1) p m +1 − [ α ( M − m ) + m ] p m , (13)where for notational simplicity we have taken L m ( α, M ) → L . Its eigenvalue relation is written L φ jm = λ j φ jm , (14)for eigenvalues λ j and eigenvectors φ jm . B.2.1 Eigenvalues

The matrix form of the operator L can be read directly from Eqn. 13: L =  M α − − M α ( M − α + 1 − − ( M − α ( M − α + 2 −

3. . . . . . . . . − α α + ( M − − ( M − − α α + ( M − − M − α M  . (15)The tridiagonal structure follows from the fact that molecule numbers only increase or decrease by oneat a time. Practically speaking, the eigenvalues can be obtained using the fact that the determinant of atridiagonal matrix can be computed recursively. Performing the computation for M = 0 , , , . . . reveals thepattern λ j = ( α + 1) j, j ∈ { , , , . . . , M } . (16)13owever, Eqn. 16 can be derived more rigorously by making use of a generating function. We presentthis derivation next, since the generating function formalism will also prove quite useful in deriving theeigenvectors and solving the CME.The generating function is an expansion in any complete basis for which the probability distribution providesthe expansion coeﬃcients [33]. Choosing as our basis the set of polynomials in some continuous variable x ,the generating function is deﬁned G ( x ) = M (cid:88) m =0 p m x m . (17)The probability distribution is recovered via the inverse transform p m = 1 m ! ∂ mx [ G ( x )] x =0 . (18)A key utility of the generating function is turning the CME, which is a set of ordinary diﬀerential equations(ODEs), into a single partial diﬀerential equation. Indeed, summing Eqn. 13 against x m yields˙ G = − ( x − αx + 1) ∂ x − αM ] G, (19)where the appearances of x and ∂ x arise from the shifts m − m + 1, respectively. Eqn. 19 directly givesthe form of the operator in x space: L = ( x − αx + 1) ∂ x − αM ]. The eigenfunctions are then obtainedfrom the relation L φ j ( x ) = λ j φ j ( x ) by separating variables and integrating: φ j ( x ) = ( α + 1) − M ( x − λ j / ( α +1) ( αx + 1) M − λ j / ( α +1) . (20)Here, the constant factor ( α + 1) − M is determined by application of the normalization condition G (1) = 1to the steady state solution, which is obtained by setting λ j = 0: G ( x ) = (cid:18) αx + 1 α + 1 (cid:19) M . (21)We will solve Eqn. 19 in two ways: by the method of characteristics and by expansion in the eigenfunctions;together these solutions will reveal the eigenvalues.First, the method of characteristics [34] posits that the dependence of G on x and t occurs via some parametricvariable s , i.e. G ( x, t ) = G [ x ( s ) , t ( s )]. The chain rule then gives dG/ds = ( ∂G/∂x )( dx/ds ) + ( ∂G/∂t )( dt/ds ),which when compared term by term with Eqn. 19 yields three ordinary diﬀerential equations: dtds = 1 , dxds = ( x − αx + 1) , dGds = αM ( x − . (22)The ﬁrst identiﬁes s = t , with which the second is solved by z = x − αx + 1 e − ( α +1) t , (23)where z is a constant of integration. The crux of the method is that Eqn. 23 deﬁnes a characteristic curveon which G must depend, i.e. G ( x, t ) = f [ z ( x, t )] g ( x, t ), where f and g are unknown functions, and z hasbeen promoted to a characteristic function of x and t . The function g is identiﬁed by realizing that steadystate is reached as t → ∞ , for which f ( z ) → f (0) no longer depends on x or t . Therefore, g must be thesteady state function given in Eqn. 21: G ( x, t ) = (cid:18) αx + 1 α + 1 (cid:19) M f ( z ) . (24)Although we still do not know f , we may Taylor expand it around the point z = 0, yielding G ( x, t ) = (cid:18) αx + 1 α + 1 (cid:19) M ∞ (cid:88) j =0 c j z j = (cid:18) αx + 1 α + 1 (cid:19) M ∞ (cid:88) j =0 c j (cid:18) x − αx + 1 (cid:19) j e − ( α +1) jt , (25)14here c j ≡ ∂ jz [ f ( z )] z =0 /j !.Second, because Eqn. 19 is linear, we may also write down its solution as an expansion in the eigenfunctionsof its linear operator: G ( x, t ) = (cid:88) j C j ( t ) φ j ( x ) . (26)Under the assumption that the eigenfunctions are orthogonal (which will be shown in the next section),inserting Eqn. 26 into Eqn. 19 yields an independent ODE for each C j , ˙ C j = − λ j C j , which is solved by C j ( t ) = c j e − λ j t for initial conditions c j . Inserting this functional form and that for φ j ( t ) (Eqn. 20) into Eqn.26 yields G ( x, t ) = (cid:18) αx + 1 α + 1 (cid:19) M (cid:88) j c j (cid:18) x − αx + 1 (cid:19) λ j / ( α +1) e − λ j t . (27)Comparison of Eqns. 25 and 27 reveals both the expression for the eigenvalues, λ j = ( α + 1) j , and a limiton their domain, the nonnegative integers j ∈ { , , , . . . , ∞} . Of course, the domain can be a subset ofthe nonnegative integers; then some c j in Eqn. 27 would be zero. Indeed, since L is a ﬁnite matrix of size M + 1 by M + 1 (Eqn. 15), it is spanned by M + 1 linearly independent eigenvectors, meaning we expect only M + 1 eigenvalues. In fact, the only set of M + 1 nonnegative integers that satisﬁes the requirement thatthe trace of L , (cid:80) Mm =0 [( M − m ) α + m ] = ( α + 1) M ( M + 1) /

2, equals the sum of the eigenvalues, (cid:80) j ( α + 1) j ,is j ∈ { , , , . . . , M } . Thus, we arrive at the result λ j = ( α + 1) j, j ∈ { , , , . . . , M } , (28)as proposed by inspection in Eqn. 16. B.2.2 State space notation

The linear algebraic manipulations we have done thus far can be cast in the more abstract notation of statespaces, commonly used in quantum mechanics [35]. We will ﬁnd this notation useful in later sections, forexample in transforming between the molecule number basis and the eigenbasis. Speciﬁcally, we introducea state | p (cid:105) that can be projected into (cid:104) m | space to give the probability distribution, or into (cid:104) x | space to givethe generating function: (cid:104) m | p (cid:105) = p m , (cid:104) x | p (cid:105) = G ( x ) . (29)In the same way, the j th eigenstate | j (cid:105) is projected into (cid:104) m | space to give the j th eigenvector, or into (cid:104) x | space to give the j th eigenfunction: (cid:104) m | j (cid:105) = φ jm , (cid:104) x | j (cid:105) = φ j ( x ) . (30)This notation oﬀers new insight into our deﬁnition of the generating function. For example, Eqn. 17 cannow be written (cid:104) x | p (cid:105) = M (cid:88) m =0 (cid:104) x | m (cid:105)(cid:104) m | p (cid:105) , (31)where we have recognized (cid:104) x | m (cid:105) = x m (32)as the projection of the state | m (cid:105) into (cid:104) x | space. Eqn. 31 has a clear interpretation: we have inserted acomplete set of | m (cid:105) states. Similarly, Eqn. 18 can now be written (cid:104) m | p (cid:105) = (cid:73) ¯ dx G ( x ) x m +1 = (cid:73) ¯ dx (cid:104) m | x (cid:105)(cid:104) x | p (cid:105) . (33)15n the ﬁrst step, we have rewritten Eqn. 18 using Cauchy’s theorem, where ¯ dx ≡ dx/ πi , and the contoursurrounds the pole at x = 0. In the second step, we have recognized (cid:104) m | x (cid:105) = 1 x m +1 (34)as the conjugate to (cid:104) x | m (cid:105) . Eqn. 33 has the clear interpretation of inserting a complete set of | x (cid:105) states, underan inner product deﬁned by the complex integration. The choice of inner product and of conjugate stateare made such that orthonormality is preserved, a fact which we may conﬁrm by again employing Cauchy’stheorem: (cid:104) m | m (cid:48) (cid:105) = (cid:73) ¯ dx (cid:104) m | x (cid:105)(cid:104) x | m (cid:48) (cid:105) = (cid:73) ¯ dx x m (cid:48) x m +1 = 1 m ! ∂ mx (cid:104) x m (cid:48) (cid:105) x =0 θ ( m >

0) = δ mm (cid:48) . (35)Finally, the dynamics in Eqn. 19 can be written in state space as | ˙ p (cid:105) = − ˆ L| p (cid:105) = − (ˆ a + − α ˆ a + + 1)ˆ a − − αM ] | p (cid:105) , (36)where we have deﬁned the operators ˆ a + and ˆ a − whose projections in x space are (cid:104) x | ˆ a + = x and (cid:104) x | ˆ a − = ∂ x .These are analogous to the raising and lowering operators in the well known treatment of the quantumharmonic oscillator. This operator formalism for the generating function was ﬁrst developed in the 1970s;for a review see [36]. B.2.3 Eigenvectors

The state space notation facilitates a derivation of the functional form of the eigenvectors: φ jm = (cid:104) m | j (cid:105) = (cid:73) ¯ dx (cid:104) m | x (cid:105)(cid:104) x | j (cid:105) = (cid:73) ¯ dx x m +1 ( x − j ( αx + 1) M − j ( α + 1) M . (37)Here we have inserted the eigenfunctions φ j ( x ) = (cid:104) x | j (cid:105) = ( x − j ( αx + 1) M − j ( α + 1) M (38)from Eqn. 20, with eigenvalues given by Eqn. 28. We use Cauchy’s theorem to perform the integration andrecognize that derivatives of a product follow a binomial expansion: φ jm = 1( α + 1) M m ! ∂ mx (cid:2) ( αx + 1) M − j ( x − j (cid:3) x =0 (39)= 1( α + 1) M m ! m (cid:88) (cid:96) =0 (cid:18) m(cid:96) (cid:19) ∂ (cid:96)x (cid:2) ( αx + 1) M − j (cid:3) x =0 ∂ m − (cid:96)x (cid:2) ( x − j (cid:3) x =0 (40)= 1( α + 1) M m ! m (cid:88) (cid:96) =0 m !( m − (cid:96) )! (cid:96) ! (cid:20) ( M − j )! α (cid:96) ( M − j − (cid:96) )! θ ( (cid:96) ≤ M − j ) (cid:21) (cid:20) j !( − j − m + (cid:96) ( j − m + (cid:96) )! θ ( m − (cid:96) ≤ j ) (cid:21) (41)= ( − j − m ( α + 1) M (cid:88) (cid:96) ∈ Ω (cid:18) M − j(cid:96) (cid:19)(cid:18) jm − (cid:96) (cid:19) ( − α ) (cid:96) . (42)Here the domain Ω results from the derivatives and is deﬁned by max(0 , m − j ) ≤ (cid:96) ≤ min( m, M − j ). Eqn.42 gives the expression for the eigenvectors. For j = 0 the expression reduces to the binomial distributionin terms of the occupancy q = α/ ( α + 1), as it must, since this is the steady state of the uncoupled process: φ m = (cid:18) Mm (cid:19) α m ( α + 1) M = (cid:18) Mm (cid:19) q m (1 − q ) M − m . (43)This function has one maximum, and in general the j th eigenvector has j +1 extrema, making the eigenvectorsqualitatively similar to Fourier modes or eigenfunctions of the quantum harmonic oscillator.16he switch operator ˆ L is not Hermitian. A consequence is that its conjugate eigenvectors ψ jm = (cid:104) j | m (cid:105) (row vectors) are not complex conjugates of its eigenvectors φ jm = (cid:104) m | j (cid:105) (column vectors). Rather, theyare distinct functions that must be constructed to obey an orthonormality relation in order to constitute acomplete basis. The orthonormality relation can be used to derive their form in x space, ψ j ( x ) = (cid:104) j | x (cid:105) : δ jj (cid:48) = (cid:104) j | j (cid:48) (cid:105) = (cid:73) ¯ dx (cid:104) j | x (cid:105)(cid:104) x | j (cid:48) (cid:105) = (cid:73) ¯ dx ψ j ( x ) ( x − j (cid:48) ( αx + 1) M − j (cid:48) ( α + 1) M = (cid:73) ¯ dz z j (cid:48) f j ( z ) . (44)Here we have deﬁned z ≡ ( x − / ( αx + 1) and f j ( z ) ≡ ψ j ( x )( αx + 1) M +2 / ( α + 1) M +1 in order to draw anequivalence between Eqn. 44 and Eqn. 35, which then implies f j ( z ) = 1 /z j +10 = ( αx + 1) j +1 / ( x − j +1 , or ψ j ( x ) = ( α + 1) M +1 ( αx + 1) M − j +1 ( x − j +1 . (45)Eqn. 45 gives the form of the conjugate eigenfunctions in x space, which can be used to derive the expressionfor the conjugate eigenvectors as in Eqns. 37-42: ψ jm = (cid:104) j | m (cid:105) = (cid:73) ¯ dx (cid:104) j | x (cid:105)(cid:104) x | m (cid:105) = (cid:73) ¯ dx ( α + 1) M +1 ( αx + 1) M − j +1 ( x − j +1 x m (46)= (cid:88) (cid:96) ∈ Ω (cid:18) M − j + (cid:96)(cid:96) (cid:19)(cid:18) mj − (cid:96) (cid:19) ( − α ) (cid:96) ( α + 1) j − (cid:96) . (47)Here Ω is deﬁned by max(0 , j − m ) ≤ (cid:96) ≤ j . Eqn. 47 gives the expression for the conjugate eigenvectors.They are j th order polynomials in m . B.3 Expanding the coupled problem in uncoupled eigenfunctions

We now solve the CME by expanding the solution in the eigenfunctions of the uncoupled operator. Thisprocedure is most easily done in state space, in which the CME reads | ˙ p (cid:105) = − [ ˆ L x ( α ) + γ ˆ L xy ] | p (cid:105) (48)where ˆ L x ( α ) = (ˆ a + x − α ˆ a + x + 1)ˆ a − x − αM ] , (49)ˆ L xy = (ˆ a + y − β x ˆ a + y + 1)ˆ a − y − ˆ β x N ] , (50)as in Eqn. 36, and we have introduced the operator ˆ β x whose action on the state | m (cid:105) yields the couplingfunction, ˆ β x | m (cid:105) = β m | m (cid:105) . The ﬁrst step is to write the full operator as two uncoupled operators plusa correction term. Introducing the constant ¯ β to parameterize the second uncoupled operator, the CMEbecomes | ˙ p (cid:105) = − [ ˆ L x ( α ) + γ ˆ L y ( ¯ β ) + γ ˆΓ x ˆ∆ y ] | p (cid:105) (51)where we have explicitly denoted the fact that the correction term ˆ L xy − ˆ L y ( ¯ β ) factorizes into two operatorsthat act on each of the x and y sectors alone:ˆΓ x ≡ ˆ β x − ¯ β, (52)ˆ∆ y ≡ (ˆ a + y − a + y ˆ a − y − N ) . (53)The second step is to expand the solution in the eigenfunctions of the two uncoupled operators. Introducing k as the mode index for the eigenstates of ˆ L y ( ¯ β ), we write | p (cid:105) = M (cid:88) j =0 N (cid:88) k =0 G jk | j, k (cid:105) . (54)17nserting this form into the CME, projecting with the conjugate state (cid:104) j (cid:48) , k (cid:48) | , and summing over j and k yields the dynamics for the expansion coeﬃcients G jk :˙ G jk = − [( α + 1) j + γ ( ¯ β + 1) k ] G jk − γ M (cid:88) j (cid:48) =0 Γ jj (cid:48) N (cid:88) k (cid:48) =0 ∆ kk (cid:48) G j (cid:48) k (cid:48) . (55)Here the ﬁrst term is diagonal and reﬂects the actions of the uncoupled operators on their eigenstates. Thesecond term contains the corrections Γ jj (cid:48) = (cid:104) j | ˆΓ x | j (cid:48) (cid:105) and ∆ kk (cid:48) = (cid:104) k | ˆ∆ y | k (cid:48) (cid:105) . The ﬁrst correction is directlyevaluated by inserting a complete set of m states:Γ jj (cid:48) = M (cid:88) m =0 (cid:104) j | ( ˆ β x − ¯ β ) | m (cid:105)(cid:104) m | j (cid:48) (cid:105) = M (cid:88) m =0 (cid:104) j | m (cid:105) ( β m − ¯ β ) (cid:104) m | j (cid:48) (cid:105) (56)= M (cid:88) m =0 ψ jm ( β m − ¯ β ) φ j (cid:48) m . (57)We see that Γ jj (cid:48) is the simply the diﬀerence between the coupling function and the constant parameter,rotated into eigenspace. Notably, for linear coupling, Γ jj (cid:48) is tridiagonal (see Sec. B.5). The second correctionis most easily evaluated by inserting a complete set of y states; the result, derived in Sec. B.5, is∆ kk (cid:48) = kδ kk (cid:48) − ( N − k + 1) δ k − ,k (cid:48) . (58)We see that ∆ kk (cid:48) is subdiagonal in k , which simpliﬁes the dynamics of G jk to˙ G jk = − M (cid:88) j =0 Λ kjj (cid:48) G j (cid:48) k + γ ( N − k + 1) M (cid:88) j =0 Γ jj (cid:48) G j (cid:48) ,k − , (59)where we deﬁne the matrix acting on the diagonal part asΛ kjj (cid:48) ≡ [( α + 1) j + γ ( ¯ β + 1) k ] δ jj (cid:48) + γk Γ jj (cid:48) . (60)The subdiagonality allows one to write the steady state of Eqn. 59 as an iterative scheme, by which the k thcolumn of G jk is computed from the ( k − (cid:126)G k = γ ( N − k + 1) Λ − k Γ (cid:126)G k − . (61)The scheme is initialized with (cid:126)G = δ j (62)(see Sec. B.5), and the joint distribution is recovered via p mn = M (cid:88) j =0 N (cid:88) k =0 G jk φ jm φ kn , (63)which is the projection of Eqn. 54 into (cid:104) m, n | space.Eqn. 63 constitutes an exact steady state solution to the CME, with G jk computed iteratively via Eqns.61 and 62, auxiliary matrices deﬁned in Eqns. 57 and 60, and the eigenvectors given by Eqns. 42 and 47.Importantly, the computational complexity of the solution has been dramatically reduced: rather than solvingthe original CME (Eqn. 7), which requires inverting its operator of size ( M + 1)( N + 1) × ( M + 1)( N + 1),Eqn. 61 makes clear that it is only necessary to invert N smaller matrices of size ( M + 1) × ( M + 1), i.e. thematrices Λ k for k ∈ { , , . . . , N } . 18 .4 Exact expressions for moments Now that we have an exact solution to the CME in terms of a spectral expansion, moments take an exactform in terms of the expansion coeﬃcients. We thus circumvent the problem of moment closure, insteadarriving at compact expressions that require only the inversion and multiplication of ﬁnite matrices via Eqn.61.Moments are most easily computed from the generating function, G ( x, y ). For example, the ν th moment ofthe output is (cid:104) n ν (cid:105) = [( y∂ y ) ν G ( x = 1 , y )] y =1 . (64)In terms of the expansion, the generating function is G ( x, y ) = (cid:104) x, y | p (cid:105) = (cid:80) Mj =0 (cid:80) Nk =0 G jk (cid:104) x | j (cid:105)(cid:104) y | k (cid:105) , andusing the fact that (cid:104) x = 1 | j (cid:105) = δ j (Eqn. 38), we have (cid:104) n ν (cid:105) = N (cid:88) k =0 G k [( y∂ y ) ν (cid:104) y | k (cid:105) ] y =1 . (65)Inserting the expression for (cid:104) y | k (cid:105) (Eqn. 38) and deﬁning w ≡ log y , we obtain (cid:104) n ν (cid:105) = N (cid:88) k =0 G k ∂ νw (cid:20) ( e w − k ( ¯ βe w + 1) N − k ( ¯ β + 1) N (cid:21) w =0 . (66)At this point we recall that ¯ β is a constant we introduce to parameterize the expansion. The expressionfor the moments therefore cannot depend on ¯ β : if we change ¯ β , the expression in brackets changes, but theexpansion coeﬃcients G k also change, such that Eqn. 66 evaluates to the same ¯ β -independent form. Weare therefore free to set ¯ β to any value, and the choice ¯ β = 0 makes the derivative easiest to evaluate. Thuswe have (cid:104) n ν (cid:105) = N (cid:88) k =0 G k ∂ νw (cid:2) ( e w − k (cid:3) w =0 , (67)where it is now understood that G k is computed with ¯ β = 0. Evaluating the derivative yields (cid:104) n ν (cid:105) = N (cid:88) k =0 G k  min( k,ν ) (cid:88) (cid:96) =1 (cid:26) ν(cid:96) (cid:27) k !( k − (cid:96) )! e (cid:96)w ( e w − k − (cid:96)  w =0 (68)= N (cid:88) k =0 G k min( k,ν ) (cid:88) (cid:96) =1 (cid:26) ν(cid:96) (cid:27) k !( k − (cid:96) )! δ k(cid:96) (69)= min( ν,N ) (cid:88) k =1 G k (cid:26) νk (cid:27) k ! (70)in terms of the Stirling numbers of the second kind, (cid:26) νk (cid:27) = 1 k ! k (cid:88) (cid:96) =0 ( − k − (cid:96) (cid:18) k(cid:96) (cid:19) (cid:96) ν . (71)For example, the ﬁrst moment, second moment, and variance are (cid:104) n (cid:105) = G , (72) (cid:104) n (cid:105) = G + 2 G , (73) σ n = (cid:104) n (cid:105) − (cid:104) n (cid:105) = G + 2 G − G . (74)19hese are exact expressions for the moments in terms of the expansion coeﬃcients G k , which are obtainedby matrix inversion and multiplication via Eqn. 61, e.g. in Mathematica.An informative special case is immediately revealed when N = 1, for which G does not exist, i.e. (cid:104) n (cid:105) = G and σ n = G − G , or σ n = (cid:104) n (cid:105) (1 − (cid:104) n (cid:105) ) ( N = 1) . (75)Here there is only one output molecule. The relationship between its mean activation and the associatednoise must therefore obey the known result for a single binary switch, Eqn. 75. B.5 Auxiliary calculations

Here we show that Γ jj (cid:48) is tridiagonal for linear β m = cm :Γ jj (cid:48) = (cid:104) j | ˆΓ x | j (cid:48) (cid:105) (76)= (cid:104) j | ( c ˆ a + x ˆ a − x − ¯ β ) | j (cid:48) (cid:105) (77)= − ¯ βδ jj (cid:48) + c (cid:73) ¯ dx (cid:104) j | x (cid:105)(cid:104) x | ˆ a + x ˆ a − x | j (cid:48) (cid:105) (78)= − ¯ βδ jj (cid:48) + c (cid:73) ¯ dx (cid:104) j | x (cid:105) x∂ x (cid:104) x | j (cid:48) (cid:105) (79)= − ¯ βδ jj (cid:48) + c (cid:73) ¯ dx (cid:104) j | x (cid:105) x∂ x ( x − j (cid:48) ( αx + 1) M − j (cid:48) ( α + 1) M (80)= − ¯ βδ jj (cid:48) + c (cid:73) ¯ dx (cid:104) j | x (cid:105) x ( α + 1) M (cid:104) j (cid:48) ( x − j (cid:48) − ( αx + 1) M − j (cid:48) +( x − j (cid:48) ( M − j (cid:48) )( αx + 1) M − j (cid:48) − α (cid:105) (81)= − ¯ βδ jj (cid:48) + c (cid:73) ¯ dx (cid:104) j | x (cid:105) x ( x − j (cid:48) − ( αx + 1) M − j (cid:48) − ( α + 1) M [ j (cid:48) ( αx + 1) + ( x − M − j (cid:48) ) α ] (82)= − ¯ βδ jj (cid:48) + cα + 1 (cid:73) ¯ dx (cid:104) j | x (cid:105) x ( x − j (cid:48) − ( αx + 1) M − j (cid:48) − ( α + 1) M (cid:8) j (cid:48) ( αx + 1) +[ α ( M − j (cid:48) ) + j (cid:48) ]( αx + 1)( x − α ( M − j (cid:48) )( x − (cid:9) (83)= − ¯ βδ jj (cid:48) + cα + 1 (cid:73) ¯ dx (cid:104) j | x (cid:105) (cid:40) ( x − j (cid:48) − ( αx + 1) M − j (cid:48) +1 ( α + 1) M [ j (cid:48) ]+ ( x − j (cid:48) ( αx + 1) M − j (cid:48) ( α + 1) M [ α ( M − j (cid:48) ) + j (cid:48) ]+ ( x − j (cid:48) − ( αx + 1) M − j (cid:48) +1 ( α + 1) M [ α ( M − j (cid:48) )] (cid:41) (84)= − ¯ βδ jj (cid:48) + cα + 1 (cid:73) ¯ dx (cid:104) j | x (cid:105) {(cid:104) x | j (cid:48) − (cid:105) j (cid:48) + (cid:104) x | j (cid:48) (cid:105) [ α ( M − j (cid:48) ) + j (cid:48) ] + (cid:104) x | j (cid:48) + 1 (cid:105) α ( M − j (cid:48) ) } (85)= − ¯ βδ jj (cid:48) + cα + 1 {(cid:104) j | j (cid:48) − (cid:105) j (cid:48) + (cid:104) j | j (cid:48) (cid:105) [ α ( M − j (cid:48) ) + j (cid:48) ] + (cid:104) j | j (cid:48) + 1 (cid:105) α ( M − j (cid:48) ) } (86)= cj (cid:48) α + 1 δ j,j (cid:48) − + (cid:26) c [ α ( M − j (cid:48) ) + j (cid:48) ] α + 1 − ¯ β (cid:27) δ jj (cid:48) + cα ( M − j (cid:48) ) α + 1 δ j,j (cid:48) +1 . (87)Eqn. 77 recognizes that ˆ β x = c ˆ a + x ˆ a − x is the operator representation of β m (since ˆ a + ˆ a − is the number operator,i.e. ˆ a + x ˆ a − x | m (cid:105) = m | m (cid:105) ), and Eqn. 83 uses the algebraic fact that x [ j (cid:48) ( αx + 1) + ( x − M − j (cid:48) ) α ]( α + 1) = j (cid:48) ( αx + 1) + [ α ( M − j (cid:48) ) + j (cid:48) ]( αx + 1)( x −

1) + α ( M − j (cid:48) )( x − , which is straightforward to verify.20ere we derive Eqn. 58:∆ kk (cid:48) = (cid:104) k | ˆ∆ y | k (cid:48) (cid:105) (88)= (cid:104) k | (ˆ a + y − a + y ˆ a − y − N ) | k (cid:48) (cid:105) (89)= (cid:73) ¯ dy (cid:104) k | y (cid:105)(cid:104) y | (ˆ a + y − a + y ˆ a − y − N ) | k (cid:48) (cid:105) (90)= (cid:73) ¯ dy (cid:104) k | y (cid:105) ( y − y∂ y − N ) (cid:104) y | k (cid:48) (cid:105) (91)= (cid:73) ¯ dy (cid:104) k | y (cid:105) ( y − y∂ y − N ) ( y − k (cid:48) ( ¯ βy + 1) N − k (cid:48) ( ¯ β + 1) N (92)= (cid:73) ¯ dy (cid:104) k | y (cid:105) ( y − β + 1) N (cid:104) yk (cid:48) ( y − k (cid:48) − ( ¯ βy + 1) N − k (cid:48) + y ( y − k (cid:48) ( N − k (cid:48) )( ¯ βy + 1) N − k (cid:48) − ¯ β − N ( y − k (cid:48) ( ¯ βy + 1) N − k (cid:48) (cid:105) (93)= (cid:73) ¯ dy (cid:104) k | y (cid:105) ( y − k (cid:48) ( ¯ βy + 1) N − k (cid:48) − ( ¯ β + 1) N (cid:2) yk (cid:48) ( ¯ βy + 1) + y ( y − N − k (cid:48) ) ¯ β − N ( y − βy + 1) (cid:3) (94)= (cid:73) ¯ dy (cid:104) k | y (cid:105) ( y − k (cid:48) ( ¯ βy + 1) N − k (cid:48) − ( ¯ β + 1) N (cid:2) k (cid:48) ( ¯ βy + 1) − ( y − N − k (cid:48) ) (cid:3) (95)= (cid:73) ¯ dy (cid:104) k | y (cid:105) (cid:34) k (cid:48) ( y − k (cid:48) ( ¯ βy + 1) N − k (cid:48) ( ¯ β + 1) N − ( N − k (cid:48) ) ( y − k (cid:48) +1 ( ¯ βy + 1) N − ( k (cid:48) +1) ( ¯ β + 1) N (cid:35) (96)= (cid:73) ¯ dy (cid:104) k | y (cid:105) [ k (cid:48) (cid:104) y | k (cid:48) (cid:105) − ( N − k (cid:48) ) (cid:104) y | k (cid:48) + 1 (cid:105) ] (97)= k (cid:48) (cid:104) k | k (cid:48) (cid:105) − ( N − k (cid:48) ) (cid:104) k | k (cid:48) + 1 (cid:105) (98)= k (cid:48) δ kk (cid:48) − ( N − k (cid:48) ) δ k,k (cid:48) +1 (99)= kδ kk (cid:48) − ( N − k + 1) δ k − ,k (cid:48) . (100)Here we derive Eqn. 62: (cid:126)G = G j (101)= (cid:104) j, k = 0 | p (cid:105) (102)= M (cid:88) m =0 N (cid:88) n =0 (cid:104) j | m (cid:105)(cid:104) k = 0 | n (cid:105)(cid:104) m, n | p (cid:105) (103)= M (cid:88) m =0 N (cid:88) n =0 (cid:104) j | m (cid:105) p mn (104)= M (cid:88) m =0 (cid:104) j | m (cid:105) p m (105)= M (cid:88) m =0 (cid:104) j | m (cid:105)(cid:104) m | j = 0 (cid:105) (106)= (cid:104) j | j = 0 (cid:105) (107)= δ j . (108)Eqn. 104 uses Eqn. 47 to obtain (cid:104) k = 0 | n (cid:105) = 1, and Eqn. 106 recognizes that p m is the steady state of theuncoupled operator, p m = φ m = (cid:104) m | j = 0 (cid:105) . 21 Supplementary ﬁgures well mixed perfectly partitioned partitioned with exchange output mean, /M ou t pu t no i s e , (cid:109) m / M well (cid:239) mixedpartitionedpartitioned with exchange output mean, /M ou t pu t no i s e , (cid:109) m / M well (cid:239) mixedperfectly partitionedpartitioned with exchange0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 100.050.10.150.20.250.30.350.4 output mean, /M ou t pu t no i s e , (cid:109) m / M well (cid:239) mixedperfectly partitionedpartitioned with exchange I = 0 .

437 bits I = 0 .

207 bits I = 0 .

318 bits mean input, q = /M = (cid:95) /( (cid:95) +1) m ean ou t pu t, < n > / N A mean output, /N no i s e i n ou t pu t, (cid:109) n2 / N B particle number, M = N i n f o r m a t i on ( b i t s ) C (cid:47) = 1 (cid:47) = 2 (cid:47) = 3 (cid:47) = 4 0 1 2 3 4 5 6 7 8 901234 particle number, M = N op t i m a l pa r t i t i on nu m be r , (cid:47) * M/ (cid:47) * = N/ (cid:47) * (cid:53) D Figure 7: The eﬀects of partitioning persist for Michaelis-Menten coupling. The coupling is described by β ( i ) m = βm i / [ m i + ( V /π ) K ] = βm i / ( m i + φM/π ), where m i is the number of X ∗ molecules in partition i ∈ { , . . . , π } , and φ ≡ KV /M is a constant. Here β = 20, φ = 1 /

2, and γ = 1. A, B

As in Fig. 2 of the main text, with M = N = 2, perfect partitioning linearizes the input-output relationand reduces the noise, transmitting more information than the well-mixed system; further, allowing exchangeamong partitions compresses the response and increases the noise compared to the perfectly partitionedsystem, transmitting less information than the well-mixed system. C, D

As in Fig. 5 of the main text, an information-optimal partition size, here

M/π ∗ = N/π ∗ ≈

2, emergesdue to the trade-oﬀ between optimizing signaling reliability and avoiding unfavorable conﬁgurations.22 input, q = (cid:95) /( (cid:95) +1) = /M ou t pu t, < n > / N A M = 1M = 3M = 10mean field mean output, /N no i s e i n ou t pu t, (cid:109) n2 / N B M = 1M = 3M = 10binomial

Figure 8: Reducing the number of input molecules linearizes the input-output response and increases thenoise in the output. Here π = 1, β = 20, and γ = 1. A The output (the mean activity of N = 2 Y molecules) vs. the input (the mean activity of M X molecules)for several values of M . As M is reduced the response becomes more linear, deviating more strongly fromthe mean-ﬁeld response (cid:104) n (cid:105) /N = βq/ ( βq + 1). Symbols show 20 uniformly spaced values of q to highlightthe eﬀect of saturation on the state space. B The noise vs. the mean for the output, shown for the same values of M . As M is reduced the noiseincreases for all values of the mean. 23 β = 3: M/π ∗ ≈ β = 8: M/π ∗ ≈ β = 200: M/π ∗ ≈ particle number, M = N op t i m a l pa r t i t i on nu m be r , (cid:47) * A γ = 0 . M/π ∗ ≈ γ = 0 . M/π ∗ ≈ γ = 10 . M/π ∗ ≈ particle number, M = N op t i m a l pa r t i t i on nu m be r , (cid:47) * B partition size, M/ (cid:47) = N/ (cid:47) i n f o r m a t i on ( b i t s ) C (cid:96) = 20 (cid:96) = 8 (cid:96) = 4 10 partition size, M/ (cid:47) = N/ (cid:47) i n f o r m a t i on ( b i t s ) D (cid:97) = 1 (cid:97) = 0.2 (cid:97) = 0.05 Figure 9: The emergence of an optimal partition size is robust to parameter variations.

A, B

Results from the minimal system, described by the chemical master equation, as in Fig. 5B of the maintext. The information-optimal partition number π ∗ is plotted as a function of molecule number M = N for various values of β (A) and γ (B). Linear ﬁts provide estimates of the optimal partition size M/π ∗ , asindicated in the legends. In A, γ = 1; in B, β = 20. C, D

Results from the lattice simulation, in which space is accounted for explicitly, as in Fig. 6C of themain text. The information is plotted as a function of the partition size, directly revealing an optimum, forvarious values of β (C) and γ (D). Parameters are as in Fig. 6C: M = N = 49, p hop = 0 . λ = 70, and p D /p r = 1. In C, γ = 1; in D, β = 20.As discussed in the main text, the optimum arises due to a tradeoﬀ between two key eﬀects of partitioning:on the one hand, partitioning removes correlations in the states of Y molecules, reducing noise; on the otherhand, partitioning isolates molecules, reducing the maximal response. The ﬁrst eﬀect favors few moleculesper partition, while the second eﬀect favors many molecules per partition.As seen here in both the minimal system (A, B) and the simulated system (C, D), lowering β or γ increasesthe optimal number of molecules per partition. This result has an intuitive explanation in terms of the abovetradeoﬀ: lowering either β or γ slows the rate of switching from the Y to the Y ∗ state, with respect to thetimescale of X switching. As a result, Y molecules are less sensitive to individual ﬂuctuations in the stateof X molecules. The states of the Y molecules therefore exhibit weaker correlations, which in turn weakensthe beneﬁt that partitioning imparts in terms of the removal of these correlations. The opposing eﬀect ofmolecular isolation thus begins to dominate, pushing the optimum toward a larger number of molecules perpartition. 24 number of X molecules, M nu m be r o f Y m o l e c u l e s , N (cid:47) * Figure 10: The optimal partition size has only a weak dependence on the number of output molecules. Theinformation-optimal partition number π ∗ is plotted as a function of the number of X molecules M and thenumber of Y molecules N . The dependence of π ∗ on N is weak, such that the partition size M/π ∗ ≈ − N values. 25

100 200 300 400 500012345 time i npu t, (cid:95) A probability density (cid:239) partition size, M/ (cid:47) = N/ (cid:47) i n f o r m a t i on ( b i t s ) B c = 0c = 0.1c = 1c = 10 Figure 11: The eﬀects of partitioning are robust to extrinsic noise. A Simulations are performed with extrinsic noise introduced to the input parameter α . To keep α ≥

0, thequantity z ≡ log α is described by the simple mean-reverting Ornstein-Uhlenbeck process dz = r ( µ − z ) dt + η √ rdtξ , where ξ is a Gaussian random variable with mean 0 and variance 1; this results in a log-normaldistribution for α . The quantity 1 /r is the autocorrelation time, and the choices µ = log[¯ α / (¯ α + c )] / η = (cid:112) c/ ¯ α ) ensure that the mean of α is ¯ α and that the variance of α scales with the mean via σ α = c ¯ α . B As the magnitude of the extrinsic noise (set by c ) increases, the information I [¯ α, n ] decreases for allpartition sizes, while the presence of an information-optimal partition size persists.Here M = N = 25, β = 20, γ = 1, p hop = 0 . λ = 50 lattice spacings squared, the ratio ofdiﬀusion to reaction propensities is p D /p r = 1, and r = 1 in units of the X ∗ → X reaction rate (which setsthe timescale of switching). In A, ¯ α = c = 1 and time is scaled by 1 /r . In B, when c = 0, the informationtransmission is lower than that in Fig. 6C of the main text because M = N is lower.26 input, q = (cid:95) /( (cid:95) +1) m ean ou t pu t, < n > / N A mean output, /N no i s e i n ou t pu t, (cid:109) n2 / N p hop = 1p hop = 0.1p hop = 0.01p hop = 0.001 (cid:239) partition size, M/ (cid:47) = N/ (cid:47) i n f o r m a t i on ( b i t s ) input, q = (cid:95) /( (cid:95) +1) m ean ou t pu t, < n > / N B mean output, /N no i s e i n ou t pu t, (cid:109) n2 / N p hop = 1p hop = 0.1p hop = 0.01p hop = 0.001 (cid:239) partition size, M/ (cid:47) = N/ (cid:47) i n f o r m a t i on ( b i t s ) + + Figure 12: The eﬀects of partitioning are robust to receptor dimerization. Two dimerization schemes aresimulated, which are paradigmatic for receptor tyrosine kinases, including EGF receptor [37]: A Dimerizationis receptor-mediated (left, inset), meaning two active receptors X ∗ form a complex C , or B dimerization isligand-mediated (left, inset), meaning an active receptor X ∗ and an inactive receptor X form a complex C .The latter scheme admits a “dead-end” state at ligand saturation, when all receptors are ligand-bound andno complexes can form, leading to a non-monotonic response curve (B, left), as observed e.g. for the Retreceptor [38]. Both schemes are described by the reactions X α − (cid:42)(cid:41) − X ∗ , C + Y γβ −−→ C + Y ∗ , and Y ∗ γ −→ Y ,with dimer formation described by X ∗ + X ∗ χ(cid:15) − (cid:42)(cid:41) − χ C in A, or X ∗ + X χ(cid:15) − (cid:42)(cid:41) − χ C in B. Here M = N = 25, β = 20, χ = γ = 1, the system is λ = 50 lattice spacings squared, and the ratio of diﬀusion to reaction propensitiesis p D /p r = 1. In A, (cid:15) = 20; in B, (cid:15) = 5. Left

As in Fig. 6A of the main text, as the probability of crossing a diﬀusion barrier p hop is decreased,the maximal value of the mean response decreases. In A, the response also becomes more linear, butto less of a degree than in Fig. 6A of the main text. Note that due to both ﬁnite diﬀusion and ﬁnitemolecule number, even the unpartitioned response ( p hop = 1) deviates from the mean-ﬁeld response (blacksolid line), which is given by (cid:104) n (cid:105) /N = βf / (1 + βf ), where f is the fraction of X molecules in the dimerstate; in A, f = (cid:15)g with g ≡ (cid:104) m (cid:105) /M = ( (cid:112) (cid:15)q − / (4 (cid:15)q ), while in B, f = (cid:15)g (1 − g ) / (2 (cid:15)g + 1) with g ≡ (cid:104) m (cid:105) /M = [ (cid:112) (cid:15)q (1 − q ) − / [4 (cid:15) (1 − q )]. Here π = 25. Legends in middle panels apply to left panelsas well. Middle

As in Fig. 6B of the main text, as the probability of crossing a diﬀusion barrier p hop is decreased,the output noise decreases. Black dashed line shows the binomial noise limit σ n /N = ( (cid:104) n (cid:105) /N )(1 − (cid:104) n (cid:105) /N ).In B, lines connecting data points are provided to reveal that, as there are two values of q that give the samemean (cid:104) n (cid:105) /N (left), the noise is higher for the smaller value of q . Here π = 25. Right

As in Fig. 6C of the main text, the tradeoﬀ between reliable signaling (reduced noise) and eﬃcientsignaling (maintaining a high maximal response) leads to an information-optimal partition size. Here p hop =0 . M = N islower and additionally, in B, because of the non-monotonic mean response.27 input, q = /M = (cid:95) /( (cid:95) +1) p r ob . den s i t y , p ( q ) A particle number, M = N i n f o r m a t i on ( b i t s ) (cid:47) = 1 (cid:47) = 2 (cid:47) = 3 (cid:47) = 4 0 1 2 3 4 5 6 7 8 901234 particle number, M = N op t i m a l pa r t i t i on nu m be r , (cid:47) * M/ (cid:47) * = N/ (cid:47) * (cid:53) input, q = /M = (cid:95) /( (cid:95) +1) p r ob . den s i t y , p ( q ) B particle number, M = N i n f o r m a t i on ( b i t s ) (cid:47) = 1 (cid:47) = 2 (cid:47) = 3 (cid:47) = 4 0 1 2 3 4 5 6 7 8 9 1001234 particle number, M = N op t i m a l pa r t i t i on nu m be r , (cid:47) * M/ (cid:47) * = N/ (cid:47) * (cid:53) Figure 13: The eﬀects of partitioning are robust to the shape of the input distribution. As in Fig. 5 of themain text, which takes a uniform input distribution p ( q ), an information-optimal partition size M/π ∗ = N/π ∗ persists with an input distribution that is A unimodal or B bimodal.28

10 20 30 40 50 60 70 80 90 10010 (cid:239) (cid:239) (cid:239) (cid:239) N (cid:95) r e l a t i v e e rr o r well mixed, M = N = 2perfectly partitioned, M = N = (cid:47) = 2partitioned with exchange, M = N = (cid:47) = 2well mixed, M = N = 10perfectly partitioned, M = N = (cid:47) = 10 Figure 14: Computation of the mutual information converges as the input is more ﬁnely discretized. Therelative error | I − I | /I , where I is the information at N α = 100, is plotted against the number N α ofvalues of α [uniformly spaced in q = α/ ( α + 1)] used in the computation. Five conditions are tested, asindicated in the legend. It is seen that the relative error falls below ∼

1% in all conditions for N α (cid:38) eferences [1] Hern´an E Grecco, Malte Schmick, and Philippe I H Bastiaens. Signaling from the living plasma mem-brane. Cell , 144(6):897–909, Mar 2011.[2] Christian Eggeling, Christian Ringemann, Rebecca Medda, G¨unter Schwarzmann, Konrad Sandhoﬀ,Svetlana Polyakova, Vladimir N Belov, Birka Hein, Claas von Middendorﬀ, Andreas Sch¨onle, and Ste-fan W Hell. Direct observation of the nanoscale dynamics of membrane lipids in a living cell.

Nature ,457(7233):1159–62, Feb 2009.[3] Daniel Lingwood and Kai Simons. Lipid rafts as a membrane-organizing principle.

Science ,327(5961):46–50, Jan 2010.[4] A Kusumi and Y Sako. Cell surface organization by the membrane skeleton.

Current Opinion in CellBiology , 8(4):566–74, Aug 1996.[5] Akihiro Kusumi, Yuki M Shirai, Ikuko Koyama-Honda, Kenichi G N Suzuki, and Takahiro K Fuji-wara. Hierarchical organization of the plasma membrane: investigations by single-molecule tracking vs.ﬂuorescence correlation spectroscopy.

FEBS Letters , 584(9):1814–23, May 2010.[6] Boris N Kholodenko, John F Hancock, and Walter Kolch. Signalling ballet in space and time.

Nat RevMol Cell Biol , 11(6):414–26, Jun 2010.[7] Kenichi G N Suzuki, Takahiro K Fujiwara, Fumiyuki Sanematsu, Ryota Iino, Michael Edidin, andAkihiro Kusumi. GPI-anchored receptor clusters transiently recruit Lyn and G alpha for temporarycluster immobilization and Lyn activation: single-molecule tracking study 1.

The Journal of Cell Biology ,177(4):717–30, May 2007.[8] Kenichi G N Suzuki, Takahiro K Fujiwara, Michael Edidin, and Akihiro Kusumi. Dynamic recruitmentof phospholipase C gamma at transiently immobilized GPI-anchored receptor clusters induces IP3-Ca2+signaling: single-molecule tracking study 2.

The Journal of Cell Biology , 177(4):731–42, May 2007.[9] Ian A Prior, Cornelia Muncke, Robert G Parton, and John F Hancock. Direct visualization of Rasproteins in spatially distinct cell surface microdomains.

The Journal of Cell Biology , 160(2):165–70,Jan 2003.[10] S. Plowman, C. Muncke, R. G. Parton, and J. F. Hancock. H-ras, K-ras, and inner plasma membraneraft proteins operate in nanoclusters with diﬀerential dependence on the actin cytoskeleton.

Proc. NatlAcad. Sci. USA , 102:15500–15005, 2005.[11] Ziya Kalay, Takahiro K Fujiwara, and Akihiro Kusumi. Conﬁning domains lead to reaction bursts:reaction kinetics in the plasma membrane.

PLoS ONE , 7(3):e32948, Jan 2012.[12] Andrew Mugler, Aimee Gotway Bailey, Koichi Takahashi, and Pieter Rein ten Wolde. Membraneclustering and the role of rebinding in biochemical signaling.

Biophysical Journal , 102(5):1069–78, Mar2012.[13] T. Tian, A. Harding, K. Inder, S. Plowman, R. G. Parton, and J. F. Hancock. Plasma membranenanoswitches generate high-ﬁdelity Ras signal transduction.

Nat. Cell Biol. , 9:905–914, 2007.[14] Thomas Gurry, Ozan Kahramano˘gullari, and Robert G Endres. Biophysical mechanism for Ras-nanocluster formation and signaling in plasma membrane.

PLoS ONE , 4(7):e6148, Jan 2009.[15] A M Walczak, A Mugler, and C H Wiggins. A stochastic spectral analysis of transcriptional regulatorycascades.

Proc Natl Acad Sci USA , 106:6529–6534, 2009.[16] A Mugler, A M Walczak, and C H Wiggins. Spectral solutions to stochastic models of gene expressionwith bursts and regulation.

Phys Rev E , 80:041921, 2009.3017] J. L. W. V. Jensen. Sur les fonctions convexes et les in´egalit´es entre les valeurs moyennes.

ActaMathematica , 30(1):175–193, 1906.[18] C. E. Shannon. A Mathematical Theory of Communication.

Bell Syst Tech J , 27:379–423, 623–656,1948.[19] Kotono Murase, Takahiro Fujiwara, Yasuhiro Umemura, Kenichi Suzuki, Ryota Iino, Hidetoshi Ya-mashita, Mihoko Saito, Hideji Murakoshi, Ken Ritchie, and Akihiro Kusumi. Ultraﬁne membranecompartments for molecular diﬀusion as revealed by single molecule techniques.

Biophysical Journal ,86(6):4075–93, Jun 2004.[20] Boryana N Manz, Bryan L Jackson, Rebecca S Petit, Michael L Dustin, and Jay Groves. T-cell triggeringthresholds are modulated by the number of antigen within individual T-cell receptor clusters.

Proc NatlAcad Sci USA , 108(22):9089–94, May 2011.[21] Thorsten Erdmann, Martin Howard, and Pieter Rein ten Wolde. Role of spatial averaging in theprecision of gene expression patterns.

Phys. Rev. Lett. , 103:258101, Dec 2009.[22] Raymond Cheong, Alex Rhee, Chiaochun Joanne Wang, Ilya Nemenman, and Andre Levchenko. Infor-mation transduction capacity of noisy biochemical signaling networks.

Science , 334(6054):354–8, Oct2011.[23] Gasper Tkacik, Curtis G Callan, and William Bialek. Information ﬂow and optimization in transcrip-tional regulation.

Proceedings of the National Academy of Sciences , 105(34):12265–70, Aug 2008.[24] Gerardo Aquino, Diana Clausznitzer, Sylvain Tollis, and Robert G Endres. Optimal receptor-clustersize determined by intrinsic and extrinsic noise.

Physical Review E , 83:21914, Feb 2011.[25] Monica Skoge, Yigal Meir, and Ned S Wingreen. Dynamics of cooperativity in chemical sensing amongcell-surface receptors.

Physical Review Letters , 107:178101, Oct 2011.[26] Koichi Takahashi, Sorin Tanase-Nicola, and Pieter Rein ten Wolde. Spatio-temporal correlations candrastically change the response of a MAPK pathway.

Proc Natl Acad Sci USA , 107(6):2473–8, Feb2010.[27] Jeroen S van Zon, David K Lubensky, Pim R H Altena, and Pieter Rein ten Wolde. An allosteric modelof circadian KaiC phosphorylation.

Proc Natl Acad Sci USA , 104(18):7420–5, May 2007.[28] Paul Miller, Anatol M Zhabotinsky, John E Lisman, and Xiao-Jing Wang. The stability of a stochasticCaMKII switch: dependence on the number of enzyme molecules and protein turnover.

Plos Biol ,3(4):e107, Apr 2005.[29] P Nash, X Tang, S Orlicky, Q Chen, F B Gertler, M D Mendenhall, F Sicheri, T Pawson, and M Tyers.Multisite phosphorylation of a CDK inhibitor sets a threshold for the onset of DNA replication.

Nature ,414(6863):514–21, Nov 2001.[30] A. Zeke, M. Luk´acs, W. A. Lim, and A. Rem´enyi. Scaﬀolds: interaction platforms for cellular signallingcircuits.

Trends Cell Biol. , 19:364–374, 2009.[31] A Mugler, A M Walczak, and C H Wiggins. Information-optimal transcriptional response to oscillatorydriving.

Phys Rev Lett , 105:058101, 2010.[32] A M Walczak, A Mugler, and C H Wiggins. Analytic methods for modeling stochastic regulatorynetworks. In X Liu and M Betterton, editors,

Methods in Molecular Biology, Vol. 880: ComputationalModeling of Signaling Networks . Humana Press, 2012.[33] N G van Kampen.

Stochastic processes in physics and chemistry . Elsevier Science, 2nd edition, 1992.[34] S J Farlow.

Partial diﬀerential equations for scientists and engineers . Dover Publications, 1993.[35] J S Townsend.

A modern approach to quantum mechanics . University Science Books, 2000.3136] D C Mattis and M L Glasser. The uses of quantum ﬁeld theory in diﬀusion-limited reactions.

Rev ModPhys , 70:979–1001, 1998.[37] M A Lemmon and J Schlessinger. Cell signaling by receptor tyrosine kinases.

Cell , 141:1117–34, 2010.[38] S Schlee, P Carmillo, and A Whitty. Quantitative analysis of the activation mechanism of the multi-component growth-factor receptor Ret.