[PDF] Self-assembly of biomolecular condensates with shared components

Abstract

Biomolecular condensates self-assemble when proteins and nucleic acids spontaneously demix to form droplets within the crowded intracellular milieu. This simple mechanism underlies the formation of a wide variety of membraneless compartments in living cells. To understand how multiple condensates with distinct compositions can self-assemble in such a heterogeneous system, we study a minimal model in which we can "program" the pairwise interactions among hundreds of species. We show that the number of distinct condensates that can be reliably assembled grows superlinearly with the number of species in the mixture when the condensates are allowed to share components. Furthermore, we show that we can predict the maximum number of distinct condensates in a mixture without knowing the details of the pairwise interactions. Simulations of condensate growth confirm these predictions and suggest that the physical rules governing the achievable complexity of condensate-mediated spatial organization are broadly applicable to biomolecular mixtures.

Full PDF

SSelf-assembly of biomolecular condensates with shared components

William M. Jacobs

Department of Chemistry, Princeton University, Princeton, NJ 08544, USA (Dated: February 26, 2021)Biomolecular condensates self-assemble when proteins and nucleic acids spontaneously demix toform droplets within the crowded intracellular milieu. This simple mechanism underlies the forma-tion of a wide variety of membraneless compartments in living cells. To understand how multiplecondensates with distinct compositions can self-assemble in such a heterogeneous system, we studya minimal model in which we can “program” the pairwise interactions among hundreds of species.We show that the number of distinct condensates that can be reliably assembled grows superlinearlywith the number of species in the mixture when the condensates are allowed to share components.Furthermore, we show that we can predict the maximum number of distinct condensates in a mix-ture without knowing the details of the pairwise interactions. Simulations of condensate growthconﬁrm these predictions and suggest that the physical rules governing the achievable complexityof condensate-mediated spatial organization are broadly applicable to biomolecular mixtures.

Many proteins and nucleic acids are spatially or-ganized into biomolecular condensates within livingcells [1]. Condensates behave like liquid droplets, typ-ically assembling via nucleation and growth while ex-hibiting a ﬁnite surface tension at the condensate–cytoplasm/nucleoplasm interface [2]. Importantly, theassembly of these structures is spontaneous, driven bynet attractive interactions between disordered polypep-tides [3, 4] and by speciﬁc binding interactions amongprotein, DNA/RNA, and various client molecules [5, 6].Over the past decade, a large number of distinct typesof condensates have been characterized. Many of theseexamples assemble only under speciﬁc conditions and ap-pear to be involved in a variety of key biological func-tions [1, 7], including stress response [5], intracellularsignaling [8], and transcriptional control [9]. As such,it is important to understand how the self-assembly of alarge number of condensates can be coordinated in timeand space within a single intracellular compartment.Previous eﬀorts to describe spontaneous condensateassembly using simple theoretical models have providedinsight into the typical behavior that can occur in bi-ological mixtures with hundreds or thousands of com-ponents [10, 11]. However, these approaches have as-sumed a null model in which there is no inherent struc-ture to the interactions among the various molecules inthe mixture. This is a poor assumption when consider-ing biopolymer mixtures that have presumably evolvedto assemble functional condensates. For example, anintriguing situation is found in the nucleus, where pu-tative transcriptional condensates self-assemble (or canbe made to assemble [12]) at speciﬁc target loci on thegenome and share components while remaining mutuallyimmiscible [9, 13]. If these structures form by a near-equilibrium mechanism akin to liquid–liquid phase sepa-ration, how many such structures can be simultaneouslyencoded by the intermolecular interactions and reliablyassembled under spatiotemporal control?To address this question, we consider the inverse prob-lem of designing component-wise interactions to encodea set of target condensates. In this way, we can deter- metastable, homogeneous mixture(a)(b) α = 1 α = 2 α = 3 . . . α = n ( (cid:126)φ − (cid:126)φ (0) ) · ( (cid:126)x ( α ) − (cid:126)x (0) )phase 0 phase α ∆Ω α Ω ( (cid:126) φ ) (cid:126)φ ⊥ FIG. 1. Encoding compositionally distinct droplets in a mul-ticomponent mixture. (a) We aim to encode n target phases,each enriched in a speciﬁc subset of the components, that canbe selectively assembled from a homogeneous mixture via nu-cleation and growth. (b) To achieve this, each target phasemust be a local minimum of the grand potential, Ω. We sketchΩ along a linear path between the homogeneous and target-phase concentrations, (cid:126)φ (0) and (cid:126)φ ( α ) ; orthogonal directions ofconcentration space, (cid:126)φ ⊥ , are indicated by dashed curves. mine the rules governing the assembly of condensateswith shared components. We specify this problem byproposing a set of n target phases, each enriched in a spe-ciﬁc subset of the N species in the mixture (Figure 1a).Deﬁning the component volume fractions (cid:126)φ , the total vol-ume fraction of all components φ T ≡ (cid:80) Ni =1 φ i , and thecompositions x i ≡ φ i /φ T , we say that component i is en-riched in phase α if x ( α ) i > x (0) i and depleted otherwise,where the label 0 indicates the homogeneous phase. Wealso specify the composition of the homogeneous phase,which has a total volume fraction of φ (0)T . These choicesestablish the target phase behavior of the mixture. a r X i v : . [ c ond - m a t . s t a t - m ec h ] F e b In order for the target phases to self-assemble, eachphase must be supersaturated with respect to the ho-mogeneous phase and stable with respect to compositionﬂuctuations. These requirements imply that each phaseis a local minimum of the dimensionless grand poten-tial, Ω( (cid:126)φ ) ≡ F ( (cid:126)φ ) − (cid:80) i [ ∂F ( (cid:126)φ (0) ) /∂φ i ] φ i , where F is thedimensionless Helmholtz free energy (Figure 1b). Thethermodynamic driving force for assembling each targetphase, ∆Ω ( α ) , can be tuned by adjusting the volumefractions in the metastable mixture, (cid:126)φ (0) . Making thesimplifying assumption that all interactions are pairwise(i.e., lacking any orientational dependence or multibodyterms), we invoke the mean-ﬁeld regular solution model, F = N (cid:88) i =1 φ i log φ i + (1 − φ T ) log(1 − φ T ) − N (cid:88) i =1 N (cid:88) j =1 χ ij φ i φ j . Our aim is thus to identify a symmetric component-wise interaction matrix, χ , that yields that target phasebehavior. Later, we shall show that our results translateto models with ﬁnite-range interactions.Within the mean-ﬁeld model, our inverse design prob-lem requires that the following three conditions be met: x ( α ) i ≤ ζN M α if species i is depleted in phase α (1) (cid:88) i,j (cid:104) φ ( α ) i φ ( α ) j − δ ik (cid:16) φ ( α ) j − φ (0) j (cid:17) − φ (0) i φ (0) j (cid:105) χ ij = − log φ ( α ) k φ (0) k (2) ∂ F∂φ i ∂φ j (cid:12)(cid:12)(cid:12)(cid:12) α = δ ij φ ( α ) i + w ( χ ) − χ ij (cid:31) , (3)where M α is the number of components enriched in phase α and w ( χ ) ≡ exp[ (cid:80) i,j χ ij ( φ ( α ) i φ ( α ) j − φ (0) i φ (0) j )] / (1 − φ (0)T ).The ﬁrst condition speciﬁes that while we do not pre-scribe the precise compositions of the depleted compo-nents, they should nonetheless comprise a negligible vol-ume fraction of a target phase. The constant ζ musttherefore be less than unity, as discussed below. The sec-ond condition ensures that all phases are extrema of thegrand potential. We implicitly enforce ∆Ω ( α ) ≈ φ ( α )T (cid:39)

1; thisapproximation allows us to replace φ ( α ) i with x ( α ) i every-where and to neglect terms involving x ( α ) i from depletedcomponents on the left-hand side of (2). Finally, thethird condition requires that ∂ F/∂φ i ∂φ j | α be positivedeﬁnite to ensure the thermodynamic stability of the α phase. Because w ( χ ) is positive and appears in all entriesof this matrix, it negligibly aﬀects the minimum eigen-value. Thus, for practical purposes, all three conditionsare linear in χ . Lastly, we assume for simplicity thatthe components do not interact with themselves, suchthat χ ii = 0, since such interactions equally stabilize allphases in which the i th component is enriched.Even if a suitable interaction matrix exists for a setof target phases, it is typically not unique. We thereforepick out the “best” solution, which maximizes the stabil-ity of the least stable phase, by minimizing the objective function L SDP ≡ − min α =0 ,...,n (cid:26) min i =1 ,...,N (cid:104) λ ( α ) i ( χ ) (cid:105)(cid:27) + κ Var i L SDP , subject to constraints (1) and (2), is aconvex optimization problem that can be solved numeri-cally using eﬃcient semi-deﬁnite programming (SDP) al-gorithms [14, 15]. Importantly, if L SDP <

0, then a solu-tion exists to the mean-ﬁeld inverse problem.We can also devise a simpler optimization problem byapproximating the minimum eigenvalue of ∂ F using ran-dom matrix theory [16]. If the partition coeﬃcient for de-pleted components, ζ/M α , is suﬃciently small, then wecan approximate the SDP by the quadratic programming(QP) objective function L QP ≡ (cid:0) φ (0)T (cid:1) N Var i

010 10 20 30 40 50 60 70 80 p r o b a b ili t y o f s u cce ss number of targets n

010 1 2 p r o b a b ili t y n/n / N = 406080100 M = 520 40 60 80 100 M = 5 n / number of components N e x p o n e n t γ target size M numericspredictionSDPQP n / = N / M n / (a) N = 60, M = 5 n = N / M (b) (c) n / ∼ N γ FIG. 2. Mean-ﬁeld predictions of optimal target encoding.(a) The probability that a set of n randomly generated targetscan be stably encoded in the mean-ﬁeld model. Inset : Theprobability of successful encoding roughly follows a mastercurve when scaled by n / , the number of targets that canbe encoded with 50% probability. (b) For N (cid:29) M , numericalcalculations of n / obtained via quadratic programming (QP)agree well with those obtained by semi-deﬁnite programming(SDP), showing that n / follows a power law as a functionof the number of components. (c) The scaling exponent γ increases with the number of species enriched in each target, M , as predicted by graph-theoretical arguments (see text). components per target in order to predict whether a suit-able interaction matrix is likely to exist. This predictionprovides a useful, albeit probabilistic, upper bound onthe complexity of the phase behavior that can be achievedusing pairwise interactions and shared components.We test our predictions by generating random sets oftarget phases, each enriched in M components, and nu-merically optimizing the interaction matrix χ using boththe SDP and QP formulations. We then numericallycheck whether the optimal χ matrix in fact results inthe desired grand potential landscape as depicted in Fig-ure 1b. To this end, we minimize Ω( (cid:126)φ ) within the basin ofattraction of each target phase and verify that the localminimum is consistent with the target composition. Rep-resentative results of these calculations are shown in Fig-ure 2a, where by varying the number of encoded targets,we empirically determine the probability that a χ matrixexists for a random set of target phases. We ﬁrst observethat the number of targets that can be encoded is typ-ically much larger than N/M , the maximum number ofphases possible if every species is enriched in at most onetarget. Second, we ﬁnd evidence of the predicted thresh-olding transition: The probability of successful encodingdrops rapidly from near unity to near zero in a mannerthat depends on both N and M . We characterize this transition by n / , the number of targets for which theprobability of successful encoding is 50%. Rescaling n by n / reveals that the data collapse well across mixtureswith diﬀerent numbers of components (Figure 2a, inset ).We next test the graph-theoretical scaling predictionfor the thresholding transition. Plotting n / versus N ,we conﬁrm that the threshold indeed follows a power lawonce in the scaling regime, N (cid:29) M (Figure 2b). Wealso verify that the thresholding transition occurs at thesame number of targets regardless of whether the χ ma-trix is determined via SDP and QP optimization. Thisis the case due to our choice of ζ = 10 − ; if we choose ζ closer to unity, implying that depleted components arenot as strongly repelled from the target phases, then n / increases for the SDP solutions, although the powerlaw scaling remaining unchanged (data not shown). Fi-nally, we compare the empirical scaling exponents, de-termined from power law ﬁts to the thresholding data,to the graph-theoretical predictions (Figure 2c). In closeagreement with the predictions, we ﬁnd that the numberof targets that can be reliably encoded scales superlin-early with N whenever three or more components areenriched in each target. This is a second key conclu-sion: Because the number of phases that can be reliablyprogrammed may grow faster than the number of com-ponents, multicomponent mixtures can support dramat-ically more complex phase behavior than simple modelsystems that contain only a handful of components.Having analyzed the programmability of target phasesin the mean-ﬁeld model, we now ask whether these resultstranslate to a model of a ﬂuid with ﬁnite-range interac-tions. For this purpose, we study a three-dimensionallattice gas model with Hamiltonian H LG = (cid:88) (cid:104) r,s (cid:105) nbr (cid:88) i,j (cid:15) ij c ir c js − µ (cid:88) r (cid:88) i c ir , where r and s indicate lattice sites, (cid:15) is the interactionmatrix, c ir = 1 if lattice site r is occupied by a particle oftype i and is zero otherwise, and µ is the chemical poten-tial of each particle. The ﬁrst sum indicates that pairs ofparticles interact within a distance of l nbr on the lattice(Figure 3a, inset ). Because the analogues of constraints(2) and (3) are nonlinear in this model, we simply ap-ply the mean-ﬁeld QP solution, χ , and rescale it suchthat the net interaction energy within each target phase, (cid:15) target ≡ (cid:80) i,j (cid:15) ij ( x ( α ) i x ( α ) j − x (0) i x (0) j ), is below the criticalpoint. We then test whether the designed interactionmatrix (cid:15) codes for the self-assembly of the target phasesby performing two types of grand-canonical Monte Carlosimulations [19]. First, we perform “stability” simula-tions initialized from each of the target phases, as wellas from the metastable, homogeneous phase (Figure 3a).By computing the similarity between the target volumefractions and the volume fractions (cid:126)φ observed in the sim-ulation, cos θ ( α ) ≡ ( (cid:126)φ ( α ) · (cid:126)φ ) / | (cid:126)φ ( α ) || (cid:126)φ | , we can determinewhether the target phase is stable with respect to com-position ﬂuctuations. Second, we perform “growth” sim-ulations in a slab geometry, starting from a post-critical c o s θ α time / MC sweeps Stability simulation t =0 10100 20 40 60 80 100 n / N Mean-ﬁeldpredictionstability

Growth simulation stableunstable target-phaseconstituent othercomponent growth l nbr = 1 2(a) (c)(b) t i m e l nbr =1 l nbr =1 l nbr =2 l nbr =2 M = 5 FIG. 3. Lattice simulations with ﬁnite-range interactions fol-low the mean-ﬁeld predictions. (a) Bulk phases are consid-ered stable if the composition of the lattice ﬂuctuates nearthe target composition, such that cos θ ( α ) (cid:39) Inset : Il-lustration of ﬁrst- and second-nearest neighbor shells on thethree-dimensional lattice. (b) Growth from a metastable gasphase is considered successful if the composition of the dropletmatches that of the initial seed at time t = 0. Componentscolored red are enriched in the target phase, while blue com-ponents are depleted. (c) The values of n / obtained by sim-ulating stability and growth approach the mean-ﬁeld predic-tions when N (cid:29) M . However, short-range interactions (i.e., l nbr = 1) slightly reduce the number of targets than can bestably encoded and then assembled via droplet growth. nucleus of the target phase amidst a metastable gas ofthe homogeneous phase (Figure 3b). We classify such asimulation as successful if a phase with the target com-position grows to ﬁll the simulation box. As noted below,the outcome of these simulations depends on both µ and (cid:15) target .Our third key conclusion is that the scaling behaviorpredicted by the mean-ﬁeld model holds for both the sta-bility and growth simulations. A representative summaryof these calculations is shown in Figure 3c. Althoughthe empirical value of n / for each type of simulationis generally lower than that of the mean-ﬁeld model, weobserve that these values converge as the number of com-ponents increases. In the cases where χ solves the mean-ﬁeld inverse problem but the corresponding (cid:15) fails in thegrowth simulations, the cause is typically either a lowerfree-energy barrier between phases or the nucleation ofoﬀ-target phases at the droplet–gas phase interface in thelattice model. We note that increasing the interactionrange from l nbr = 1 to l nbr = 2, thereby making the lat-tice model more “mean-ﬁeld-like,” consistently increasesthe number of targets that can be reliably assembled.We conclude by demonstrating the possibility of encod-ing and self-assembling a set of compositionally distinctcondensates that outnumber the components in a mix-ture, as suggested by the superlinear scaling of n / in α β Ω / L k T ∆ φ αβ frequency α =1 10 20 30 Successes − z(cid:15) target /kT n = N = α =2 α =3 α =4 α =5 α =250 . . . Barrier heightsFree-energy landscapes Ω ‡ α β / L k T (a)(b) (c) (cid:51)(cid:55)(cid:51)(cid:51)(cid:51)(cid:51) l nb r = l nb r = l nbr =1 l nbr =1 , z(cid:15) target = − kT targethomogeneoushomogeneous phasetarget phases –targetpairs FIG. 4. Encoding 250 compositionally distinct target phases,each enriched in ﬁve species, using only 200 components.(a, left ) Projections of the grand potential on an L × L × L lattice with L = 6 and l nbr = 1. Each curve shows the freeenergy surface along a linear path in concentration space, pa-rameterized by ∆ φ αβ , between a pair of target phases (blue)or between the homogeneous phase and a target phase (red).(a, right ) The corresponding distribution of barrier heights—i.e., saddle points on the grand potential landscape—betweenpairs of phases, Ω ‡ αβ . (b) Snapshots from l nbr = 1 growthsimulations, seeded one-at-a-time by each of the 250 targetphases. Species belonging to the target phase are coloredred/yellow, while those depleted in the target phase are col-ored blue/cyan. (c) The number of successfully grown targetphases as a function of the mean interaction energy withineach target phase, (cid:15) target , and the interaction range l nbr ; z = 6is the lattice coordination number. Figure 2c. As an example, we apply the QP formulationin an attempt to encode 250 targets, each enriched inﬁve species, using only 200 components. We use Wang–Landau simulations [20, 21] to conﬁrm that these targetphases correspond to local minima of the grand poten-tial and are supersaturated relative to the homogeneousmixture in the l nbr = 1 lattice model (Figure 4a, left ).From these calculations, we are able to estimate the bar-rier heights, which are proportional to the macroscopicsurface tensions, between pairs of target phases and be-tween target phases and the homogeneous phase (Fig-ure 4a, right ). We ﬁnd that growth simulations occa-sionally fail to reproduce the target composition froma post-critical seed when l nbr = 1 (Figure 4b). However,the fraction of successfully assembled targets increaseswith both the magnitude of the interaction strength andthe number of neighbors with which particles interact, asboth of these strategies increase the barrier heights be-tween the target phases (Figure 4c); on a practical note,the latter requirement of a large eﬀective coordinationnumber is likely to be satisﬁed in condensed phases ofdisordered biopolymers [22]. Thus, by introducing oneor more seeds of each target phase into a single closedsystem, our simulations demonstrate the possibility of simultaneously self-assembling a greater number of con-densates than there are components using pairwise inter-actions alone. While this result might appear at oddswith the Gibbs Phase Rule, the paradox is resolved bynoting that condensate self-assembly can only be sus-tained by a ﬁnite thermodynamic driving force and isthus an inherently non-equilibrium process. Ultimately,some condensates must grow at the expense of othersas the system relaxes to equilibrium unless coarsening isarrested by another non-equilibrium mechanism [23].In summary, we have shown that the number of con-densates that can be encoded in a multicomponent mix-ture and reliably self-assembled can increase superlin-early with the number of species when condensates sharecomponents. The probability that an encoding exists fora random set of target compositions undergoes a thresh-olding transition, implying that the capacity of a mix-ture to self-assemble the target condensates can be pre- dicted by knowing only the number of components inthe mixture, the number of species enriched in each tar-get, and the number of targets. These predictions applyto any molecular system in which pairwise interactionsamong species can be programmed, either syntheticallyor through an evolutionary process. Furthermore, ourresults highlight the vast possibilities of programmableself-assembly in systems with thousands of components,as is the case in many biological mixtures, even in theabsence of orientationally speciﬁc saturating bonds [24].An intriguing possibility is that condensate assemblymight play a role in intracellular information processing,whereby the combinatorial complexity described here isexploited to create spontaneous spatial organization inresponse to stimuli, which take the form of heterogeneousnucleation sites. This model may thus have implicationsfor understanding spatiotemporal control of transcriptionvia condensate assembly in the nucleus.This work was carried out with ﬁnancial support fromPrinceton University and computational resources pro-vided by Princeton University Research Computing. [1] Y. Shin and C. P. Brangwynne, Liquid phase condensa-tion in cell physiology and disease, Science , eaaf4382(2017).[2] J. Berry, C. P. Brangwynne, and M. Haataja, Physi-cal principles of intracellular organization via active andpassive phase transitions, Rep. Prog. Phys. , 046601(2018).[3] C. P. Brangwynne, P. Tompa, and R. V. Pappu, Polymerphysics of intracellular phase transitions, Nat. Phys. ,899 (2015).[4] J.-M. Choi, A. S. Holehouse, and R. V. Pappu, Physicalprinciples underlying the complex biology of intracellularphase transitions, Ann. Rev. Biophys. , 107 (2020).[5] D. W. Sanders, N. Kedersha, D. S. Lee, A. R. Strom, et al. , Competing protein-RNA interaction networks con-trol multiphase intracellular organization, Cell , 306(2020).[6] W. Xing, D. Muhlrad, R. Parker, and M. K. Rosen, Aquantitative inventory of yeast P body proteins revealsprinciples of composition and speciﬁcity, Elife , e56525(2020).[7] S. F. Banani, H. O. Lee, A. A. Hyman, and M. K.Rosen, Biomolecular condensates: Organizers of cellularbiochemistry, Nat. Rev. Mol. Cell Bio. , 285 (2017).[8] P. A. Chong and J. D. Forman-Kay, Liquid–liquid phaseseparation in cellular signaling systems, Curr. Opin.Struct. Bio. , 180 (2016).[9] B. R. Sabari, A. Dall’Agnese, and R. A. Young, Biomolec-ular condensates in the nucleus, Trends Biochem. Sci. ,961 (2020).[10] R. P. Sear and J. A. Cuesta, Instabilities in complex mix-tures with a large number of components, Phys. Rev.Lett. , 245701 (2003).[11] W. M. Jacobs and D. Frenkel, Phase transitions in bio-logical systems with many components, Biophys. J. ,683 (2017).[12] M.-T. Wei, Y.-C. Chang, S. F. Shimobayashi, Y. Shin, et al. , Nucleated transcriptional condensates amplifygene expression, Nat. Cell Biol. , 1187 (2020).[13] M. Mir, M. R. Stadler, S. A. Ortiz, C. E. Hannon, et al. ,Dynamic multifactor hubs interact transiently with sitesof active transcription in drosophila embryos, Elife ,e40497 (2018).[14] B. O’Donoghue, E. Chu, N. Parikh, and S. Boyd, Conicoptimization via operator splitting and homogeneous self-dual embedding, J. Optimiz. Theory App. , 1042(2016).[15] S. Diamond and S. Boyd, CVXPY: A python-embeddedmodeling language for convex optimization, J. Mach.Learn. Res. , 2909 (2016).[16] E. P. Wigner, Random matrices in physics, SIAM Rev. , 1 (1967).[17] Z. F¨uredi and J. Koml´os, The eigenvalues of random sym-metric matrices, Combinatorica , 233 (1981).[18] A. Frieze and M. Karo´nski, Introduction to randomgraphs (Cambridge University Press, 2016).[19] D. Frenkel and B. Smit,

Understanding molecular simu-lation: From algorithms to applications (Elsevier, 2001).[20] F. Wang and D. P. Landau, Eﬃcient, multiple-range ran-dom walk algorithm to calculate the density of states,Phys. Rev. Lett. , 2050 (2001).[21] W. M. Jacobs and D. Frenkel, Predicting phase behaviorin multicomponent mixtures, J. Chem. Phys. , 024108(2013).[22] R. H. Colby and M. Rubinstein, Polymer physics, New-York: Oxford University , 274 (2003).[23] C. A. Weber, D. Zwicker, F. J¨ulicher, and C. F. Lee,Physics of active emulsions, Rep. Prog. Phys. , 064601(2019).[24] A. Murugan, Z. Zeravcic, M. P. Brenner, and S. Leibler,Multifarious assembly mixtures: Systems allowing re-trieval of diverse stored structures, P. Natl. Acad. Sci.U.S.A. , 54 (2015). SUPPLEMENTARY INFORMATIONAppendix A: Specifying the inverse problem

We deﬁne the target phase behavior by specifying the homogeneous phase volume fractions (cid:126)φ (0) , as well as thecomposition, x ( α ) i , for each enriched component i in each target phase α . The compositions of the depleted componentsin each target phase are not speciﬁed; however, we require that all the depleted components comprise a volume fractionwithin the target phase that is less than 1 /M α , where M α is the number of enriched components in the α phase.Therefore, we arrive at the ﬁrst constraint in the main text, x ( α ) i ≤ ζN M α if species i is depleted in phase α, (A1)where ζ < µ i = ∂F∂φ i = log φ i − log(1 − φ T ) − (cid:88) j χ ij φ j , (A2)and the dimensionless osmotic pressure,Π = (cid:88) i φ i µ i − F = − log(1 − φ T ) − (cid:88) i,j χ ij φ i φ j , (A3)be equal across all phases. We set µ ( α ) i = µ (0) i and Π ( α ) = Π (0) , and then eliminate the terms involving φ ( α )T from theequations for the chemical potentials. Approximating φ ( α )T ≈ (cid:88) i,j (cid:104) x ( α ) i x ( α ) j ( α ) i ( α ) j − δ ik ( x ( α ) j ( α ) j − φ (0) j ) − φ (0) i φ (0) j (cid:105) χ ij = − log x ( α ) k φ (0) k , (A4)where ( α ) i = 1 if species i is enriched in phase α and is zero otherwise; as a result of this approximation, the osmoticpressures of the various phases are not exactly equal, and we obtain a grand potential landscape in which the condensedphases are supersaturated relative to the homogeneous phase. We note that the supersaturation can be adjusted bytuning the volume fractions in the homogeneous phase, (cid:126)φ (0) .We obtain the third constraint in the main text by eliminating the term 1 / (1 − φ ( α )T ) from ∂ F/∂φ i ∂φ j | α by againusing the approximation Π ( α ) (cid:39) Π (0) : ∂ F∂φ i ∂φ j (cid:12)(cid:12)(cid:12)(cid:12) α = δ ij φ ( α ) i + w ( χ ) − χ ij (cid:31) . (A5)As noted in the main text, w ( χ ) has a negligible eﬀect on the minimum eigenvalue of ∂ F/∂φ i ∂φ j | α . We thereforeapproximate w ( χ ) by a constant expression when solving the optimization problem, w ( χ ) (cid:39) exp (cid:104) (cid:104) χ (cid:105) target − (cid:0) φ (0)T (cid:1) (cid:104) χ (cid:105) homo (cid:105) − φ (0)T , (A6)where, assuming the enriched components comprise equal volume fractions in all target phases, we have deﬁned theexpected averages of χ in the target, (cid:104) χ (cid:105) target ≡ [log( N/M φ (0)T ) + (2 φ (0)T − ( φ (0)T ) )( N − M ) log( ζ/N ) / N ] / (1 − φ (0)T ) ,and homogeneous phases, (cid:104) χ (cid:105) homo ≡ (cid:104) χ (cid:105) target + ( N − M ) log( ζ/N ) / N . With these approximations, the inverse prob-lem constraints (A1), (A4), and (A5) are linear in χ and only involve quantities speciﬁed in the problem deﬁnition,i.e., (cid:126)φ (0) and the compositions { x ( α ) i } of the enriched components. Appendix B: Solving the convex optimization problems

When performing numerical tests using the mean-ﬁeld model, we generate each target phase by randomly choosing M of the N species. We repeat this procedure to determine which species are enriched in each of the n target phases,and we verify that the targets generated in this way are in fact unique. The calculations presented in the main textassume an equimolar homogeneous phase with φ (0)T = 0 . χ matrix within the SDP and QP formulations, we seek to minimize the objectivefunctions L SDP ≡ − min α =0 ,...,n (cid:26) min i =1 ,...,N (cid:104) λ ( α ) i ( χ ) (cid:105)(cid:27) + κ Var i

B C (cid:21) , (B5)where the submatrix A represents the M × M block of enriched components in the α phase. The stability condition (A5)is equivalent to ( ∂ F ) /C ≡ A − B (cid:62) C − B (cid:31)

0. Furthermore, because the submatrix C pertains to the componentsthat are depleted in the α phase, the inverse of C has eigenvalues clustered near x ( α ) i φ ( α )T (cid:46) ζ/N M α . Finally, bytreating ( ∂ F ) /C as a symmetric random matrix, we can seek to maximize its lowest eigenvalue by minimizing thevariance of its oﬀ-diagonal elements [16, 17]. This random-matrix approximation is a reasonable assumption if thetarget-phase compositions are uncorrelated. The prefactors in (B2) are determined from the scaling behavior of thelowest eigenvalue, λ min , of ∂ F/∂φ i ∂φ j , assuming that this matrix can be treated as a symmetric random matrix [17]: λ (0)min (cid:39) Nφ (0)T − (cid:113) N Var i

4, in which case the χ matrix that minimizes L QP is a good approximation of the χ matrix that minimizes L SDP .In the results presented in the main text, the depleted-component partition-coeﬃcient constant ζ and the regular-ization constant κ are chosen to be 10 − and 10 − , respectively. Numerical optimization is performed using the SCSsolver [14] within the CVXPY framework [15]. Appendix C: Predicting the thresholding transition

We now elaborate on the random graph arguments discussed in the main text. The probability that a pair ofcomponents i and j are both enriched in any one of the target phases is p edge = 1 − (cid:20) − M ( M − N ( N − (cid:21) n (cid:39) nM N . (C1)The expectation value for the number of ( M + 1)-sized cliques that contain an M -sized target phase in the randomgraph is E ≤ n ( N − M ) p M edge . (C2)With high probability, a random graph does not contain such a clique if E <

1; thus, we obtain a threshold for n , n ∗ ∼ N (2 M − / ( M +1) , (C3)when N (cid:29) M . As discussed in the main text, we expect that n / will scale according to (C3) in the mean-ﬁeldmodel. We note that the threshold for the appearance of a random ( M + 1)-sized clique in an Erd˝os–R´enyi randomgraph G ( N, p edge ) with uncorrelated edges scales as [18] n ∗ K M +1 ∼ N M − /M . (C4)Our predicted scaling for n / grows more slowly with respect to N than n ∗ K M +1 when N (cid:29) M because (C2) accountsfor the fact that M -sized cliques exist in the graph by design. Thus, we expect to encounter ( M + 1)-sized cliquesthat directly destabilize target phases before we encounter completely random cliques of size M + 1 or greater.This expectation is borne out by the close agreement between our predicted scaling, (C3), and the empirical scalingexponents determined from power law ﬁts to the data in Figure 2b,c in the main text. Appendix D: Performing stability and growth simulations

We perform grand-canonical Monte Carlo (GCMC) simulations of stability and growth using a three-dimensionalcubic lattice model. As described in the main text, we obtain the interaction matrix, (cid:15) , from the mean-ﬁeld solutionby rescaling χ . We then set the chemical potential, µ , to be the same for all components; µ is chosen such thatthe target condensed phases are supersaturated relative to the homogeneous phase, and the chemical potential of avacancy is zero. Simulations are then carried out by proposing particle exchanges at every lattice site with equalprobability and accepting these moves according to the standard Metropolis criterion [19].When running stability simulations, we initialize a 10 × ×

10 cubic lattice with periodic boundary conditions ineach one of the target phases, including the homogeneous phase. This is accomplished by randomly assigning particlesto lattice sites such that the lattice composition is consistent with the target composition, (cid:126)x ( α ) . We then evolve thelattice via GCMC simulation, keeping track of the order parameter cos θ ( α ) . By construction, cos θ ( α ) = 1 at the startof each simulation. However, cos θ ( α ) is found to decrease markedly whenever the initial phase is unstable with respectto composition ﬂuctuations; we deﬁne cos θ ( α ) ≥ .

95 to be the threshold for a “successful” stability simulation. Byrunning these simulations for only 100 Monte Carlo sweeps, we ascertain whether the free-energy barriers surroundingeach one of the target phases are all higher than a few kT . We note that when target phases are found to be unstable,the simulation usually ends up in a condensed-phase free-energy basin that is enriched in more than M components.Such chimeric phases tend to have greater compositional entropy and are thus lower in free energy than any of the M -component target phases.When running growth simulations, we initialize a 100 × ×

10 periodic cubic lattice with a 10 × ×

10 seed ofthe target phase surrounded by the metastable, homogeneous gas phase. The initial conﬁgurations of both the seedand the gas phase are prepared in the same manner as the stability simulations described above. When suﬃcientlysupersaturated relative to the gas phase, the seed grows by advancing both interfaces with the gas phase at a constantvelocity parallel to the long axis of the simulation box. Simulations are halted once the condensed phase occupiesat least 90% of the simulation box. At this point, we classify a simulation as “successful” if the composition of thelattice occupied by the condensed phase matches the composition of the target phase using the same order parameterand threshold as above, i.e., if cos θ ( α ) ≥ . Appendix E: Calculating free-energy landscapes

We perform Wang–Landau simulations [20] to calculate the grand-potential free-energy surfaces between pairs oftarget phases in the lattice model, as shown in Figure 4a in the main text. Following the method introduced in [21],we deﬁne an order parameter ∆ φ αβ for a pair of phases α and β ,∆ φ αβ ( (cid:126)φ ) ≡ ( (cid:126)φ − (cid:126)ξ αβ ) · ˆ ν αβ , (E1)where (cid:126)ξ αβ ≡ ( (cid:126)φ ( α ) + (cid:126)φ ( β ) ) / ν αβ ≡ ( (cid:126)φ ( β ) − (cid:126)φ ( α ) ) / | (cid:126)φ ( β ) − (cid:126)φ ( α ) | . To constrain the simulation to explore the phasespace near phases α and β , we add a harmonic potential in directions of concentration space orthogonal to (cid:126)ξ αβ , U αβ ( (cid:126)φ ) ≡ k ⊥ (cid:12)(cid:12) ( (cid:126)φ − (cid:126)ξ αβ ) − [( (cid:126)φ − (cid:126)ξ αβ ) · ˆ ν αβ ]ˆ ν αβ (cid:12)(cid:12) . (E2)To improve the eﬃciency of the simulation, particle exchanges from a lattice site occupied by a particle (or a vacancy)of type i to a particle (or a vacancy) of type j are proposed with probability p gen ( i → j ) =  . j is a vacancy(0 . − . /M β if j is enriched in phase β . / ( N − M β ) if j is depleted in phase β (E3)if α is the homogeneous phase and β is a condensed target phase, or with probability p gen ( i → j ) = (cid:40) (1 − . /M αβ if j is enriched either in phase α or in phase β . / ( N + 1 − M αβ ) otherwise, including if j is a vacancy , (E4)where M αβ is the number of components that are enriched either in phase α or in phase β , if α and β are bothcondensed target phases. The Metropolis acceptance criterion is appropriately modiﬁed to account for these non-uniform generation probabilities.The free-energy landscapes presented in the main text are obtained using the combined potential, H LG + U αβ , onan L × L × L periodic lattice with L = 6 and k ⊥ = 10 . Figure 4a shows landscapes for 1,000 randomly selected pairsof target phases and for all 250 homogeneous-target phase pairs. The barrier heights, Ω ‡ , are computed by ﬁndingthe maximum on the computed free-energy surface between phases α and ββ