Analytical Theory for Sequence-Specific Binary Fuzzy Complexes of Charged Intrinsically Disordered Proteins

Abstract

Intrinsically disordered proteins (IDPs) are important for biological functions. In contrast to folded proteins, molecular recognition among certain IDPs is "fuzzy" in that their binding and/or phase separation are stochastically governed by the interacting IDPs' amino acid sequences while their assembled conformations remain largely disordered. To help elucidate a basic aspect of this fascinating yet poorly understood phenomenon, the binding of a homo- or hetero-dimeric pair of polyampholytic IDPs is modeled statistical mechanically using cluster expansion. We find that the binding affinities of binary fuzzy complexes in the model correlate strongly with a newly derived simple "jSCD" parameter readily calculable from the pair of IDPs' sequence charge patterns. Predictions by our analytical theory are in essential agreement with coarse-grained explicit-chain simulations. This computationally efficient theoretical framework is expected to be broadly applicable to rationalizing and predicting sequence-specific IDP-IDP polyelectrostatic interactions.



Alan N. Amin,†,∥,§ Yi-Hsuan Lin,†,‡,§ Suman Das,† and Hue Sun Chan∗,†,¶

† Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada
‡ Molecular Medicine, Hospital for Sick Children, Toronto, Ontario, Canada
¶ Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
§ Contributed equally to this work
∥ Present address: Systems, Synthetic, and Quantitative Biology Program, Harvard Medical School, Boston, Massachusetts, U.S.A.

E-mail: [email protected]

arXiv [q-bio.BM]

Introduction

Intrinsically disordered proteins (IDPs), hallmarked by their lack of folding to an essentially unique conformation in isolation, serve many physiological functions.

Compared to globular proteins, IDPs are depleted in nonpolar but enriched in polar, aromatic, and charged residues. Some IDPs adopt ordered/folded conformations upon binding to folded targets or after posttranslational modifications; others remain disordered. Among the spectrum of diverse possible behaviors, the IDPs in certain IDP-folded protein complexes can be highly disordered, as typified by the kinase inhibitor/ubiquitin ligase Sic1-Cdc4 complexes. Complexes with bound IDPs that are disordered are aptly named “fuzzy complexes”.

The role of these IDPs’ amino acid sequences in molecular recognition varies, depending on the situation. For Sic1-Cdc4, most of the charges in the disordered Sic1 probably take part in modulating binding affinity via multiple spatially long-range electrostatic (termed polyelectrostatic) interactions with the folded Cdc4 without locally engaging the Cdc4 binding pockets. In contrast, for the IDP transactivation domain of Ewing sarcoma, sequence-dependent oncogenic effects may be underpinned largely by multivalent, spatially short-range polycation-π interactions implicating the IDP’s tyrosine residues. More broadly, for multiple-component phase separation of IDPs, a “fuzzy” mode of molecular recognition was proposed whereby mixing/demixing of phase-separated polyampholyte species depends on quantifiable differences in the IDPs’ sequence charge patterns. Variations aside, these mechanisms share the commonality of being stochastic in essence, involving highly dynamic conformations, and as such are distinct from those underlying the structurally specific and relatively static binding in which folded proteins participate. We thus extend the usage of “fuzzy” as an adjective not only for the structural features of certain biomolecular assemblies but also for the molecular recognition mechanisms that contribute to the formation of fuzzy assemblies. This concept is applicable to nonbiological polymers as well. Whereas multivalency, stochasticity, and conformational diversity have long been the mainstay of polymer physics, sequence specificity, and therefore fuzzy molecular recognition, has recently become increasingly important for nonbiological heteropolymers because of experimental advances in “monomer precision” that allow for the synthesis of sequence-monodisperse polymers.

As far as biomolecules are concerned, fuzzy molecular recognition should play a dominant role in “binding without folding” IDP complexes wherein the bound IDPs are disordered.

Generally speaking, a condensed liquid droplet of IDP is a mesoscopic fuzzy assembly underpinned by a fuzzy molecular recognition mechanism. With regard to basic binary (two-chain) IDP complexes, evidence has long pointed to their existence, although extra caution is needed in interpreting the pertinent experimental data.

Of notable recent interest is the interaction between the strongly but oppositely charged H1 and ProTα IDPs involved in chromatin condensation and remodeling, which remain disordered while forming a heterodimer with reported dissociation constants ranging from nanomolar to sub-micromolar levels.

We now tackle a fundamental aspect of fuzzy molecular recognition, namely the impact of sequence-specific electrostatics on binary fuzzy complexes. Electrostatics is important for IDP interactions including phase separation.

IDP sequence specificity is a key feature of their single-chain properties and multiple-chain phase behaviors.

IDP properties depend not only on their net charge but are also sensitive, to various degrees, to their specific sequence charge pattern, which has been characterized by two parameters, κ and “sequence charge decoration” (SCD): κ is an intuitive blockiness measure, whereas

$$\mathrm{SCD}(\{\sigma\}) = \frac{1}{2N}\sum_{s,t=1}^{N} \sigma_s \sigma_t \sqrt{|s-t|} , \tag{1}$$

defined for any charge sequence $\{\sigma\} = \{\sigma_1, \sigma_2, \ldots, \sigma_N\}$, emerges from an analytical theory for polyampholyte dimensions. Both single-chain dimensions and phase separation propensities are seen to correlate with these parameters. These measures are found to be evolutionarily conserved among IDPs, suggesting intriguingly that the gestalt properties they capture are functionally significant. Our present focus is on binary complexes, which are of interest themselves and possibly also as proxies for mesoscopic multiple-IDP phase behaviors. Generalizing such a correspondence between single-chain properties and multiple-chain phase behaviors for homopolymers and polyampholytes, for example, the osmotic second virial coefficient, $B_2$, of a pair of IDP chains has been proposed as an approximate measure for the IDP’s sequence-dependent phase separation propensity.

Methods

With these thoughts in mind, we develop an analytical theory for binary IDP-IDP electrostatics. As exemplified by recent studies of phase behaviors, approximate analytical theories, among complementary approaches, are conceptually productive and efficient for gaining insights into sequence-specific IDP behaviors. The system analyzed herein consists of two IDPs $A$, $B$ of lengths $N_A$, $N_B$; charge sequences $\{\sigma^A\} = \{\sigma^A_1, \sigma^A_2, \ldots, \sigma^A_{N_A}\}$, $\{\sigma^B\} = \{\sigma^B_1, \sigma^B_2, \ldots, \sigma^B_{N_B}\}$; and residue (monomer) coordinates $\{\mathbf{R}^A\} = \{\mathbf{R}^A_1, \mathbf{R}^A_2, \ldots, \mathbf{R}^A_{N_A}\}$, $\{\mathbf{R}^B\} = \{\mathbf{R}^B_1, \mathbf{R}^B_2, \ldots, \mathbf{R}^B_{N_B}\}$. Both $A = B$ (homotypic) and $A \neq B$ (heterotypic) cases are considered. Key steps in the formal development are presented below; details are provided in the Supporting Information. The second virial coefficient of the IDP pair is given by

$$B_2 = \int d\mathbf{R}^{AB}_{\rm CM} \left\langle 1 - e^{-\beta U^{AB}(\mathbf{R}^{AB}_{\rm CM}; \{\mathbf{R}^A\}, \{\mathbf{R}^B\})} \right\rangle_{A,B} , \tag{2}$$

where $\beta = 1/k_{\rm B}T$ ($k_{\rm B}$ is the Boltzmann constant, $T$ is absolute temperature), $\mathbf{R}^{AB}_{\rm CM}$ is the center-of-mass distance, $U^{AB}$ is the total interaction between $A$ and $B$, and the average $\langle\cdots\rangle_{A,B}$ is over the conformational ensembles of $A$ and $B$. To simplify notation, we use $U^{AB}$ to denote $\beta U^{AB}$ below. Now, Eq. 2 may be rewritten as

$$B_2 = V - \frac{Q_{AB}}{Q_A Q_B} = V \int \mathcal{D}[\mathbf{R}^A]\, \mathcal{D}[\mathbf{R}^B]\, \mathcal{P}^A[\mathbf{R}^A]\, \mathcal{P}^B[\mathbf{R}^B] \left( 1 - e^{-U^{AB}[\mathbf{R}^A, \mathbf{R}^B]} \right) , \tag{3}$$

where $V$ is volume; $Q_{AB}$ is the partition function of the entire $A$-$B$ system; $Q_A$ and $Q_B$ are, respectively, the isolated single-chain partition functions of $A$ and $B$; $\mathcal{D}[\mathbf{R}^i] \equiv \int \prod_{s=1}^{N_i} d\mathbf{R}^i_s$ with $i = A, B$; and $\mathcal{P}^i[\mathbf{R}^i]$ is the single-chain probability density for conformation $\{\mathbf{R}^i\}$. Note that in the limiting case with no internal degrees of freedom in $A$ and $B$, i.e., when $N_A = N_B = 1$, both Eqs. 2 and 3 reduce to $B_2 = \int d\mathbf{r}\, \{1 - \exp[-U^{AB}(\mathbf{r})]\}$.

When $U^{AB}$ is a sum of pairwise interactions between residues in different polymers,

$$U^{AB}[\mathbf{R}^A, \mathbf{R}^B] = \sum_{s=1}^{N_A} \sum_{t=1}^{N_B} \mathcal{V}^{AB}_{st}(\mathbf{R}^{AB}_{st}) , \tag{4}$$

where $\mathbf{R}^{AB}_{st} \equiv \mathbf{R}^A_s - \mathbf{R}^B_t$ and $\mathcal{V}^{AB}_{st}$ is the $(s_A, t_B)$ potential energy between the $s$-th residue in $A$ and the $t$-th residue in $B$, the integrand in Eq. 3 may be expressed as a cluster expansion:

$$e^{-U^{AB}[\mathbf{R}^A, \mathbf{R}^B]} - 1 = \left\{ \prod_{s=1}^{N_A} \prod_{t=1}^{N_B} \left[ \left( e^{-\mathcal{V}^{AB}_{st}(\mathbf{R}^{AB}_{st})} - 1 \right) + 1 \right] \right\} - 1 = \sum_{s=1}^{N_A} \sum_{t=1}^{N_B} f_{st} + \sum_{s \ge t = 1}^{N_A} \sum_{l \ge m = 1}^{N_B} f_{sl} f_{tm} - \sum_{s=1}^{N_A} \sum_{t=1}^{N_B} f_{st}^2 + O(f^3) , \tag{5}$$

where $f_{st} \equiv \exp[-\mathcal{V}^{AB}_{st}(\mathbf{R}^{AB}_{st})] - 1$ is the Mayer $f$-function for $(s_A, t_B)$. Intuitively, the first and third terms of the last expression in Eq. 5 are functions of $f_{st}$, which involves only one residue per chain and thus is independent of the $\mathcal{P}^i$s for relative positions along the same chain. In contrast, the second term, in $f_{sl} f_{tm}$, involves two pairwise interchain interactions and thus $\mathcal{P}^i$-governed correlation of same-chain residue positions. Defining the Fourier transformed ($\mathbf{k}$-space) matrices of the intrachain residue-residue correlation function,

$$[\hat{P}^i(\mathbf{k})]_{st} \equiv \int \mathcal{D}[\mathbf{R}^i]\, \mathcal{P}^i[\mathbf{R}^i]\, e^{i\mathbf{k}\cdot(\mathbf{R}^i_s - \mathbf{R}^i_t)} , \quad i = A, B, \tag{6}$$

and of the Mayer $f$-function,

$$[\hat{f}(\mathbf{k})]_{st} \equiv \int d\mathbf{r}\, f_{st}(\mathbf{r})\, e^{i\mathbf{k}\cdot\mathbf{r}} , \tag{7}$$

the $O(f^2)$ cluster expansion of $B_2$ is derived in the Supporting Information as

$$B_2 = -\sum_{s=1}^{N_A} \sum_{t=1}^{N_B} [\hat{f}(\mathbf{0})]_{st} - \frac{1}{2} \int \frac{d^3k}{(2\pi)^3}\, \mathrm{Tr}\left[ \hat{f}(\mathbf{k}) \hat{P}^B(-\mathbf{k}) \hat{f}^{\rm T}(-\mathbf{k}) \hat{P}^A(-\mathbf{k}) - \hat{f}(\mathbf{k}) \hat{f}^{\rm T}(-\mathbf{k}) \right] + O(f^3) , \tag{8}$$

where the “T” superscript of a matrix denotes its transpose.
Focusing on electrostatics, we first consider a screened Coulomb potential, $\mathcal{V}^{ij}_{st}(r) = l_{\rm B} \sigma^i_s \sigma^j_t \exp(-\kappa_{\rm D} r)/r$, which is equivalent to

$$[\hat{\mathcal{V}}(\mathbf{k})]_{st} = \frac{4\pi l_{\rm B}}{k^2 + \kappa_{\rm D}^2}\, \sigma^i_s \sigma^j_t \tag{9}$$

in $\mathbf{k}$-space, where $l_{\rm B} = e^2/(4\pi\epsilon_0\epsilon_{\rm r} k_{\rm B}T)$ is the Bjerrum length, $\epsilon_0$ and $\epsilon_{\rm r}$ are the vacuum and relative permittivity, respectively, and $\kappa_{\rm D}$ is the Debye screening wave number (not to be confused with the sequence charge pattern parameter κ). The case of pure Coulomb interaction (without screening) will be considered below. We then make two approximations in Eq. 8 for tractability. First, we approximate the IDP conformations as Gaussian chains with Kuhn length $b$ (Ref. 35),

$$[\hat{P}^i(\mathbf{k})]_{st} \approx [\hat{G}_{\rm M}(k)]_{st} = e^{-\frac{1}{6}(kb)^2 |s-t|} , \tag{10}$$

where $k \equiv |\mathbf{k}|$. Second, we express the Mayer $f$-functions as high-temperature expansions:

$$[\hat{f}(k)]_{st} = -\frac{4\pi l_{\rm B}}{k^2 + \kappa_{\rm D}^2}\, \sigma^i_s \sigma^j_t + \frac{2\pi l_{\rm B}^2}{k} (\sigma^i_s \sigma^j_t)^2 \tan^{-1}\!\left( \frac{k}{2\kappa_{\rm D}} \right) + O(l_{\rm B}^3) . \tag{11}$$

With these two approximations, $B_2$ up to $O(l_{\rm B}^2)$ is given by

$$B_2 \approx \frac{4\pi l_{\rm B}}{\kappa_{\rm D}^2}\, q^A q^B - 4 l_{\rm B}^2 \int_0^\infty \frac{dk\, k^2}{(k^2 + \kappa_{\rm D}^2)^2} \sum_{s,t=1}^{N_A} \sum_{l,m=1}^{N_B} \sigma^A_s \sigma^A_t \sigma^B_l \sigma^B_m\, e^{-\frac{1}{6}(kb)^2 [|s-t| + |l-m|]} , \tag{12}$$

where $q^i \equiv \sum_{s=1}^{N_i} \sigma^i_s$ is the net charge of $i$. The two terms account, respectively, for the mean-field Coulomb interaction between the two chains’ net charges and for sequence specificity.

Results and Discussion

Dominant role of disorder in salt-dependent IDP binding.

Let θ be the binding probability of chains $A$, $B$ with the same concentration $[c]$. The probability that they are not bound is

$$1 - \theta \equiv \frac{V Q_A Q_B}{Q_{AB}} = \frac{1}{1 - B_2/V} \tag{13}$$

when $V$ is chosen, without loss of generality, to include only an $A$, $B$ pair and thus $[c] = 1/V$ (cf. Eq. 3). It follows that the dissociation constant $K_{\rm D}$ is given by

$$\frac{1}{K_{\rm D}} = \frac{\theta [c]}{(1 - \theta)^2 [c]^2} = -B_2 \left( 1 - \frac{B_2}{V} \right) \approx -B_2 , \tag{14}$$

where the last approximation holds at low $A$, $B$ concentrations.

To gain insight into the physical implications of the perturbative terms in the $B_2$ expression in Eq. 12, we first apply them, through Eq. 14, to the binding of IDPs H1 and ProTα, for which salt-dependent $K_{\rm D}$s have recently been measured experimentally. H1 and ProTα contain ≈200 and ≈110 residues, respectively, with small length variations for different constructs. We use the 202-residue H1 and 114-residue ProTα sequences in Table S1 of the Supporting Information for theoretical calculations, assigning − + T = . 15 K, which is equal or similar to those used for various experiments. As a first approximation, we apply the standard relation $\kappa_{\rm D} = (8\pi l_{\rm B} N_{\rm A} [\mathrm{NaCl}])^{1/2}$, where $N_{\rm A}$ is the Avogadro number and $\epsilon_{\rm r} = 78$ for bulk water, to model dependence on NaCl concentration. It should be noted, however, that a recent experiment showed that an “anomalous” decrease in $\kappa_{\rm D}$ with increasing NaCl concentration likely ensues for [NaCl] ≳ 500 mM (Ref. 49).

The theoretical salt-dependent $K_{\rm D}$s of H1 and ProTα thus calculated using Eqs. 12 and 14 are shown in Fig. 1 together with single-molecule Förster resonance energy transfer (smFRET) and isothermal titration calorimetry (ITC) experimental data. All three sets of data show a decrease in $K_{\rm D}$ (increase in binding) with decreasing salt, but there is a large difference between the smFRET and ITC data. Notably, when [NaCl] is decreased from ≈350 to 160 mM, smFRET measured an ≈ × whereas ITC measured only an ≈20 times increase in binding affinity. This discrepancy remains to be resolved, as a careful examination of the experimental conditions is necessary, including the possible presence of not only binary H1-ProTα complexes but also oligomers in the sample used in the experiments.

Our theoretical $K_{\rm D}$s are within an order of magnitude of those measured by ITC. They are practically identical at 350 mM [NaCl], but our theoretical $K_{\rm D}$ decreases only ≈ ≈20 times for ITC. Our theory also predicts weaker ProTα binding for the H1 C-terminal region than for full-length H1 (Fig. S1), as seen in the smFRET experiment, but our predicted ∼ . $K_{\rm D}$ is less than the ≈20 times measured by the smFRET experiment. In general, our cluster expansion (Eq. S12), which is a high-$T$ expansion, is less accurate when electrostatic interaction is strong, such as at zero or low salt, because $B_2$ in Eq. 12 includes only two terms in a perturbation series, neglecting attractive terms of order $l_{\rm B}^3$ and higher. This consideration offers a perspective to understand the modest difference between our theory and ITC measurement at low salt. However, although the partial agreement between theory and ITC is tantalizing, our current theory should be most useful for conceptual and semi-quantitative investigation of the comparative sequence dependence of different IDP complexes rather than as a quantitative predictor of the absolute binding affinity of a particular pair of IDPs. Our theory ignores many structural and energetic details for tractability, including ion condensation, the effect of which has a salt dependence that might underlie the dramatic salt dependence of $K_{\rm D}$ seen by smFRET, and other solvation effects that might necessitate an effective separation-dependent dielectric. After all, explicit-chain simulation has produced a $K_{\rm D}$ ≈ × − µM, which is >300 times more favorable than that measured by smFRET, underscoring that, as it stands, all reported H1-ProTα experimental data are within theoretical possibilities.

Limitations of our analytical formulation notwithstanding, an important physical insight is gained by inspecting the contributions in Eq. 12 to the predicted H1-ProTα behavior from the first, mean-field term that depends solely on the overall net charges of the two IDPs and from the second, sequence-specific term. Remarkably, the mean-field net-charge term alone yields $K_{\rm D}$s that are 30–40 times larger than those calculated using both terms in Eq. 12 (Table S1), indicating that the net-charge term is almost inconsequential and that the sequence-dependent term (and by extension also the $O(l_{\rm B}^3)$ terms), which embodies the dynamic disorder of IDP conformations, plays a dominant role in the favorable assembly of fuzzy IDP complexes.
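The route from a pair of charge sequences to a dissociation constant (Eq. 12 followed by Eq. 14) can be sketched numerically. The following is a minimal illustration rather than the authors' implementation: the function names, the midpoint quadrature, and the parameter values in the usage note ($l_{\rm B}$ ≈ 7.1 Å, $b$ = 3.8 Å) are assumptions introduced here. It uses the fact that the fourfold sum in Eq. 12 factorizes into a product of two single-chain sums, and it integrates the $|s-t| = |l-m| = 0$ contribution analytically via $\int_0^\infty dk\, k^2/(k^2+\kappa_{\rm D}^2)^2 = \pi/(4\kappa_{\rm D})$ because that part has no Gaussian decay.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol


def debye_kappa(l_b, salt_molar):
    """kappa_D in 1/Angstrom from kappa_D = (8*pi*l_B*N_Avogadro*[NaCl])**(1/2)."""
    pairs_per_cubic_angstrom = salt_molar * AVOGADRO * 1e-27  # 1 L = 1e27 A^3
    return math.sqrt(8.0 * math.pi * l_b * pairs_per_cubic_angstrom)


def distance_weights(sigma):
    """w[d] = sum of sigma_s * sigma_t over all ordered pairs with |s - t| = d."""
    n = len(sigma)
    w = [0.0] * n
    for s in range(n):
        for t in range(n):
            w[abs(s - t)] += sigma[s] * sigma[t]
    return w


def structure_factor(weights, g):
    """sum_{s,t} sigma_s sigma_t g**|s-t|, with g = exp(-(k*b)**2 / 6)."""
    return sum(w * g ** d for d, w in enumerate(weights))


def b2_eq12(sigma_a, sigma_b, l_b, kappa, b, n_k=4000, kb_max=15.0):
    """Second virial coefficient of Eq. 12 in cubic Angstrom (midpoint rule in k)."""
    w_a, w_b = distance_weights(sigma_a), distance_weights(sigma_b)
    mean_field = 4.0 * math.pi * l_b * sum(sigma_a) * sum(sigma_b) / kappa ** 2
    self_term = w_a[0] * w_b[0]  # k-independent part of the product of sums
    integral = self_term * math.pi / (4.0 * kappa)  # integrated analytically
    dk = kb_max / b / n_k
    for i in range(n_k):
        k = (i + 0.5) * dk
        g = math.exp(-((k * b) ** 2) / 6.0)
        rest = structure_factor(w_a, g) * structure_factor(w_b, g) - self_term
        integral += dk * k ** 2 / (k ** 2 + kappa ** 2) ** 2 * rest
    return mean_field - 4.0 * l_b ** 2 * integral


def kd_molar(b2):
    """K_D in mol/L from Eq. 14 (K_D ~ -1/B2); requires attractive B2 < 0."""
    if b2 >= 0.0:
        raise ValueError("Eq. 14 gives a positive K_D only for B2 < 0")
    return 1e27 / (-b2 * AVOGADRO)
```

For example, with the assumed values `l_b = 7.1` and `b = 3.8` (both in Å) and `kappa = debye_kappa(7.1, 0.165)`, calling `kd_molar(b2_eq12(seq_a, seq_b, 7.1, kappa, 3.8))` on two lists of ±1 charges returns an estimate in mol/L. Consistent with the caveats in the text, such numbers are best read comparatively across sequence pairs rather than as absolute affinities.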

Figure 1: Theoretical and experimental H1-ProTα dissociation constants as functions of salt concentration. [Plot: $K_{\rm D}$ (μM) versus [NaCl] (mM); series: Theory, ITC, smFRET.] Data plotted are provided in Table S1 in the Supporting Information.

Assembly of binary fuzzy complex is highly sequence specific.

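We now proceed to compare the binding of different IDP pairs and analyze them systematically by expressing the $B_2$ for electrostatic interactions in Eq. 12 as

$$B_2 \approx \frac{4\pi l_{\rm B}}{\kappa_{\rm D}^2}\, q^A q^B - 4 l_{\rm B}^2 \int_0^\infty \frac{dk\, k^2}{(k^2 + \kappa_{\rm D}^2)^2} \sum_{s,t=1}^{N_A} \sum_{l,m=1}^{N_B} \sigma^A_s \sigma^A_t \sigma^B_l \sigma^B_m\, e^{-\frac{1}{6}(kb)^2 [|s-t| + |l-m|]} \equiv F_1 + F_2 , \tag{15}$$

where $F_1$ is an $O(l_{\rm B})$ term arising from the interaction between the net charges $q^A$ of chain $A$ and $q^B$ of chain $B$, and $F_2$ accounts for sequence specificity. We further rewrite $F_2$ as

$$F_2 = -4 l_{\rm B}^2 \int_0^\infty \frac{dk\, k^2}{(k^2 + \kappa_{\rm D}^2)^2} \sum_{s,t=1}^{N_A} \sum_{l,m=1}^{N_B} \sigma^A_s \sigma^A_t \sigma^B_l \sigma^B_m\, e^{-\frac{1}{6}(kb)^2 (|s-t| + |l-m|)} \equiv -\frac{4 l_{\rm B}^2 b}{\sqrt{6}} \sum_{s,t=1}^{N_A} \sum_{l,m=1}^{N_B} \sigma^A_s \sigma^A_t \sigma^B_l \sigma^B_m\, I(|s-t| + |l-m|) , \tag{16}$$

where $I$ is the following integral over the variable $\bar{k}$:

$$I_X = \int_0^\infty d\bar{k}\, \frac{\bar{k}^2}{(\bar{k}^2 + \bar{\kappa}_{\rm D}^2)^2}\, e^{-X\bar{k}^2} \tag{17}$$

with $\bar{k} \equiv kb/\sqrt{6}$, $\bar{\kappa}_{\rm D} \equiv \kappa_{\rm D} b/\sqrt{6}$, and $X \equiv |s-t| + |l-m|$. Using integration by parts,

$$\begin{aligned} I_X &= -\frac{1}{2} \int_0^\infty d\bar{k}\, \left( \bar{k} e^{-X\bar{k}^2} \right) \frac{d}{d\bar{k}} \frac{1}{\bar{k}^2 + \bar{\kappa}_{\rm D}^2} \\ &= -\frac{1}{2} \left. \frac{\bar{k} e^{-X\bar{k}^2}}{\bar{k}^2 + \bar{\kappa}_{\rm D}^2} \right|_0^\infty + \frac{1}{2} \int_0^\infty d\bar{k}\, \frac{1 - 2X\bar{k}^2}{\bar{k}^2 + \bar{\kappa}_{\rm D}^2}\, e^{-X\bar{k}^2} \\ &= \left( \frac{1}{2} + X\bar{\kappa}_{\rm D}^2 \right) \int_0^\infty d\bar{k}\, \frac{e^{-X\bar{k}^2}}{\bar{k}^2 + \bar{\kappa}_{\rm D}^2} - X \int_0^\infty d\bar{k}\, e^{-X\bar{k}^2} \\ &= \left( \frac{\pi}{4\bar{\kappa}_{\rm D}} + \frac{\pi X \bar{\kappa}_{\rm D}}{2} \right) e^{X\bar{\kappa}_{\rm D}^2}\, \mathrm{erfc}\!\left( \bar{\kappa}_{\rm D} \sqrt{X} \right) - \frac{1}{2} \sqrt{\pi X} , \end{aligned} \tag{18}$$

where $\mathrm{erfc}(z) = (2/\sqrt{\pi}) \int_z^\infty dt\, \exp(-t^2)$ is the complementary error function. In the $\bar{\kappa}_{\rm D} \ll 1$ regime, replacing $\mathrm{erfc}(z)$ and $e^{z^2}$ by their Taylor series,

$$e^{z^2} \mathrm{erfc}(z) = 1 - \frac{2z}{\sqrt{\pi}} + z^2 + O(z^3) , \tag{19}$$

setting $z = \sqrt{X}\bar{\kappa}_{\rm D}$ and applying Eq. 19 to the last expression in Eq. 18 yields

$$I_X = \frac{\pi}{4\bar{\kappa}_{\rm D}} - \sqrt{\pi X} + \frac{3\pi}{4} \bar{\kappa}_{\rm D} X + O(\bar{\kappa}_{\rm D}^2) . \tag{20}$$

In that case $F_2$ in Eq. 15 becomes

$$F_2 = -\frac{4 l_{\rm B}^2 b}{\sqrt{6}} \left[ \frac{\pi\sqrt{6}}{4\kappa_{\rm D} b} (q^A)^2 (q^B)^2 - \sqrt{\pi} \sum_{s,t=1}^{N_A} \sum_{l,m=1}^{N_B} \sigma^A_s \sigma^A_t \sigma^B_l \sigma^B_m \sqrt{|s-t| + |l-m|} \right] + O(\bar{\kappa}_{\rm D}) , \tag{21}$$

where the first term is an $O(l_{\rm B}^2)$ contribution due to the chains’ net charges $q^A$ and $q^B$; the second term, involving individual $\sigma^i_s$s, then provides the lowest-order (in $\bar{\kappa}_{\rm D}$) account of sequence specificity. A two-chain sequence charge pattern parameter, which we refer to as “joint sequence charge decoration” (jSCD) because of its formal similarity with the single-chain SCD (Ref. 29), emerges naturally from this sequence-specific term in Eq. 21:

$$\mathrm{jSCD}(\sigma^A, \sigma^B) \equiv -\frac{1}{2 N_A N_B} \sum_{s,t=1}^{N_A} \sum_{l,m=1}^{N_B} \sigma^A_s \sigma^A_t \sigma^B_l \sigma^B_m \sqrt{|s-t| + |l-m|} . \tag{22}$$

When one or both of the chains are overall neutral, i.e., $q^A = 0$ or $q^B = 0$ and thus $q^A q^B = 0$, both $F_1$ and the first term of $F_2$ in Eq. 21 vanish, leaving $B_2$ in a form that is proportional to jSCD:

$$B_2 \big|_{\kappa_{\rm D} \to 0,\; q^A q^B = 0} = -8 \sqrt{\frac{\pi}{6}}\; l_{\rm B}^2\, b\, N_A N_B \times \mathrm{jSCD}(\sigma^A, \sigma^B) . \tag{23}$$

When both chains are not overall neutral, i.e., $q^A \neq 0$ and $q^B \neq 0$ and thus $q^A q^B \neq 0$, the $q^A q^B$ terms in Eqs. 15 and 21 are part of the Taylor series of the Mayer $f$-function of the mean-field (MF) net charge interaction, as can be seen from the identity of these terms with the first two terms in the Taylor expansion of the second virial coefficient (denoted $B_2^{\rm MF}$ here) of two point charges interacting via a screened Coulomb potential:

$$\begin{aligned} B_2^{\rm MF} &= \int d^3r \left( 1 - e^{-l_{\rm B} q^A q^B e^{-\kappa_{\rm D} r}/r} \right) \\ &= 4\pi \int_0^\infty dr\, r^2 \left( \frac{l_{\rm B} q^A q^B e^{-\kappa_{\rm D} r}}{r} - \frac{l_{\rm B}^2 (q^A)^2 (q^B)^2 e^{-2\kappa_{\rm D} r}}{2 r^2} + O(l_{\rm B}^3) \right) \\ &= \frac{4\pi l_{\rm B} q^A q^B}{\kappa_{\rm D}^2} - \frac{\pi l_{\rm B}^2}{\kappa_{\rm D}} (q^A)^2 (q^B)^2 + O(l_{\rm B}^3) . \end{aligned} \tag{24}$$

Since these $q^A q^B$ terms in Eqs. 15 and 21 do not involve individual $\sigma^i_s$s and thus include no sequence specificity, the jSCD term is always the lowest-order term (in $\bar{\kappa}_{\rm D}$) that takes into account sequence specificity for overall neutral as well as overall non-neutral chains. We also note that the divergence of these net charge terms in the κ D → κ D > + ) and 25 negative ( − ) charges but they have different charge patterns as quantified by κ and SCD K − ) ranging widely from under 5 µM to over 2 mM are predicted by Eq. 23 for the 900 sv sequence pairs, exhibiting a general trend of increasing binding affinity with increasing charge segregation of the interacting IDPs as measured by SCD (Fig. 2b) and κ (Fig. 3).

E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K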

K E K E E K E K K K E E E E K E K K K K E E K E K E K E K E E K K E E K K K K E E K E E K E K E K E

K E K E K K E E K E K K E E E K K E K E K E K K K E E K K K E E K E E K K E E K K K E E K E E E K E

E E E K K E K K E E K E E K K E K K E K E E E K K K E K E E K K E E E K K K E K E E E E K K K K E K

E E E K K K E E E K K K E E E K K K E E E K K K E E E K K K E E E K K K E E E K K K E E E K K K E K

E K E K K K K K E E E K K E K E E E E K E E E E K K K K K E K E E E K E E K K E E K E K K K E E K K

E E E E K K K K E E E E K K K K E E E E K K K K E E E E K K K K E E E E K K K K E E E E K K K K E K

K K K K E E E E K K K K E E E E K K K K E E E E K K K K E E E E K K K K E E E E K K K K E E E E K E

K E K K K E K K E E K K E E K E K E K E K E E K K K E E K E K E K E K K K E E K E K E E K K E E E E

10 sv16

E K E K E E K K K E E K K K K E K K E K E E K K E K E K E K K E E E E E E E E E K E K K E K K K K E

11 sv18

K E E K K E E E E E E E K E E K K K K K E K K K E K K E E E K K K E E K K K E E E E E E K K K K E K

12 sv19

E E E E E K K K K K E E E E E K K K K K E E E E E K K K K K E E E E E K K K K K E E E E E K K K K K

13 sv9

E E K K E E E K E K E K E E E E E K K E K K E K K E K K K E E K E K E K K K E K K K K E K E E E K E

14 sv10

E K K K K K K E E K K K E E E E E K K K E E E K K K E K K E E K E K E E K E K K E K K E E K E E E E

15 sv14

E K K E K E E K E E E E K K K K K E E K E K K E K K K K E K K K K K E E E E E E K E E K E K E K E E

16 sv13

K E K K K E K E K K E K K K E E E K K K E E E K E K K K E E K K E K K E K K E E E E E E E K E E K E

17 sv12

E K K E E E E E E K E K K E E E E K E K E K K E K E E K E K K E K K K E K K E E E K E K K K K E K K

18 sv21

E E E E E E E E E K E K K K K K E K E E K K K K K K E K K E K K K K E K K E E E E E E K E E E K K K

19 sv15

K K E K K E K K K E K K E K K E E E K E K E K K E K K K K E K E K K E E E E E E E E K E E K K E E E

20 sv22

K E E E E K E E K E E K K K K E K E E K E K K K K K K K K K K K K E K K E E E E E E E E K E K E E E

21 sv17

E K E K K K K K K E K E K K K K E K E K K E K K E K E E E K E E K E K E K K E E K K E E E E E E E E

22 sv20

E E K E E E E E E K E E E K E E K K E E E K E K K E K K E K E E K K E K K K K K K K K K K K K E E E

23 sv27

K K E K K K E K K E E E E E E E E E E E E E E E E E E E E K E E K K K K K K K K K K K K K K K E K K

24 sv23

E E E E E K E E E E E E E E E E E K E E K E K K K K K K E K K K K K K K E K E K K K K E K K E E K K

25 sv25

E E E E E E E E E E E K E E E E K E E K E E K E K K K K K K K K K K K K K K K K K K E E K K E E K E

26 sv28

E K K K K K K K K K K K K K K K K K K K K K E E E E E E E E E E E E E E E E E E K K E E E E E K E K

27 sv26

K E E E E E E E K E E K E E E E E E E E E K E E E E K E E K K K K K K K K K K K K K K K K K K K K E

28 sv24

E E E E K E E E E E K E E E E E E E E E E E E K K K E E K K K K K E K K K K K K K E K K K K K K K K

29 sv29

K E E E E K E E E E E E E E E E E E E E E E E E E E E K K K K K K K K K K K K K K K K K K K K K K K

30 sv30

E E E E E E E E E E E E E E E E E E E E E E E E E K K K K K K K K K K K K K K K K K K K K K K K K K
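Both charge pattern parameters can be evaluated directly from sequences such as those listed above. The sketch below is illustrative only: the function names are hypothetical, K is taken as +1 and E as −1, and the normalizations $1/(2N)$ for SCD and $-1/(2N_A N_B)$ for jSCD are assumed. To avoid the fourfold sum, pair products are first grouped by sequence separation $d = |s-t|$.

```python
import math


def parse_sv(text):
    """Map a string of E/K letters (whitespace ignored) to charges: K -> +1, E -> -1."""
    charges = {"K": 1, "E": -1}
    return [charges[c] for c in text.split()]


def distance_weights(sigma):
    """w[d] = sum of sigma_s * sigma_t over all ordered pairs with |s - t| = d."""
    n = len(sigma)
    w = [0.0] * n
    for s in range(n):
        for t in range(n):
            w[abs(s - t)] += sigma[s] * sigma[t]
    return w


def scd(sigma):
    """Single-chain SCD = (1/2N) sum_{s,t} sigma_s sigma_t sqrt(|s - t|)."""
    w = distance_weights(sigma)
    return sum(w_d * math.sqrt(d) for d, w_d in enumerate(w)) / (2.0 * len(sigma))


def jscd(sigma_a, sigma_b):
    """jSCD = -(1 / (2 N_A N_B)) sum sigma sigma sigma sigma sqrt(|s-t| + |l-m|)."""
    w_a, w_b = distance_weights(sigma_a), distance_weights(sigma_b)
    total = 0.0
    for d1, wa in enumerate(w_a):
        for d2, wb in enumerate(w_b):
            total += wa * wb * math.sqrt(d1 + d2)
    return -total / (2.0 * len(sigma_a) * len(sigma_b))


# The strictly alternating and the fully segregated 50-mers from the list above.
alternating = parse_sv("E K " * 25)
diblock = parse_sv("E " * 25 + "K " * 25)
```

Evaluating `scd` and `jscd` on `alternating` and `diblock` gives the two extremes of charge segregation among these sequences; under the assumed sign convention, more segregated sequences have more negative SCD and larger homotypic jSCD.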

Figure 2: Fuzzy complex binding depends strongly on sequence charge patterns. (a) The 30 sv sequences (red: −1, blue: +1) ordered by their SCD values (left), whereas the number after the “sv” (right) indicates their ranking by κ (Ref. 28). (b) Heatmap of binding affinities of all 30 × 30 sv pairs. Sequences with higher −SCD values bind more tightly. [Panel labels: −SCD ranking, κ ranking; color scale: $K_{\rm D}^{-1}$ (M⁻¹).]

Figure 3: Heatmap of binding affinities of all 30 × 30 pairs of overall charge neutral sv sequences arranged in increasing value of the κ parameter of Das and Pappu along both axes. Consistent with the trend shown in Fig. 2b for SCD dependence, sequences with higher κ values here are seen to have generally higher binding affinities. [Axis label: κ; color scale: $K_{\rm D}^{-1}$ (M⁻¹).]

New analytical relationship with phase separation.

jSCD characterizes not only binary fuzzy IDP complexes but also IDP phase separation. In the random phase approximation (RPA) theory of phase separation of overall charge neutral sequences in the absence of salt and short-range cutoff of Coulomb interaction (Eqs. 39 and 40 of Ref. 35 with $\tilde{k}^2[1 + \tilde{k}^2] \to \tilde{k}^2$), the electrostatic free energy $f_{\rm el}$ may be expanded through $O(l_{\rm B}^2)$ as

$$\begin{aligned} f_{\rm el} &= \int_0^\infty \frac{dk\, k^2}{4\pi^2} \left\{ \ln\left[ 1 + \frac{4\pi\phi_m}{k^2 T^* N} \langle\sigma|\hat{G}_{\rm M}(k)|\sigma\rangle \right] - \frac{4\pi\phi_m}{k^2 T^* N} \langle\sigma|\hat{G}_{\rm M}(k)|\sigma\rangle \right\} \\ &= -\frac{2\phi_m^2}{T^{*2} N^2} \int_0^\infty \frac{dk}{k^2} \langle\sigma|\hat{G}_{\rm M}(k)|\sigma\rangle^2 + O(l_{\rm B}^3) \\ &= -\frac{\phi_m^2}{T^{*2}}\, 2\sqrt{\frac{2\pi}{3}}\; \mathrm{jSCD}(\sigma, \sigma) + O(l_{\rm B}^3) , \end{aligned} \tag{25}$$

where $N$ is chain length, $\phi_m$ is the volume fraction of the IDP, $T^* \equiv b/l_{\rm B}$ is reduced temperature, and $\langle\sigma|\hat{G}_{\rm M}(k)|\sigma\rangle = \sum_{s,t=1}^{N} \sigma_s \sigma_t \exp(-k^2|s-t|/6)$ is the charge structure factor ($\sum_{s=1}^{N} \sigma_s = 0$). The $\phi_m^2$ term in Eq. 25 allows for an approximate sequence-dependent Flory-Huggins (FH) theory of phase separation, which we term jSCD-FH, with an effective FH χ parameter

$$\chi(\sigma, \sigma) \equiv 2\sqrt{\frac{2\pi}{3}}\; \frac{\mathrm{jSCD}(\sigma, \sigma)}{T^{*2}} . \tag{26}$$

For two IDP species $A$, $B$, one similarly obtains $\chi(\sigma^A, \sigma^B) = 2\sqrt{2\pi/3}\, [\mathrm{jSCD}(\sigma^A, \sigma^B)]/T^{*2}$ and

$$f_{\rm el} = -\chi(\sigma^A, \sigma^A) \phi_A^2 - 2\chi(\sigma^A, \sigma^B) \phi_A \phi_B - \chi(\sigma^B, \sigma^B) \phi_B^2 + O(l_{\rm B}^3) \tag{27}$$

in the form of the FH interaction terms for a two-IDP species system (Eq. 27 of Ref. 16). Recognizing $\chi = \chi_{\rm cr} = (\sqrt{N} + 1)^2/(2N)$ at the FH critical temperature $T^*_{\rm cr}$, Eq. 26 gives $T^*_{\rm cr} = [2\sqrt{2\pi/3}\; \mathrm{jSCD}/\chi_{\rm cr}]^{1/2}$, which suggests that for $N = 50$,

$$T^*_{\rm cr}(\sigma) \approx 2.11 \times \mathrm{jSCD}^{1/2}(\sigma, \sigma) . \tag{28}$$

A strong correlation between jSCD and the product of its two component SCDs is suggested by Fig. 2b.
Indeed, for the 30 sv sequences as well as 1,000 randomly generated overall charge neutral 50mer sequences (see Supporting Information for description), jSCD(σ, σ) ≈ . × |SCD(σ)| . and jSCD($\sigma^A$, $\sigma^B$) ≈ . [SCD($\sigma^A$) × SCD($\sigma^B$)] . (Fig. 4a,b). The correlations are excellent aside from slightly more scatter around SCD ∼ 1. To assess the robustness of these correlations, we consider also a modified Coulomb potential $l_{\rm B}[1 - \exp(-r/b)]/r$ with the short-range cutoff used in RPA to derive a modified jSCD,

$$\mathrm{jSCD}_{\rm cutoff}(\sigma^A, \sigma^B) \equiv \frac{1}{2 N_A N_B} \sqrt{\frac{6}{\pi}} \int_0^\infty \frac{dk}{k^2 (1 + k^2)^2} \sum_{s,t=1}^{N_A} \sum_{l,m=1}^{N_B} \sigma^A_s \sigma^A_t \sigma^B_l \sigma^B_m\, e^{-\frac{1}{6}(kb)^2 [|s-t| + |l-m|]} , \tag{29}$$

and find that jSCD_cutoff(σ, σ) ≈ . × |SCD(σ)| . and jSCD_cutoff($\sigma^A$, $\sigma^B$) ≈ . [SCD($\sigma^A$) × SCD($\sigma^B$)] . (Fig. 4c,d). Interestingly, combining the jSCD_cutoff(σ, σ) scaling and Eq. 28 rationalizes the $T^*_{\rm cr} \mathrel{\tilde{\propto}} \mathrm{SCD}$ scaling in Ref. 30 (Fig. 5), and this analytical result is in line with the relation between $B_2$ and $T^*_{\rm cr}$ deduced from explicit-chain simulations. Taking into account also the jSCD_cutoff($\sigma^A$, $\sigma^B$) scaling and Eq. 27 rationalizes the $\chi(\sigma^A, \sigma^B) = \sqrt{\chi(\sigma^A, \sigma^A)\chi(\sigma^B, \sigma^B)}$ relation in Ref. 16 (Fig. 6). Not unexpectedly, in both cases, approximate mean-field jSCD-FH produces a trend consistent with RPA, but entails a sharper dependence of phase behaviors on SCD than that predicted by RPA (Figs. 5 and 6). In this connection, it is instructive to note that the general trend of sequence dependent critical temperature of polyampholyte phase separation has recently been shown to agree largely with that obtained from field-theoretic simulations, despite the RPA’s expected limitations in accounting for polyampholyte phase behaviors at very low concentrations.

Previously, the tendency of the populations of two polyampholytes $A$ and $B$ to demix upon phase separation (as quantified, e.g., by an $A_{\alpha\beta}$ parameter) was reported to correlate with their SCD difference SCD($\sigma^B$) − SCD($\sigma^A$) (Ref. 16 and Fig. 6d).
In view of the above theoretical development and the fact that $A_{\alpha\beta}$ ∼ SCD($\sigma^B$) − SCD($\sigma^A$) was observed only for a set of six sv pairs ($A$ and $B$) all having sv28 as sequence $A$, this previously observed empirical correlation should now be viewed as a special case of an expected general correlation between jSCD($\sigma^A$, $\sigma^B$) and the tendency for demixing of sequences $A$ and $B$ upon phase separation, because in the special case when SCD($\sigma^A$) = constant, jSCD($\sigma^A$, $\sigma^B$) $\tilde{\propto}$ SCD($\sigma^A$) × SCD($\sigma^B$) $\tilde{\propto}$ [SCD($\sigma^B$) − SCD($\sigma^A$)] + constant.

Figure 4: Correlation between single- and double-chain sequence charge pattern parameters. jSCD vs SCD scatter plots for homotypic (a, c) or heterotypic (b, d) pairs among the 30 × r = (a) 0.983, (b) 0.967, (c) 0.997, and (d) 0.994. The correlation is good for both jSCD and jSCD_cutoff but their fitted scaling exponents are not identical. Apparently, SCD <

[Figure 5 plot residue: horizontal axis −SCD; legend: RPA = −0.314(SCD), jSCD 0.118 (SCD).]
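The jSCD-FH estimate of the critical temperature follows from equating the effective χ of Eq. 26 with the Flory-Huggins critical value. The short sketch below illustrates that arithmetic; the function names are hypothetical, and the prefactor $2\sqrt{2\pi/3}$ is assumed here as the coefficient obtained by expanding Eq. 25 to second order with jSCD normalized by $-1/(2N_A N_B)$.

```python
import math

# Assumed prefactor relating jSCD to the effective FH chi: chi = PREFACTOR * jSCD / T*^2.
PREFACTOR = 2.0 * math.sqrt(2.0 * math.pi / 3.0)


def chi_critical(n):
    """Flory-Huggins critical chi for a chain of n residues: (sqrt(n) + 1)^2 / (2n)."""
    return (math.sqrt(n) + 1.0) ** 2 / (2.0 * n)


def chi_jscd(jscd_value, t_star):
    """Effective chi at reduced temperature T* = b / l_B (form of Eq. 26)."""
    return PREFACTOR * jscd_value / t_star ** 2


def t_critical(jscd_value, n):
    """Reduced critical temperature obtained by solving chi_jscd(T*) = chi_critical(n)."""
    return math.sqrt(PREFACTOR * jscd_value / chi_critical(n))
```

For a 50-residue chain, `t_critical(j, 50)` reduces to a constant multiple of `math.sqrt(j)`, which is the form of Eq. 28; the same helper can be applied to either jSCD or jSCD_cutoff values.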

Figure 5: Approximate mean-field jSCD-FH phase separation theories entail stronger dependence of critical temperature $T^*_{\rm cr}$ on SCD than that predicted by RPA theory. Results shown are for the 30 sv sequences of Das and Pappu. Critical temperatures calculated using RPA (green symbols) and its linear fit $T^*_{\rm cr} = -0.314 \times \mathrm{SCD}$ (blue line) are taken from Fig. 3b of Ref. 30. $T^*_{\rm cr}$ values computed here based on the jSCD-FH result in Eq. 28 and the jSCD_cutoff expression in Eq. 29, i.e., $T^*_{\rm cr} = 2.11\sqrt{\mathrm{jSCD}_{\rm cutoff}}$, are plotted in orange. The linear fit to the data points is provided in the same color. Slightly different jSCD-FH $T^*_{\rm cr}$ values are obtained using the formula $T^*_{\rm cr}$ = − . √ . × SCD solely by replacing the actual jSCD_cutoff values with the fitted value jSCD_cutoff = . × (SCD) deduced from Fig. 4c. Data in this plot indicate that both of the two jSCD-FH formulations capture the $T^*_{\rm cr} \mathrel{\tilde{\propto}} \mathrm{SCD}$ relation very well but overestimate the phase separation propensities relative to the RPA-predicted propensities for all 30 sv sequences.

Figure 6: Binary phase diagrams generated by the approximate jSCD-based effective Flory-Huggins (jSCD-FH) interaction free energy given by Eq. 27 with the χ parameters given by Eq. 26 with $T^* = 10$. Sequence $A$ is sv28 and sequence $B$s are (a) sv24, (b) sv25, and (c) sv20. φs are volume fractions of the polyampholytes. Blue dots are numerically solved phase-separated states $\alpha \equiv (\phi^\alpha_A, \phi^\alpha_B)$ and $\beta \equiv (\phi^\beta_A, \phi^\beta_B)$ [the β here for labeling phase-separated states is not to be confused with the reciprocal Boltzmann factor $1/k_{\rm B}T$]; black dashed lines are tie lines connecting an α–β pair of coexisting states. Consistent with the RPA phase diagrams provided in Fig. 3 of Ref. 16, panels (a)–(c) here of jSCD-FH results show the same general trend that sequences with similar SCDs coalesce whereas those with significantly different SCDs exclude each other, but the degree of exclusion predicted by the present jSCD-FH theory is significantly higher than that predicted previously by RPA theory. (d) Variation of the composition asymmetry measure, $A_{\alpha\beta}$, which is a demixing parameter (vertical axis), with the difference in SCD values of the sequence pair (horizontal axis; SCD_A = SCD($\sigma^A$), SCD_B = SCD($\sigma^B$)). The measure $A_{\alpha\beta} \equiv (2/\pi) \langle | \tan^{-1}(\phi^\alpha_A/\phi^\alpha_B) - \tan^{-1}(\phi^\beta_A/\phi^\beta_B) | \rangle$, where the $\langle\cdots\rangle$ average is over all tie-line connected α–β pairs, is defined in Eq. 26 of Ref. 16 to quantify the tendency of two sequences $A$ and $B$ in a solution system to demix upon separation into two phases α and β. The orange jSCD-FH data points here are seen to be always higher than the corresponding green RPA data points, indicating that the more approximate mean-field jSCD-FH formulation always overestimates demixing propensity. Lines joining data points are guides for the eye. The last three jSCD-FH data points are connected by dashed lines instead of solid lines to underscore the fact that $A_{\alpha\beta}$ is already saturated at the third (sv28–sv20) sequence pair shown and the remaining $A_{\alpha\beta}$ data points for larger SCD_B − SCD_A differences remain at the same saturated value. [Panel labels: (a)–(d); horizontal axis of (d): SCD_B − SCD_A; legend: RPA, jSCD-FH.]

Theory-predicted trend is consistent with simulations and Kuhn length renormalization.
We now assess our approximate theory by comparing its predictions with coarse-grained molecular dynamics simulations of six sv sequence pairs. Details of the explicit-chain model are in the Supporting Information. Because bound IDPs in a fuzzy complex are dynamic, their configurations are diverse. The IDP chains in some bound configurations are relatively open, some are highly intertwined, and others can take the form of two relatively compact chains interacting favorably mostly via residues situated on the surface of their individually compact conformations (Fig. 7a). Taking into account this diversity, we sample all intermolecular residue-residue distances between the model IDPs (rather than merely their center-of-mass distances) and use the appearance of a bimodal distribution to define binding (Fig. 7b), with binding probability $\theta$ given by the fractional area covered by the small-distance peak. To better quantify the role of favorable interchain interaction, rather than random collision, in the formation of IDP complexes, we subtract a reference probability, $4\pi r_{\rm cut}^3/(3V)$, that two particles in a simulation box of volume $V$ will be within the cutoff distance $r_{\rm cut}$ that defines the small-distance peak in Fig. 7b, and compare $\tilde{\theta} \equiv \theta - 4\pi r_{\rm cut}^3/(3V)$ with theoretical predictions.

For the sequence pairs considered, theoretical $K_D^{-1}$ is generally substantially higher than simulated $K_D^{-1}$ at the same temperature. The mismatch likely arises from differences in the two models; for example, excluded volume is considered in the simulation but not in the present analytical theory. For the same reason, a similar mismatch between theory and explicit-chain simulation has been noted in the study of phase separation of sv sequences. Nonetheless, sequence-dependent trends of binding predicted by theory and simulation are largely similar (Fig. 7c). 
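The baseline subtraction described above can be sketched in a few lines. This is a minimal illustration, not the analysis code used for Fig. 7: the function name and the synthetic distances are hypothetical, and the bound-state probability is approximated here as the fraction of sampled interchain distances below the cutoff rather than as a fitted peak area.

```python
import numpy as np

def excess_binding_probability(distances, r_cut, box_volume):
    """Return (theta, theta_ref, theta_tilde) from sampled interchain distances.

    theta       : fraction of sampled distances below r_cut (assumed proxy for
                  the area of the small-distance peak)
    theta_ref   : 4*pi*r_cut^3 / (3V), the chance that two non-interacting
                  particles in a box of volume V lie within r_cut
    theta_tilde : theta - theta_ref
    """
    distances = np.asarray(distances, dtype=float)
    theta = float(np.mean(distances < r_cut))
    theta_ref = 4.0 * np.pi * r_cut**3 / (3.0 * box_volume)
    return theta, theta_ref, theta - theta_ref

# Hypothetical usage with made-up distances (units of a) in a (100 a)^3 box.
sample = [1.0, 2.0, 3.0, 50.0, 60.0, 70.0, 80.0, 90.0, 95.0, 99.0]
theta, theta_ref, theta_tilde = excess_binding_probability(sample, 5.0, 100.0**3)
print(theta, theta_ref, theta_tilde)
```

A negative value of the last quantity would indicate that the two chains are found together less often than non-interacting particles would be.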
Notably, both theory and simulation posit that sv24–sv28 bindsmore strongly than sv25–sv28, exhibiting a rank order that is consistent with SCD (sv24 hasa larger − SCD value than sv25) but not κ (sv24 has a smaller κ parameter than sv25).However, theory and simulation disagree on the rank order of sv15–sv28 and sv20–sv28binding aﬃnities (Fig. 7c). As a ﬁrst step in addressing this discrepancy, we examine moreclosely the impact of using a Gaussian-chain assumption to derive the B formula in Eq. 12.The Gaussian-chain approximation in the general formula for B in Eq. 8 is for tractabil-18igure 7: Comparing analytical theory against explicit-chain simulation. Results from sim-ulations are for T ∗ = .

35. (a) Snapshots of fuzzy complexes of sv28 (cyan/orange: +/− )with diﬀerent partners (blue/red: +/− ): sv24 (surface touch), s25 (entangled) and sv1 (ex-tended). (b) Distribution (histogram) of sv24–sv28 interchain residue-residue distance among1,000,000 snapshots. The small- r peak region (marked by the green frame) is the bound state.The black curve is the baseline distribution of distance between a pair of non-interacting par-ticles in the same simulation box. (c) Theoretical K D (blue) vs simulated ˜ θ (red), of sv28with various partners (horizontal axis), where ˜ θ ≡ θ − θ with θ = π × /( × ) beingthe baseline probability that two non-interacting particles is < a apart. The ˜ θ < and that of Shen and Wang for polyelectrolytes. Here we focus on the method in Ref. 29, which entails deriving sequence-dependent eﬀective,or renormalized, Kuhn lengths, denoted as x ist b for residue pair s, t in chain i , to replacethe “bare” Kuhn length b in the original simple Gaussian formulation. In other words, themodiﬁcation [ ˆ P i ( k )] st ≈ exp [− ( kb ) ∣ s − t ∣] → exp [− ( kb ) x ist ∣ s − t ∣] (30)is applied to Eq. 10. In this approach, instead of assuming that the conformational dis-tribution of each of the two IDP chains in our binary interacting IDP-IDP system is thatof a simple Gaussian chain as if it experiences no interaction other than the contraints ofchain connectivity, the impact of the intrachain part of the interaction in the system on theconformational distribution of an individual IDP chain is taken into account approximately19y treating the IDP as a modiﬁed Gaussian chain with a renormalized Kuhn length. Assuch, it should be noted that this renormalization procedure is performed on a single isolatedchain without addressing eﬀects of interchain interactions.Recognizing that the simple Gaussian-chain correlation function in Eq. 
10 is a consequence of a single-chain Hamiltonian $\mathcal{H}^i_0[\mathbf{R}^i]$ containing only terms for elastic chain connectivity, viz.,

$$\mathcal{H}^i_0[\mathbf{R}^i] = \frac{3}{2b^2}\sum_{s=1}^{N_i-1}\left|\mathbf{R}^i_{s+1}-\mathbf{R}^i_s\right|^2, \tag{31}$$

we now also take into consideration an intrachain interaction potential $U^i[\mathbf{R}^i]$ that includes electrostatic interaction and excluded-volume repulsion,

$$U^i[\mathbf{R}^i] = \sum_{s>t=1}^{N_i}\left[l_B\,\sigma^i_s\sigma^i_t\,\frac{e^{-\kappa_D|\mathbf{R}^i_s-\mathbf{R}^i_t|}}{|\mathbf{R}^i_s-\mathbf{R}^i_t|} + w_i\,\delta(\mathbf{R}^i_s-\mathbf{R}^i_t)\right], \tag{32}$$

where $w_i$ is the two-body excluded-volume repulsion strength for chain $i$. For the 30 sv sequences used in the present analysis, the $w_i$ values obtained from matching theory with results from explicit-chain atomic simulation conducted in the “intrinsic solvation” limit in the absence of electrostatic interactions are available from Table 1 of Sawle and Ghosh. A full Hamiltonian $\mathcal{H}^i$ is then given by the sum of Eqs. 31 and 32:

$$\mathcal{H}^i[\mathbf{R}^i] = \mathcal{H}^i_0[\mathbf{R}^i] + U^i[\mathbf{R}^i]. \tag{33}$$

We assume, as in Ref. 29, that the full Hamiltonian can be approximated as the Hamiltonian $\mathcal{T}^i_x[\mathbf{R}^i]$ for a modified Gaussian chain with an effective Kuhn length $x_i b$, which is equivalent to $N_i b^2 \to N_i b\,(x_i b)$ while holding the total contour length $N_i b$ unchanged (cf. Eqs. 1 and 2 of Ref. 29). In other words,

$$\mathcal{H}^i[\mathbf{R}^i] \approx \mathcal{T}^i_x[\mathbf{R}^i] \equiv \frac{3}{2xb^2}\sum_{s=1}^{N_i-1}\left|\mathbf{R}^i_{s+1}-\mathbf{R}^i_s\right|^2, \tag{34}$$

where $x$ is to be determined by the variational approach described in Ref. 29. Here we briefly summarize the concept and result, and refer the readers to the original paper for methodological details. The approach consists of expressing the full Hamiltonian as a sum of the “principal” $\mathcal{T}^i_x$ component and a “perturbative” $\Delta\mathcal{H}^i_x$ term:

$$\mathcal{H}^i[\mathbf{R}^i] = \mathcal{T}^i_x[\mathbf{R}^i] + \Delta\mathcal{H}^i_x[\mathbf{R}^i], \tag{35}$$

where, by Eqs. 31, 32 and 34,

$$\Delta\mathcal{H}^i_x[\mathbf{R}^i] = \frac{3}{2b^2}\left(1-\frac{1}{x}\right)\sum_{s=1}^{N_i-1}\left|\mathbf{R}^i_{s+1}-\mathbf{R}^i_s\right|^2 + \sum_{s>t=1}^{N_i}\left[l_B\,\sigma^i_s\sigma^i_t\,\frac{e^{-\kappa_D|\mathbf{R}^i_s-\mathbf{R}^i_t|}}{|\mathbf{R}^i_s-\mathbf{R}^i_t|} + w_i\,\delta(\mathbf{R}^i_s-\mathbf{R}^i_t)\right]. \tag{36}$$

Making use of the form in Eq. 
34, the full thermodynamic average $\langle A\rangle$ (Boltzmann-weighted by the full Hamiltonian $\mathcal{H}^i$) of any physical observable $A$ can be cast as an expansion in powers of the perturbative Hamiltonian $\Delta\mathcal{H}^i_x$ (Eq. 3 of Ref. 29):

$$\langle A\rangle = \langle A\rangle_x + \langle A\rangle_x\langle\Delta\mathcal{H}^i_x\rangle_x - \langle A\,\Delta\mathcal{H}^i_x\rangle_x + O[(\Delta\mathcal{H}^i_x)^2], \tag{37}$$

where the averages $\langle\cdots\rangle$ and $\langle\cdots\rangle_x$ are defined by

$$\langle A\rangle \equiv \frac{\int\mathcal{D}[\mathbf{R}^i]\,A[\mathbf{R}^i]\,e^{-\mathcal{H}^i[\mathbf{R}^i]}}{\int\mathcal{D}[\mathbf{R}^i]\,e^{-\mathcal{H}^i[\mathbf{R}^i]}}, \tag{38a}$$

$$\langle A\rangle_x \equiv \frac{\int\mathcal{D}[\mathbf{R}^i]\,A[\mathbf{R}^i]\,e^{-\mathcal{T}^i_x[\mathbf{R}^i]}}{\int\mathcal{D}[\mathbf{R}^i]\,e^{-\mathcal{T}^i_x[\mathbf{R}^i]}}. \tag{38b}$$

For any observable $A$ of interest, an optimal $x$ in this formalism is obtained by minimizing the difference between the averages weighted by the full $\mathcal{H}^i$ and the approximate $\mathcal{T}^i_x$ through eliminating the $O(\Delta\mathcal{H}^i_x)$ term in Eq. 37. Imposing this condition allows us to solve for an optimal set of $x^i_{st}$ for a given $A$. Comparisons by Ghosh and coworkers of results from this theoretical approach against those from explicit-chain simulations have demonstrated that this is a rather accurate and effective method. Ideally, the correlation functions $[\hat{P}^i(k)]_{st}$ themselves should be used as observables for the optimization; but that leads to insurmountable technical difficulties. Thus, following Ref. 29 (Eq. 11 of this reference), we use $|\mathbf{R}^i_s-\mathbf{R}^i_t|^2$ as observables to optimize the $x^i_{st}$ values. Accordingly, for each residue pair $s_i, t_i$ on chain $i$, an optimized $x$ factor, $x^i_{st}$, is obtained by solving the equation

$$\left\langle|\mathbf{R}^i_s-\mathbf{R}^i_t|^2\right\rangle_{x^i_{st}}\left\langle\Delta\mathcal{H}^i_{x^i_{st}}\right\rangle_{x^i_{st}} = \left\langle|\mathbf{R}^i_s-\mathbf{R}^i_t|^2\,\Delta\mathcal{H}^i_{x^i_{st}}\right\rangle_{x^i_{st}} \tag{39}$$

using the formalism developed in Eqs. 6–10 of Ref. 29. These solved $x^i_{st}$ values are then used to rescale the two terms of the $X$ factor introduced in Eq. 17 to arrive at the expression

$$B_2 = \frac{4\pi l_B}{\kappa_D^2}\,q_A q_B - 4l_B^2\int_0^\infty dk\,\frac{k^2}{(k^2+\kappa_D^2)^2}\sum_{s,t=1}^{N_A}\sum_{l,m=1}^{N_B}\sigma^A_s\sigma^A_t\sigma^B_l\sigma^B_m\,e^{-\frac{(kb)^2}{6}\left[x^A_{st}|s-t|+x^B_{lm}|l-m|\right]} \tag{40}$$

for the second virial coefficient in the formulation with renormalized Kuhn lengths. 
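The salt-free, charge-neutral limit taken next can be checked with a short calculation. The sketch below is an added worked step rather than part of the original derivation; it assumes the prefactors of Eq. 40 as written above and uses $q_A q_B = 0$ so that a $k$-independent constant may be subtracted from the integrand.

```latex
% Assumption: q_A q_B = 0, hence
%   \sum_{s,t}\sum_{l,m} \sigma^A_s\sigma^A_t\sigma^B_l\sigma^B_m = q_A^2 q_B^2 = 0,
% so e^{-ck^2} may be replaced by (e^{-ck^2} - 1) inside the quadruple sum.
\begin{align*}
c &\equiv \frac{b^2}{6}\left[x^A_{st}|s-t| + x^B_{lm}|l-m|\right], \\
\int_0^\infty \frac{dk}{k^2}\left(e^{-ck^2}-1\right)
  &= -2c\int_0^\infty dk\, e^{-ck^2} = -\sqrt{\pi c}
  \qquad\text{(integration by parts)}, \\
B_2\Big|_{\kappa_D\to 0,\; q_Aq_B=0}
  &= -4l_B^2 \sum_{s,t=1}^{N_A}\sum_{l,m=1}^{N_B}
     \sigma^A_s\sigma^A_t\sigma^B_l\sigma^B_m \left(-\sqrt{\pi c}\right) \\
  &= 4\sqrt{\frac{\pi}{6}}\; l_B^2\, b \sum_{s,t=1}^{N_A}\sum_{l,m=1}^{N_B}
     \sigma^A_s\sigma^A_t\sigma^B_l\sigma^B_m
     \sqrt{x^A_{st}|s-t| + x^B_{lm}|l-m|}\,.
\end{align*}
```

The square-root dependence on sequence separation therefore follows directly from the Gaussian form of the intrachain correlation functions.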
In the case of a salt-free solution of overall charge neutral polymers, this expression reduces to

$$B_2^{\rm eff}\Big|_{\kappa_D\to 0,\;q_Aq_B=0} = 4\sqrt{\frac{\pi}{6}}\;l_B^2\,b\sum_{s,t=1}^{N_A}\sum_{l,m=1}^{N_B}\sigma^A_s\sigma^A_t\sigma^B_l\sigma^B_m\sqrt{x^A_{st}|s-t|+x^B_{lm}|l-m|}, \tag{41}$$

which is the modified (renormalized) form of Eq. 23.

The resulting heatmap of the $K_D$ values calculated in this manner is provided in Fig. 8a. Unlike the results obtained using the base theory with a simple Gaussian chain model (Fig. 2b), the theory of renormalized Kuhn lengths predicts that some sv sequence pairs do not bind at all, as indicated by the white regions in Fig. 8a. Furthermore, instead of binding propensity being monotonic with charge segregation (quantified by −SCD) as predicted by the base theory, some sv sequence pairs deviate from the trend. Specifically, highly charge segregated sequences with large −SCD values seem to avoid interactions with sequences with only a medium charge segregation with moderate −SCD values.

Figure 8: Heatmap of binding affinities of the 30 × 30 overall charge neutral sv sequence pairs computed using Eq. 41 in the formulation with renormalized Kuhn lengths. White squares indicate an unfavorable (repulsive) interaction and grey squares indicate a weak $K_D$ of greater than 5 mM. The results are quite different from those provided in Fig. 2b for the base theory with a bare (not renormalized) Kuhn length. (b) Heatmap of difference in the same sequence pairs’ binding affinities predicted by the two theories (base-theory prediction minus renormalized-Kuhn-lengths prediction). In general, more charge segregated sequences, i.e., those with higher −SCD values, exhibit a higher reduction in binding affinities when intrachain interactions are accounted for approximately using renormalized Kuhn lengths.

These contrasts between the base theory and the formulation with renormalized Kuhn lengths are underscored in Fig. 8b where the numerical differences in predicted binding affinities by the two formulations are plotted. Apparently, the approximate account of intrachain interactions afforded by renormalized Kuhn lengths posits a larger decrease in binding affinities relative to that predicted by the base theory for high −SCD sequences than for low −SCD sequences. The $K_D$s predicted by the two theories and the binding probabilities obtained from explicit-chain simulations for several example sv sequence pairs are compared in more detail in Fig. 9. These predictions are physically intuitive as sequences with larger −SCDs generally have stronger intrachain interactions, although the magnitude of the effect is likely overestimated. With the last caveat, the higher simulated binding of sv15–sv28 relative to that of sv20–sv28 may be understood in terms of sv20’s more favorable intrachain interaction (Fig. 9). In this context, it would be extremely interesting to explore in future investigations the impact of the improved formulation of $x^A_{st}$ and $x^B_{lm}$ proposed recently by Huihui and Ghosh on the association of sv model sequences and other polyampholytes. In particular, for the polyelectrolyte H1-ProTα system considered above (Fig. 
1), since the highly open individual H1 and ProTα conformations at low salt are expected to entail more favorable H1-ProTα interactions than their less open individual conformations at high salt, an analytical theory with renormalized Kuhn lengths for individual IDP chains would likely lead to a higher salt sensitivity for $K_D$ and hence better agreement with experiments. This expectation, however, remains to be tested.

Figure 9: Binding affinities of example sv sequence pairs. Theoretical and simulation results are provided for sv28 pairing individually with sv1, sv10, sv15, sv20, sv24, and sv25. As in Fig. 7, predictions by theory using simple Gaussian chains without renormalized Kuhn lengths (Eq. 23) are shown in dark blue, explicit-chain simulation results, calculated anew here using the regression method described in the Supporting Information for T ∗ = . to result in a net repulsion (negative light blue bars for not only sv20–sv28 but also sv10–sv28 and sv15–sv28).

Conclusions

In summary, we have developed an analytical account of charge sequence-dependent fuzzy binary complexes, with a novel two-chain charge pattern parameter jSCD emerging as a key determinant not only of binary binding affinity but also of multiple-chain phase separation. The formulation elucidates the dominant role of conformational disorder and sequence-specificity in IDP-IDP binding, and provides a footing for empirical correlation between single- and two-chain IDP properties with their sequence-dependent phase-separation propensities.

While the formulation is limited inasmuch as it is a high-temperature approximation and further developments, including extension to sequence patterns of uncharged residues, are desirable, the charge sequence dependence predicted herein is largely in line with explicit-chain simulation. As such, the present formalism offers conceptual advances as well as utility for experimental design and efficient screening of candidates of fuzzy complexes.

Acknowledgements

We thank Robert Best, Aritra Chowdhury, Julie Forman-Kay, Alex Holehouse, Jeetain Mittal, Rohit Pappu, and Wenwei Zheng for helpful discussions, and Ben Schuler for insightful comments on an earlier version of this paper (arXiv:1910.11194v1) and sharing unpublished data. This work was supported by Canadian Institutes of Health Research grants MOP-84281, NJT-155930, Natural Sciences and Engineering Research Council of Canada Discovery grant RGPIN-2018-04351, and computational resources provided by Compute/Calcul Canada.

The authors declare no conflict of interest.

Supporting Information

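Derivation for $B_2$ representations

Starting from the partition function representation (first equality of Eq. 3 in the main text),

$$B_2 = V - \frac{Q_{AB}}{Q_A Q_B},$$

we denote the isolated single-chain Hamiltonians in units of $k_BT$ ($k_B$ is Boltzmann constant and $T$ is absolute temperature) for $A$ and $B$, respectively, as $\mathcal{H}^A[\mathbf{R}^A]$ and $\mathcal{H}^B[\mathbf{R}^B]$. The corresponding conformational partition functions are then given by

$$Q_i = \frac{1}{V}\int\mathcal{D}[\mathbf{R}^i]\,e^{-\mathcal{H}^i[\mathbf{R}^i]}, \quad i = A, B, \tag{S1a}$$

$$Q_{AB} = \frac{1}{V}\int\mathcal{D}[\mathbf{R}^A]\mathcal{D}[\mathbf{R}^B]\,e^{-\mathcal{H}^A[\mathbf{R}^A]-\mathcal{H}^B[\mathbf{R}^B]-U^{AB}[\mathbf{R}^A,\mathbf{R}^B]}, \tag{S1b}$$

where $1/V$ cancels the degeneracy due to translational invariance. It follows that

$$\begin{aligned}
\frac{Q_{AB}}{Q_AQ_B} &= V\,\frac{\int\mathcal{D}[\mathbf{R}^A]\mathcal{D}[\mathbf{R}^B]\,e^{-\mathcal{H}^A[\mathbf{R}^A]-\mathcal{H}^B[\mathbf{R}^B]-U^{AB}[\mathbf{R}^A,\mathbf{R}^B]}}{\int\mathcal{D}[\mathbf{R}^A]\,e^{-\mathcal{H}^A[\mathbf{R}^A]}\int\mathcal{D}[\mathbf{R}^B]\,e^{-\mathcal{H}^B[\mathbf{R}^B]}} \\
&= V\int\mathcal{D}[\mathbf{R}^A]\mathcal{D}[\mathbf{R}^B]\,\frac{e^{-\mathcal{H}^A[\mathbf{R}^A]}}{\int\mathcal{D}[\mathbf{R}^A]\,e^{-\mathcal{H}^A[\mathbf{R}^A]}}\,\frac{e^{-\mathcal{H}^B[\mathbf{R}^B]}}{\int\mathcal{D}[\mathbf{R}^B]\,e^{-\mathcal{H}^B[\mathbf{R}^B]}}\,e^{-U^{AB}[\mathbf{R}^A,\mathbf{R}^B]} \\
&\equiv V\int\mathcal{D}[\mathbf{R}^A]\mathcal{D}[\mathbf{R}^B]\,\mathcal{P}^A[\mathbf{R}^A]\,\mathcal{P}^B[\mathbf{R}^B]\,e^{-U^{AB}[\mathbf{R}^A,\mathbf{R}^B]},
\end{aligned} \tag{S2}$$

where, as noted in the main text, $U^{AB}$ is in units of $k_BT$, the single-chain probability density function is

$$\mathcal{P}^i[\mathbf{R}^i] \equiv \frac{e^{-\mathcal{H}^i[\mathbf{R}^i]}}{\int\mathcal{D}[\mathbf{R}^i]\,e^{-\mathcal{H}^i[\mathbf{R}^i]}}, \quad i = A, B, \tag{S3}$$

and hence $\int\mathcal{D}[\mathbf{R}^i]\,\mathcal{P}^i[\mathbf{R}^i] = 1$. Substituting Eq. S2 for $Q_{AB}/(Q_AQ_B)$ results in the second equality in Eq. 3 of the main text, viz.,

$$B_2 = V\int\mathcal{D}[\mathbf{R}^A]\mathcal{D}[\mathbf{R}^B]\,\mathcal{P}^A[\mathbf{R}^A]\,\mathcal{P}^B[\mathbf{R}^B]\left(1-e^{-U^{AB}[\mathbf{R}^A,\mathbf{R}^B]}\right).$$

We now proceed to decouple translational invariance from the internal degrees of freedom of the chain molecules by the following change of coordinates:

$$\{\mathbf{R}^i_1,\mathbf{R}^i_2,\ldots,\mathbf{R}^i_{N_i}\}\to\{\mathbf{R}^i_1,\mathbf{r}^i_1,\mathbf{r}^i_2,\ldots,\mathbf{r}^i_{N_i-1}\}, \quad \mathbf{r}^i_s\equiv\mathbf{R}^i_{s+1}-\mathbf{R}^i_s, \tag{S4}$$

which allows all intramolecular residue-residue distances of chain $i$ to be expressed solely in terms of the $\mathbf{r}^i_s$:

$$\mathbf{R}^i_s-\mathbf{R}^i_t = \sum_{\tau=t}^{s-1}\mathbf{r}^i_\tau \quad (s>t). \tag{S5}$$

Since the potential energy of an isolated chain molecule in homogeneous space should depend only on the relative positions of its residues irrespective of the location of the chain’s center of mass, the single-chain Hamiltonian for chain $i$ should be a function of the $\mathbf{r}^i_s$ and independent of the position of any one single residue, which we may choose, without loss of generality, as the position $\mathbf{R}^i_1$ of the first residue. With this consideration, the partition functions $Q_A$, $Q_B$ can be rewritten as

$$Q_i = \frac{1}{V}\int d\mathbf{R}^i_1\,\mathcal{D}[\mathbf{r}^i]\,e^{-\mathcal{H}^i[\mathbf{r}^i]} = \int\mathcal{D}[\mathbf{r}^i]\,e^{-\mathcal{H}^i[\mathbf{r}^i]}, \quad i = A, B, \tag{S6}$$

where $\mathcal{D}[\mathbf{r}^i]\equiv\prod_{s=1}^{N_i-1}d\mathbf{r}^i_s$ and because $\int d\mathbf{R}^i_1/V = 1$. For distances between residues on different chains,

$$\mathbf{R}^{AB}_{st}\equiv\mathbf{R}^A_s-\mathbf{R}^B_t = \sum_{\tau=1}^{s-1}\mathbf{r}^A_\tau-\sum_{\mu=1}^{t-1}\mathbf{r}^B_\mu+\mathbf{R}^{AB}_{11}, \tag{S7}$$

where $\mathbf{R}^{AB}_{11}\equiv\mathbf{R}^A_1-\mathbf{R}^B_1$. Thus, the intermolecular interaction $U^{AB}$ is a function of $\mathbf{R}^{AB}_{11}$ and $\mathbf{r}^A$, $\mathbf{r}^B$ (shorthand for $\{\mathbf{r}^A\}=\{\mathbf{r}^A_1,\mathbf{r}^A_2,\ldots,\mathbf{r}^A_{N_A-1}\}$, $\{\mathbf{r}^B\}=\{\mathbf{r}^B_1,\mathbf{r}^B_2,\ldots,\mathbf{r}^B_{N_B-1}\}$). The partition function of the $A$-$B$ complex may then be expressed as

$$\begin{aligned}
Q_{AB} &= \frac{1}{V}\int d\mathbf{R}^A_1\,d\mathbf{R}^B_1\,\mathcal{D}[\mathbf{r}^A]\mathcal{D}[\mathbf{r}^B]\,e^{-\mathcal{H}^A[\mathbf{r}^A]-\mathcal{H}^B[\mathbf{r}^B]-U^{AB}[\mathbf{r}^A,\mathbf{r}^B,\mathbf{R}^{AB}_{11}]} \\
&= \int d\mathbf{R}^{AB}_{11}\,\mathcal{D}[\mathbf{r}^A]\mathcal{D}[\mathbf{r}^B]\,e^{-\mathcal{H}^A[\mathbf{r}^A]-\mathcal{H}^B[\mathbf{r}^B]-U^{AB}[\mathbf{r}^A,\mathbf{r}^B,\mathbf{R}^{AB}_{11}]},
\end{aligned} \tag{S8}$$

where the second equality follows from the change of variable $\{\mathbf{R}^A_1,\mathbf{R}^B_1\}\to\{\mathbf{R}^{AB}_{11},\mathbf{R}^B_1\}$ (Jacobian equals unity) and the fact that $\int d\mathbf{R}^B_1/V = 1$. In terms of $\{\mathbf{r}^i\}$, the single-chain conformational probability density functions are given by

$$\mathcal{P}^i[\mathbf{r}^i] = \frac{e^{-\mathcal{H}^i[\mathbf{r}^i]}}{\int\mathcal{D}[\mathbf{r}^i]\,e^{-\mathcal{H}^i[\mathbf{r}^i]}}, \quad i = A, B. \tag{S9}$$

To arrive at a physically more intuitive (but mathematically equivalent) formulation, we may replace the $\mathbf{R}^{AB}_{11}$ distance between the first residues of the two different chains as an integration variable by the $\mathbf{R}^{AB}_{\rm CM}$ distance between the centers of mass of the two chains while leaving all $\{\mathbf{r}^i\}$ variables unchanged. Since the center-of-mass distance is defined as

$$\mathbf{R}^{AB}_{\rm CM} = \frac{\sum_{s=1}^{N_A}M^A_s\mathbf{R}^A_s}{\sum_{s=1}^{N_A}M^A_s}-\frac{\sum_{t=1}^{N_B}M^B_t\mathbf{R}^B_t}{\sum_{t=1}^{N_B}M^B_t} = \mathbf{R}^{AB}_{11}+\frac{\sum_{s=1}^{N_A}M^A_s\sum_{\tau=1}^{s-1}\mathbf{r}^A_\tau}{\sum_{s=1}^{N_A}M^A_s}-\frac{\sum_{t=1}^{N_B}M^B_t\sum_{\mu=1}^{t-1}\mathbf{r}^B_\mu}{\sum_{t=1}^{N_B}M^B_t}, \tag{S10}$$

where $M^i_s$ is the mass of the $s$th residue in chain $i$, $|\partial\mathbf{R}^{AB}_{\rm CM}/\partial\mathbf{R}^{AB}_{11}| = 1$, and because $\partial\mathbf{r}^i_s/\partial\mathbf{R}^{AB}_{11} = 0$ for $i = A, B$ and $s = 1, 2, \ldots, N_i-1$, the Jacobian of this coordinate transformation is unity. Hence, by the shift of integration variable $d\mathbf{R}^{AB}_{11}\to d\mathbf{R}^{AB}_{\rm CM}$, one obtains

$$\frac{Q_{AB}}{Q_AQ_B} = \int d\mathbf{R}^{AB}_{\rm CM}\,\mathcal{D}[\mathbf{r}^A]\,\mathcal{P}^A[\mathbf{r}^A]\,\mathcal{D}[\mathbf{r}^B]\,\mathcal{P}^B[\mathbf{r}^B]\,e^{-U^{AB}[\mathbf{r}^A,\mathbf{r}^B,\mathbf{R}^{AB}_{\rm CM}]} \equiv \int d\mathbf{R}^{AB}_{\rm CM}\left\langle e^{-U^{AB}[\mathbf{R}^{AB}_{\rm CM};\,\mathbf{r}^A,\mathbf{r}^B]}\right\rangle_{A,B}, \tag{S11}$$

which leads immediately to the center-of-mass representation

$$B_2 = \int d\mathbf{R}^{AB}_{\rm CM}\left\langle 1-e^{-\beta U^{AB}[\mathbf{R}^{AB}_{\rm CM};\,\mathbf{R}^A,\mathbf{R}^B]}\right\rangle_{A,B}$$

given by Eq. 2 of the main text with the $\beta = 1/k_BT$ factor explicitly included.

Derivation for $B_2$ in terms of Mayer $f$-functions

We now substitute the cluster expansion in Eq. S12 of the main text,

$$e^{-U^{AB}}-1\approx\sum_{s=1}^{N_A}\sum_{t=1}^{N_B}f_{st}+\sum_{s\ge t=1}^{N_A}\sum_{l\ge m=1}^{N_B}f_{sl}f_{tm}-\sum_{s=1}^{N_A}\sum_{t=1}^{N_B}f_{st}^2 \tag{S12}$$

(where $s\ge t$, $l\ge m$ in the second term on the right hand side means that every term being summed is distinct), into the $B_2$ formula in Eq. 3 of the main text,

$$B_2 = -V\int\mathcal{D}[\mathbf{R}^A]\mathcal{D}[\mathbf{R}^B]\,\mathcal{P}^A[\mathbf{R}^A]\,\mathcal{P}^B[\mathbf{R}^B]\left(e^{-U^{AB}[\mathbf{R}^A,\mathbf{R}^B]}-1\right),$$

to perform the $\mathcal{D}[\mathbf{R}^A]\mathcal{D}[\mathbf{R}^B]$ integration for each of the three summation terms in Eq. S12. To do so, it is useful to first make the $\{\mathbf{R}^i\}\to\{\mathbf{r}^i\}\cup\{\mathbf{R}^i_1\}$ change of variables, then substitute the $\mathcal{P}^i[\mathbf{r}^i]$ in Eq. S9 for $\mathcal{P}^i[\mathbf{R}^i]$ to rewrite Eq. 3 of the main text as

$$B_2 = -\int d\mathbf{R}^{AB}_{11}\,\mathcal{D}[\mathbf{r}^A]\mathcal{D}[\mathbf{r}^B]\,\mathcal{P}^A[\mathbf{r}^A]\,\mathcal{P}^B[\mathbf{r}^B]\left(e^{-U^{AB}[\mathbf{r}^A,\mathbf{r}^B,\mathbf{R}^{AB}_{11}]}-1\right), \tag{S13}$$

where $U^{AB}[\mathbf{R}^A,\mathbf{R}^B]\to U^{AB}[\mathbf{r}^A,\mathbf{r}^B,\mathbf{R}^{AB}_{11}]$ by virtue of Eq. S7 because $U^{AB}[\mathbf{R}^A,\mathbf{R}^B]$ takes the form of $U^{AB}[\{\mathbf{R}^{AB}_{st}\}]$ and thus $f_{st}=f_{st}(\mathbf{R}^{AB}_{st})$. Substituting Eq. S12 into Eq. S13,

$$\begin{aligned}
B_2\approx{}&-\sum_{s=1}^{N_A}\sum_{t=1}^{N_B}\int d\mathbf{R}^{AB}_{11}\,\mathcal{D}[\mathbf{r}^A]\mathcal{D}[\mathbf{r}^B]\,\mathcal{P}^A[\mathbf{r}^A]\,\mathcal{P}^B[\mathbf{r}^B]\,f_{st}(\mathbf{R}^{AB}_{st}) \\
&+\sum_{s=1}^{N_A}\sum_{t=1}^{N_B}\int d\mathbf{R}^{AB}_{11}\,\mathcal{D}[\mathbf{r}^A]\mathcal{D}[\mathbf{r}^B]\,\mathcal{P}^A[\mathbf{r}^A]\,\mathcal{P}^B[\mathbf{r}^B]\,f_{st}^2(\mathbf{R}^{AB}_{st}) \\
&-\sum_{s\ge t=1}^{N_A}\sum_{l\ge m=1}^{N_B}\int d\mathbf{R}^{AB}_{11}\,\mathcal{D}[\mathbf{r}^A]\mathcal{D}[\mathbf{r}^B]\,\mathcal{P}^A[\mathbf{r}^A]\,\mathcal{P}^B[\mathbf{r}^B]\,f_{sl}(\mathbf{R}^{AB}_{sl})\,f_{tm}(\mathbf{R}^{AB}_{tm}) \\
\equiv{}&B_2^{(\leftrightarrow)}+B_2^{(\leftrightarrow^2)}+B_2^{(\leftrightarrow\leftrightarrow)}.
\end{aligned} \tag{S14}$$

Using the inverse of the Fourier-transformed matrix of Mayer $f$-functions $[\hat{f}(\mathbf{k})]_{st}$ defined in Eq. 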
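7 of the main text,

$$f_{st}(\mathbf{r}) = \int\frac{d^3k}{(2\pi)^3}\,[\hat{f}(\mathbf{k})]_{st}\,e^{i\mathbf{k}\cdot\mathbf{r}},$$

$B_2^{(\leftrightarrow)}$, $B_2^{(\leftrightarrow^2)}$, and $B_2^{(\leftrightarrow\leftrightarrow)}$ are evaluated. First, a term in the summation over $s, t$ for $B_2^{(\leftrightarrow)}$ is equal to

$$\begin{aligned}
-\int d\mathbf{R}^{AB}_{11}\,\mathcal{D}[\mathbf{r}^A]\mathcal{D}[\mathbf{r}^B]\,\mathcal{P}^A[\mathbf{r}^A]\,\mathcal{P}^B[\mathbf{r}^B]\,f_{st}(\mathbf{R}^{AB}_{st})
&= -\int d\mathbf{R}^{AB}_{st}\,\mathcal{D}[\mathbf{r}^A]\mathcal{D}[\mathbf{r}^B]\,\mathcal{P}^A[\mathbf{r}^A]\,\mathcal{P}^B[\mathbf{r}^B]\int\frac{d^3k}{(2\pi)^3}\,[\hat{f}(\mathbf{k})]_{st}\,e^{i\mathbf{k}\cdot\mathbf{R}^{AB}_{st}} \\
&= -\int\frac{d^3k}{(2\pi)^3}\,[\hat{f}(\mathbf{k})]_{st}\int d\mathbf{R}^{AB}_{st}\,e^{i\mathbf{k}\cdot\mathbf{R}^{AB}_{st}} \\
&= -\int\frac{d^3k}{(2\pi)^3}\,[\hat{f}(\mathbf{k})]_{st}\,(2\pi)^3\delta(\mathbf{k}) \\
&= -[\hat{f}(\mathbf{0})]_{st}
\end{aligned} \tag{S15}$$

because the $d\mathbf{R}^{AB}_{11}\to d\mathbf{R}^{AB}_{st}$ change in integration variable for the interchain distance can be applied without affecting the integrations over $\mathcal{P}^i[\mathbf{r}^i]$. It follows from Eq. S14 that

$$B_2^{(\leftrightarrow)} = -\sum_{s=1}^{N_A}\sum_{t=1}^{N_B}[\hat{f}(\mathbf{0})]_{st}. \tag{S16}$$

Second, every corresponding term for $B_2^{(\leftrightarrow^2)}$ is integrated by the same change of variable:

$$\begin{aligned}
\int d\mathbf{R}^{AB}_{11}\,\mathcal{D}[\mathbf{r}^A]\mathcal{D}[\mathbf{r}^B]\,\mathcal{P}^A[\mathbf{r}^A]\,\mathcal{P}^B[\mathbf{r}^B]\,f_{st}^2(\mathbf{R}^{AB}_{st})
&= \int d\mathbf{R}^{AB}_{st}\,\mathcal{D}[\mathbf{r}^A]\mathcal{D}[\mathbf{r}^B]\,\mathcal{P}^A[\mathbf{r}^A]\,\mathcal{P}^B[\mathbf{r}^B]\int\frac{d^3k}{(2\pi)^3}\,[\hat{f}(\mathbf{k})]_{st}\,e^{i\mathbf{k}\cdot\mathbf{R}^{AB}_{st}}\int\frac{d^3k'}{(2\pi)^3}\,[\hat{f}(\mathbf{k}')]_{st}\,e^{i\mathbf{k}'\cdot\mathbf{R}^{AB}_{st}} \\
&= \int\frac{d^3k}{(2\pi)^3}\frac{d^3k'}{(2\pi)^3}\,[\hat{f}(\mathbf{k})]_{st}[\hat{f}(\mathbf{k}')]_{st}\int d\mathbf{R}^{AB}_{st}\,e^{i(\mathbf{k}+\mathbf{k}')\cdot\mathbf{R}^{AB}_{st}} \\
&= \int\frac{d^3k}{(2\pi)^3}\frac{d^3k'}{(2\pi)^3}\,[\hat{f}(\mathbf{k})]_{st}[\hat{f}(\mathbf{k}')]_{st}\,(2\pi)^3\delta(\mathbf{k}+\mathbf{k}') \\
&= \int\frac{d^3k}{(2\pi)^3}\,[\hat{f}(\mathbf{k})]_{st}[\hat{f}(-\mathbf{k})]_{st}.
\end{aligned} \tag{S17}$$

Therefore, by Eq. S14,

$$B_2^{(\leftrightarrow^2)} = \sum_{s=1}^{N_A}\sum_{t=1}^{N_B}\int\frac{d^3k}{(2\pi)^3}\,[\hat{f}(\mathbf{k})]_{st}[\hat{f}(-\mathbf{k})]_{st} = \int\frac{d^3k}{(2\pi)^3}\,{\rm Tr}\left[\hat{f}(\mathbf{k})\,\hat{f}^{\rm T}(-\mathbf{k})\right], \tag{S18}$$

where the “T” superscript on a matrix denotes transposing the given matrix. Third, each of the terms in the summation for $B_2^{(\leftrightarrow\leftrightarrow)}$, involving two residue pairs $(s_A, l_B)$ and $(t_A, m_B)$ satisfying the $s\ge t$, $l\ge m$ condition, can also be evaluated by a similar change of integration variable. 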
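Because

$$\begin{aligned}
\mathbf{R}^{AB}_{sl} &= \mathbf{R}^{AB}_{11}+\sum_{\tau=1}^{s-1}\mathbf{r}^A_\tau-\sum_{\mu=1}^{l-1}\mathbf{r}^B_\mu \\
&= \mathbf{R}^{AB}_{11}+\left(\sum_{\tau=1}^{t-1}+\sum_{\tau=t}^{s-1}\right)\mathbf{r}^A_\tau-\left(\sum_{\mu=1}^{m-1}+\sum_{\mu=m}^{l-1}\right)\mathbf{r}^B_\mu \\
&= \mathbf{R}^{AB}_{tm}+\sum_{\tau=t}^{s-1}\mathbf{r}^A_\tau-\sum_{\mu=m}^{l-1}\mathbf{r}^B_\mu,
\end{aligned} \tag{S19}$$

by making the $d\mathbf{R}^{AB}_{11}\to d\mathbf{R}^{AB}_{tm}$ change in integration variable, we obtain

$$\begin{aligned}
&\int d\mathbf{R}^{AB}_{11}\,\mathcal{D}[\mathbf{r}^A]\mathcal{D}[\mathbf{r}^B]\,\mathcal{P}^A[\mathbf{r}^A]\,\mathcal{P}^B[\mathbf{r}^B]\,f_{sl}(\mathbf{R}^{AB}_{sl})\,f_{tm}(\mathbf{R}^{AB}_{tm}) \\
&= \int d\mathbf{R}^{AB}_{tm}\,\mathcal{D}[\mathbf{r}^A]\mathcal{D}[\mathbf{r}^B]\,\mathcal{P}^A[\mathbf{r}^A]\,\mathcal{P}^B[\mathbf{r}^B]\int\frac{d^3k}{(2\pi)^3}\,[\hat{f}(\mathbf{k})]_{sl}\,e^{i\mathbf{k}\cdot\mathbf{R}^{AB}_{sl}}\int\frac{d^3k'}{(2\pi)^3}\,[\hat{f}(\mathbf{k}')]_{tm}\,e^{i\mathbf{k}'\cdot\mathbf{R}^{AB}_{tm}} \\
&= \int\frac{d^3k}{(2\pi)^3}\frac{d^3k'}{(2\pi)^3}\,[\hat{f}(\mathbf{k})]_{sl}[\hat{f}(\mathbf{k}')]_{tm}\int d\mathbf{R}^{AB}_{tm}\,e^{i(\mathbf{k}+\mathbf{k}')\cdot\mathbf{R}^{AB}_{tm}} \\
&\qquad\times\int\mathcal{D}[\mathbf{r}^A]\,\mathcal{P}^A[\mathbf{r}^A]\,e^{i\mathbf{k}\cdot\sum_{\tau=t}^{s-1}\mathbf{r}^A_\tau}\int\mathcal{D}[\mathbf{r}^B]\,\mathcal{P}^B[\mathbf{r}^B]\,e^{-i\mathbf{k}\cdot\sum_{\mu=m}^{l-1}\mathbf{r}^B_\mu} \\
&= \int\frac{d^3k}{(2\pi)^3}\frac{d^3k'}{(2\pi)^3}\,[\hat{f}(\mathbf{k})]_{sl}[\hat{f}(\mathbf{k}')]_{tm}\,(2\pi)^3\delta(\mathbf{k}+\mathbf{k}')\left\langle e^{i\mathbf{k}\cdot(\mathbf{R}^A_s-\mathbf{R}^A_t)}\right\rangle_A\left\langle e^{-i\mathbf{k}\cdot(\mathbf{R}^B_l-\mathbf{R}^B_m)}\right\rangle_B \\
&\equiv \int\frac{d^3k}{(2\pi)^3}\,[\hat{f}(\mathbf{k})]_{sl}[\hat{f}(-\mathbf{k})]_{tm}\,[\hat{P}^A(\mathbf{k})]_{st}\,[\hat{P}^B(-\mathbf{k})]_{lm},
\end{aligned} \tag{S20}$$

where

$$[\hat{P}^i(\mathbf{k})]_{st} = \int\mathcal{D}[\mathbf{r}^i]\,\mathcal{P}^i[\mathbf{r}^i]\,e^{i\mathbf{k}\cdot\sum_{\tau=t}^{s-1}\mathbf{r}^i_\tau} = \int\mathcal{D}[\mathbf{R}^i]\,\mathcal{P}^i[\mathbf{R}^i]\,e^{i\mathbf{k}\cdot(\mathbf{R}^i_s-\mathbf{R}^i_t)} = [\hat{P}^{i\,{\rm T}}(-\mathbf{k})]_{st}, \tag{S21}$$

$i = A, B$, is the Fourier transformation of the intrachain residue-residue correlation function in Eq. S21 of the main text. $B_2^{(\leftrightarrow\leftrightarrow)}$ is then computed by rearranging the summation (with $f_{sl}f_{tm}$ and $f_{sl}^2$ below standing for the corresponding integrals in Eq. S14):

$$\begin{aligned}
B_2^{(\leftrightarrow\leftrightarrow)} &= -\sum_{s\ge t=1}^{N_A}\sum_{l\ge m=1}^{N_B}f_{sl}f_{tm} = -\frac{1}{2}\sum_{s,t=1}^{N_A}\sum_{l,m=1}^{N_B}f_{sl}f_{tm}-\frac{1}{2}\sum_{s=1}^{N_A}\sum_{l=1}^{N_B}f_{sl}^2 \\
&= -\frac{1}{2}\int\frac{d^3k}{(2\pi)^3}\left\{{\rm Tr}\left[\hat{f}(\mathbf{k})\,\hat{P}^B(-\mathbf{k})\,\hat{f}^{\rm T}(-\mathbf{k})\,\hat{P}^A(-\mathbf{k})\right]+{\rm Tr}\left[\hat{f}(\mathbf{k})\,\hat{f}^{\rm T}(-\mathbf{k})\right]\right\}.
\end{aligned} \tag{S22}$$

The last equality in Eq. S22 follows from the last equality in Eq. S21, i.e., $[\hat{P}^i(\mathbf{k})]_{st} = [\hat{P}^i(-\mathbf{k})]_{ts}$, and from Eq. S18. Now, by combining Eqs. S16, S18, and S22, the cluster expansion expression for $B_2$ up to $O(f^2)$ is given by

$$\begin{aligned}
B_2 &\approx B_2^{(\leftrightarrow)}+B_2^{(\leftrightarrow\leftrightarrow)}+B_2^{(\leftrightarrow^2)} \\
&= -\sum_{s=1}^{N_A}\sum_{t=1}^{N_B}[\hat{f}(\mathbf{0})]_{st}-\frac{1}{2}\int\frac{d^3k}{(2\pi)^3}\,{\rm Tr}\left[\hat{f}(\mathbf{k})\,\hat{P}^B(-\mathbf{k})\,\hat{f}^{\rm T}(-\mathbf{k})\,\hat{P}^A(-\mathbf{k})-\hat{f}(\mathbf{k})\,\hat{f}^{\rm T}(-\mathbf{k})\right],
\end{aligned} \tag{S23}$$

which is reported in the main text as Eq. 8. 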
Generating sequences with random charge patterns

Random sequences for our charge pattern analysis are constructed as follows. For each integer $i$ between 1 and 25, 40 random neutral sequences containing $i$ positively charged residues (each carries $+1$ charge), $i$ negatively charged residues (each carries $-1$ charge), and $50-2i$ neutral residues (carry 0 charge) are generated by randomly permuting the array $(+1,\ldots,+1,0,\ldots,0,-1,\ldots,-1)$, with $+1$ and $-1$ each repeated $i$ times and 0 repeated $50-2i$ times, to produce 1,000 random sequences in total. 1,000 random pairs of the sequences in this pool of 1,000 sequences are then selected to investigate the correlation between jSCD and SCD.

Mathematical principles for negative SCD
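The construction just described can be sketched as follows. This is an illustrative reimplementation, not the script used for the analysis: the function names and the seed are hypothetical, and drawing the two members of each pair independently (so that a sequence may be paired with itself) is an assumption, since the sampling scheme for pairs is not specified above.

```python
import numpy as np

def random_neutral_pool(length=50, per_count=40, seed=0):
    """Build the pool of overall-neutral random charge sequences.

    For each i = 1..length/2, shuffle an array holding i entries of +1,
    i entries of -1 and (length - 2i) zeros, per_count times.
    """
    rng = np.random.default_rng(seed)
    pool = []
    for i in range(1, length // 2 + 1):
        base = np.array([1] * i + [0] * (length - 2 * i) + [-1] * i)
        for _ in range(per_count):
            pool.append(rng.permutation(base))  # shuffled copy of base
    return np.array(pool)

def random_pairs(pool, n_pairs=1000, seed=1):
    """Pick index pairs into the pool (assumed: independent uniform draws)."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, len(pool), size=(n_pairs, 2))

pool = random_neutral_pool()
pairs = random_pairs(pool)
print(pool.shape, pairs.shape)
```

Each row of `pairs` indexes one sequence pair for which jSCD and the two single-chain SCD values could then be evaluated.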

Here we present an efficient numerical method to address the possible sign(s) of SCD values. Although a rigorous proof for sequences of all lengths is still lacking, the analysis below, which covers sequences of lengths up to 1,001, should provide a practical guide as to whether all charge neutral sequences have a negative SCD, which is a remarkable observation that has so far been borne out empirically from sequences chosen to be studied in the literature.

Consider a polymer of $N+1$ monomers with charge pattern $\sigma = (\sigma_0, \sigma_1, \ldots, \sigma_N)$. By definition, ${\rm SCD}(\sigma)\equiv\sum_{i=0}^{N}\sum_{j=i+1}^{N}\sigma_i\sigma_j\sqrt{|i-j|}$. If we define the matrix $\hat{A}_{N+1}$ with elements $(\hat{A}_{N+1})_{ij} = \sqrt{|i-j|}$, then ${\rm SCD}(\sigma) = \sigma^{\rm T}(\hat{A}_{N+1}/2)\,\sigma$. If $\sigma$ is a charge pattern such that $\sum_{i=0}^{N}\sigma_i = 0$, then $\sigma_0 = -\sum_{i=1}^{N}\sigma_i$. Now, defining $\bar{\sigma} = (\sigma_1, \sigma_2, \ldots, \sigma_N)$ and the matrix $\hat{B}_N$ with elements $(\hat{B}_N)_{ij} = \sqrt{|i-j|}-\sqrt{i}-\sqrt{j}$, one can see that ${\rm SCD}(\sigma) = \sigma^{\rm T}(\hat{A}_{N+1}/2)\,\sigma = \bar{\sigma}^{\rm T}(\hat{B}_N/2)\,\bar{\sigma}$. Thus the requirement that ${\rm SCD}(\sigma) < 0$ for all nonzero $\sigma$ with $\sum_{i=0}^{N}\sigma_i = 0$ is equivalent to $v^{\rm T}\hat{B}_N v < 0$ for every nonzero $N$-dimensional column vector $v$. It is a standard result of linear algebra that, since $\hat{B}_N$ is self-adjoint, this is in turn equivalent to $\hat{B}_N$ being a so-called “negative matrix”, i.e., all of $\hat{B}_N$’s eigenvalues being negative. Notice as well that for $M < N$, $\hat{B}_M$ is the top left $M\times M$ submatrix of $\hat{B}_N$; therefore, should $\hat{B}_N$ be negative, $\hat{B}_M$ would also be negative. For N = , − . B is shown in Fig. S1a.

Most charge-dispersed pattern (analyzed for N = ). Another quantity of interest is the smallest −SCD possible for a neutral polyelectrolyte of some minimum nonzero charge (otherwise the totally neutral sequence in which every monomer carries 0 charge would have the lowest −SCD at 0). In this regard, it is of interest to determine the lowest possible $\sigma^{\rm T}\hat{A}_N\sigma/\sigma^{\rm T}\sigma$ ratio for overall charge neutral $\sigma$ and the charge pattern that produces it. The minimal value of this ratio produced by method of gradient descent is about − . − .826 for the strictly alternating 50-residue polyampholyte sv1.
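The eigenvalue criterion above is easy to probe numerically. The sketch below is an illustration under stated assumptions rather than the code behind Fig. S1: it uses the unnormalized quadratic forms defined in this section, a modest size ($N = 200$, a hypothetical choice) instead of the full range analyzed, and a random test pattern to confirm that the $\hat{A}_{N+1}$ and $\hat{B}_N$ forms agree for a neutral sequence.

```python
import numpy as np

def a_matrix(size):
    """A_ij = sqrt(|i - j|) for i, j = 0..size-1."""
    idx = np.arange(size, dtype=float)
    return np.sqrt(np.abs(idx[:, None] - idx[None, :]))

def b_matrix(n):
    """B_ij = sqrt(|i - j|) - sqrt(i) - sqrt(j) for i, j = 1..n."""
    idx = np.arange(1, n + 1, dtype=float)
    root = np.sqrt(idx)
    return np.sqrt(np.abs(idx[:, None] - idx[None, :])) - root[:, None] - root[None, :]

n = 200
# All eigenvalues of the symmetric matrix B_n; the largest decides the sign.
largest_eigenvalue = np.linalg.eigvalsh(b_matrix(n)).max()

# Consistency check of the two quadratic forms for a random neutral pattern.
rng = np.random.default_rng(1)
sigma_bar = rng.choice([-1.0, 0.0, 1.0], size=n)
sigma = np.concatenate(([-sigma_bar.sum()], sigma_bar))  # sigma_0 enforces neutrality
form_a = sigma @ a_matrix(n + 1) @ sigma
form_b = sigma_bar @ b_matrix(n) @ sigma_bar

# Ratio sigma^T A sigma / sigma^T sigma for a strictly alternating 50-mer.
alternating = np.array([1.0, -1.0] * 25)
ratio = alternating @ a_matrix(50) @ alternating / (alternating @ alternating)

print(largest_eigenvalue, form_a, form_b, ratio)
```

A negative largest eigenvalue for a given $N$ means that every overall neutral sequence of length up to $N+1$ has a negative SCD under this definition.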

SCD values of non-neutral sequences.

For an $N$-mer charge pattern $\sigma$ which is not necessarily overall neutral, we can define its average charge $\langle\sigma\rangle\equiv\sum_{i=1}^{N}\sigma_i/N$ and represent its sequence charge pattern by a column vector $p$ with components $p_i = \sigma_i-\langle\sigma\rangle$. Thus we may write $\sigma = p+\langle\sigma\rangle\mathbf{1}$ where $\mathbf{1}$ is the $N$-vector with a 1 in every entry. Now we can express SCD as

$$\begin{aligned}
{\rm SCD}(\sigma) &= \frac{1}{2}\sigma^{\rm T}\hat{A}_N\sigma = \frac{1}{2}p^{\rm T}\hat{A}_Np+\langle\sigma\rangle\,p^{\rm T}\hat{A}_N\mathbf{1}+\frac{\langle\sigma\rangle^2}{2}\mathbf{1}^{\rm T}\hat{A}_N\mathbf{1} \\
&= {\rm SCD}(p)+\langle\sigma\rangle\sum_{i=1}^{N}p_i\left(\sum_{j=1}^{N}\sqrt{|i-j|}\right)+\frac{\langle\sigma\rangle^2}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\sqrt{|i-j|} \\
&\approx {\rm SCD}(p)+\frac{2\langle\sigma\rangle}{3}\sum_{i=1}^{N}p_i\left[i^{3/2}+(N-i)^{3/2}\right]+\frac{4}{15}\langle\sigma\rangle^2N^{5/2},
\end{aligned} \tag{S24}$$

where the last approximation follows by evaluating sums as integrals ($\sum_{z=1}^{N}\to\int_0^N dz$). ${\rm SCD}(p)$ is negative as $p$ is overall charge neutral while $4\langle\sigma\rangle^2N^{5/2}/15$ is, of course, positive and seemingly the primary contributor to increasing SCD for overall non-neutral sequences. As for the second (middle) term in the last expression, we note that $[i^{3/2}+(N-i)^{3/2}]$ takes its largest values when $i$ is low or high, i.e., when it represents monomers near the termini of the polymer sequence. It follows that $\langle\sigma\rangle\sum_{i=1}^{N}p_i[i^{3/2}+(N-i)^{3/2}]$ is positive if and only if the distribution of those monomers with charges of the same sign as that of the average charge $\langle\sigma\rangle$ is biased in favor of being positioned at the two chain termini. In future studies, it would be interesting to explore possible relationships between this finding and the recently discovered role of monomer type at chain termini in phase separation of model chains with hydrophobic and hydrophilic monomers (labeled “T” and “H”, respectively, and corresponding essentially, in that order, to the H and P monomers in the HP model) as well as the recently proposed “SHD” sequence hydropathy pattern measure for IDPs.

Explicit-chain simulation model and methods

Coarse-grained molecular dynamics simulations are conducted for six example pairs of N = ∣ SCD ∣ sequence, sv28, in common, that partners individually withsix sv sequences spanning almost the entire range of charge patterns of the 30 sv sequences.S10he pairs are sv28–sv1, sv28–sv10, sv28–sv15, sv28–sv20, sv28–sv24, and sv28–sv25.We adopt the simulation model and method our group has recently applied to studyIDP phase separation. Here, for simplicity, as in Ref. 38, each residue (monomer) isrepresented by a van der Waals sphere of the same size and mass. Each positively or nega-tively charged residue carries + e or − e charges, respectively, where e is elementary electroniccharge. The potential energy function used for the study consists of screened electrostatic,non-bonded Lennard-Jones (LJ) and bonded interactions. For any two residues ( i, s ) and ( j, t ) —the s th residue of the i th chain and the t th residue of the j th chain—that carrycharges σ is and σ jt , respectively, the residue-residue electrostatic interaction is given by U el = σ is σ jt e π(cid:15) (cid:15) r r i,s ; j,t exp (− κ D r i,s ; j,t ) , (S25)where (cid:15) is vacuum permittivity, (cid:15) r is relative permittivity, and r i,s ; j,t is the distance betweenresidues ( i, s ) and ( j, t ) . We use κ D = /( a ) for the chain simulations in this work, where a is a length unit with roles that will be apparent below. If we take a to correspond roughlyto the C α –C α virtual bond length of 3 . a ≈

11 ˚A would be approximatelyequal to the Debye screening length for a physiologically relevant 150 mM aqueous solutionof NaCl. The non-bonded LJ interaction is constructed using the length scale a as follows.Beginning with the standard LJ potential, U LJ = ε LJ ⎡⎢⎢⎢⎢⎣( ar i,s ; j,t ) − ( ar i,s ; j,t ) ⎤⎥⎥⎥⎥⎦ , (S26)where ε LJ and a are the depth and range of the potential, respectively, we perform a cutoﬀand shift on Eq. S26 to render the potential purely repulsive. Since the main goal here isto compare explicit-chain simulation with analytical theory, we use the non-bonded LJ partof the potential only for excluded volume repulsion so that all attractive interactions in themodel arise from electrostatics as in the analytical theories considered by this work. Theﬁnal purely repulsive non-bonded LJ potential, U cutoﬀLJ ( ≥ r i,s ; j,t ), that enters ourS11imulation takes the Weeks-Chandler-Andersen form U cutoﬀLJ = ⎧⎪⎪⎪⎪⎨⎪⎪⎪⎪⎩ U LJ + ε LJ , for r i,s ; j,t ≤ / a , for r i,s ; j,t > / a . (S27)As we have learned from Ref. 38, the interaction among sv sequences can be strongly in-ﬂuenced by any background non-electrostatic interaction. To make the energetics of ourmodel system dominated by electrostatic interaction as in the analytical theories, we set ε LJ = ε /

48, where ε ≡ e /( π(cid:15) (cid:15) r a ) is the electrostatic energy at separation a , so that short-range excluded-volume repulsion is signiﬁcantly weaker than electrostatic interaction in theexplicit-chain model. ε and a are used, respectively, as energy and length units in our simu-lations. As before, the bonded interaction between connected monomers is modeled using aharmonic potential U bond = K bond ( r i,s ; i,s + − a ) , (S28)with K bond = , ε / a as in Ref. 65 and also our previous simulation of sv sequences. The strength of this term is in line with the TraPPE force ﬁeld.
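For concreteness, the three pairwise terms of the model potential can be written out in reduced units (energies in units of $\varepsilon$, lengths in units of $a$). This is a minimal sketch and not the HOOMD-blue setup used for the simulations: the function names are hypothetical, the screening constant is taken as $\kappa_D = 1/(3a)$ to match the Debye screening length of $3a$ quoted for the simulations, the $1/2$ prefactor in the harmonic bond is an assumed convention, and the bond stiffness is left as an argument because its value is not legible in the text above.

```python
import numpy as np

EPS_LJ = 1.0 / 48.0          # epsilon_LJ = epsilon / 48, in units of epsilon
KAPPA_D = 1.0 / 3.0          # inverse Debye length, in units of 1/a
R_WCA = 2.0 ** (1.0 / 6.0)   # cutoff of the purely repulsive LJ term, in units of a

def u_electrostatic(r, q1, q2, kappa=KAPPA_D):
    """Screened Coulomb energy (Eq. S25) between charges q1, q2 (units of e)."""
    r = np.asarray(r, dtype=float)
    return q1 * q2 * np.exp(-kappa * r) / r

def u_wca(r, eps_lj=EPS_LJ):
    """Cut-and-shifted LJ (Weeks-Chandler-Andersen) repulsion (Eq. S27)."""
    r = np.asarray(r, dtype=float)
    lj = 4.0 * eps_lj * (r ** -12 - r ** -6)
    return np.where(r <= R_WCA, lj + eps_lj, 0.0)

def u_bond(r, k_bond):
    """Harmonic bond about r = a; the 1/2 prefactor is an assumed convention."""
    r = np.asarray(r, dtype=float)
    return 0.5 * k_bond * (r - 1.0) ** 2

# Hypothetical usage: energies of an oppositely charged pair at contact.
print(u_electrostatic(1.0, +1, -1), u_wca(1.0))
```

With these parameters the excluded-volume repulsion at $r = a$ is small compared with the electrostatic attraction of an oppositely charged pair at the same separation, in line with the stated aim of keeping the energetics dominated by electrostatics.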

All simulations are performed using the GPU version of HOOMD-blue simulation pack-age at ten diﬀerent temperatures (reported as reduced temperature T ∗ ≡ k B T / ε = l B / a for simulation results in this work) between 0.05 T ∗ and 0 . T ∗ with an interval of 0 . T ∗ using a timestep of 0 . τ , where τ = √ ma / ε is the reduced time deﬁned by residue mass m . For a given pair of sv sequences, simulation is initialized by randomly placing the twochains in a large cubic box of dimension 100 a × a × a then followed by 500 τ of energyminimization. The electrostatic interactions among the residues are treated with the PPPMmethod using a real-space cutoﬀ distance of 15 a and a ﬁxed Debye screening length of 3 a .After energy minimization, the system is heated to its desired temperature in a time periodof 2 , τ using Langevin dynamics with a weak friction coeﬃcient of 0 . m / τ (Ref 65).Motions of the residues are integrated using velocity-Verlet scheme with periodic bound-ary conditions. After the desired temperature is achieved, a production run of 500,000 τ isS12onducted and trajectory snapshots are saved every 0 . τ for subsequent analysis. Analysis of simulation data on binding

For each simulation conducted for a given sv sequence pair, the simulated trajectory is examined for the center-of-mass separation between the two chains to ascertain whether the chains form a binary complex in each snapshot collected. In the course of our investigation, we found that for simulations conducted at relatively low temperatures, $T^* < 0.35$, there were only very limited jumps between an unbound state and what would be reasonably considered as the bound state (Fig. S2), suggesting that the simulated system may not have sufficient sampling at such low temperatures. We therefore focus on simulations conducted at $T^* \geq 0.35$. The bound-state ratios $\theta$ of the six pairs of sv sequences at $T^* = 0.35$, $0.4$, $0.45$, and $0.5$ are listed in Table S2. We subtract the baseline bound-state ratio, $\theta_0$, of two noninteracting monomers, where $\theta_0$ is the volume of a sphere whose radius is the bound-state cutoff (Fig. S2) divided by the volume $(100a)^3$ of the simulation box, from the simulated bound-state ratio, and use $\tilde{\theta} = \theta - \theta_0$ to quantify the binding probability produced by interaction energies.

Combining the simulation results from $T^* = 0.35$, $0.4$, $0.45$, and $0.5$, we estimate an enthalpic parameter $\Delta H$ and an entropic parameter $\Delta S$ for the binding of each of the six sv sequence pairs using the linear regression

$$\Delta H/T^* - \Delta S = \log\left(\theta^{-1} - 1\right), \tag{S29}$$

the results of which are reported in Table S2. The fitted $T^*$-dependent $\theta$s are then used to compute the corresponding $\tilde{\theta}$ values at the same $T^*$ for all sv sequence pairs to compare with the theory-predicted $K_{\rm D}$s in Fig. 7c and Fig. 9 of the main text.

Table S1: Theoretical and experimental ITC and smFRET $K_{\rm D}$s (in units of μM) of H1-ProTα fuzzy complexes at different NaCl concentrations ([NaCl] in mM). Amino acid sequences (in one-letter code) used in the theoretical calculation are taken from those studied by experiments, as follows (residues in red are not in the wildtype; they include those remaining after proteolytic cleavage of the His-tag).

ProTα (the “ProTα (without His-tag)” sequence in Ref. 24):
GSYMSDAAVDTSSEITTKDLKEKKEVVEEAENGRDAPANGNAENEENGEQEADNEVDEEEEEGGEEEEEEEEGDGEEEDGDEDEEAESATGKRAAEDDEDDDVDTKKQKTDEDD

H1 (from Ref. 24):
MTENSTSAPAAKPKRAKASKKSTDHPKYSDMIVAAIQAEKNRAGSSRQSIQKYIKSHYKVGENADSQIKLSIKRLVTTGVLKQTKGVGASGSFRLAKSDEPKKSVAFKKTKKELKKVATPKKASKPKKAASKAPTKKPKATPVKKTKKELKKVATPKKAKKPKTVKAKPVKASKPKKAKPVKPKAKSSAKRAGKKKHHHHHH

H1-CTR (H1 C-terminal region, from Ref. 23):
SVAFKKTKKEIKKVATPKKASKPKKAASKAPTKKPKATPVKKAKKKLAATPKKAKKPKTVKAKPVKASKPKKAKPVKPKAKSSAKRAGKKKGGPR

In the theoretical calculation, aspartic acid and glutamic acid (D, E) residues are each assigned $-1$ charge; arginine and lysine (R, K) residues are each assigned $+1$ charge; all other residue types are considered neutral ($0$ charge). The “Theory” results in the table are calculated using both terms for B in Eq. 12 of the main text, whereas “Theory Net Charge” results are calculated using only the first term in the same equation. Because Eq. 12 relies on the Gaussian-chain approximation, which may not be adequate for the N-terminal globular domain of H1, in addition to the data presented in Fig. 1 of the main text, we also compute $K_{\rm D}$s for the binding between the fully disordered C-terminal region of H1 (termed H1-CTR) and ProTα using both terms for B in Eq. 12 of the main text and the 95-residue sequence for H1-CTR listed above. The resulting $K_{\rm D}$s listed under “Theory H1-CTR” in this table are about 1.2–1.5 times higher than those of full-length H1. This difference in ProTα binding between full-length and C-terminal H1 is likely attributable to the removal of the positive charges in its N-terminal domain.

[NaCl] (mM)   Theory   Theory H1-CTR   Theory Net Charge   ITC   [NaCl] (mM)   smFRET
165 3.41 4.59 142 0.46 ± ± ± ± ± ( . + . − . ) × − ( . ± . ) × − ( . ± . ) × − ( . ± . ) × −
290 0.23 ± ± ±

Table S2: Simulated binding data and regression parameters; $r^2$ is the square of the Pearson correlation coefficient of the regression.

Sequence   θ|T*=0.35   θ|T*=0.4   θ|T*=0.45   θ|T*=0.5   ΔH     ΔS      r²
sv1        0.362 %     0.432 %    0.420 %     0.252 %    − .    − .32   0.225
sv10       0.736 %     0.743 %    0.591 %     0.351 %    − .    − .08   0.720
sv15       1.01 %      1.64 %     0.923 %     0.803 %    − .    − .46   0.202
sv20       0.812 %     1.34 %     0.381 %     0.700 %    − .    − .33   0.178
sv24       4.04 %      1.83 %     2.56 %      0.976 %    − .    − .17   0.703
sv25       3.00 %      0.590 %    0.912 %     0.228 %    − .    − .
[Figure S1 panels: (a) Number versus log(−λ); (b) Charge versus Monomer.]

Figure S1: SCD value analysis. (a) The distribution of eigenvalues of the matrix $\hat{B}$ introduced in the text of this Supporting Information for addressing the mathematical principles of negative SCD values for overall neutral sequences; all eigenvalues (denoted by λ) shown are negative, demonstrating definitively that the SCD value of any overall charge-neutral sequence with 1,001 or fewer residues is negative. The methodology can readily be extended to test longer sequences insofar as it is numerically feasible to determine the pertinent eigenvalues. (b) The charge distribution of a 50-residue, overall charge-neutral polyampholyte that produces the least-negative SCD value attained numerically using the gradient descent method.

[Figure S2 panels: columns for T* = 0.05, 0.15, and 0.25; rows for the six sv sequence pairs; axes: time (1000 τ) versus center-of-mass distance (a).]

Figure S2: Time dependence of the center-of-mass separation $|R^{AB}_{\rm CM}|$ between the two sequences ($A$, $B$) in the explicit-chain simulations of sv sequence pairs at $T^* = 0.05$, $0.15$, and $0.25$ [$A$ = sv28, $B$ = (top to bottom) sv1, sv10, sv15, sv20, sv24, and sv25]. Dashed horizontal lines mark the $|R^{AB}_{\rm CM}|$ cutoff adopted in the present work for identifying a “bound state” of the two polyampholyte chains. None of the 18 center-of-mass distances plotted crosses the dashed lines more than five times, indicating potential limitations in sampling under thermodynamic equilibrium conditions.

References

(1) Dunker, A. K.; Lawson, J. D.; Brown, C. J.; Williams, R. M.; Romero, P.; Oh, J. S.; Oldfield, C. J.; Campen, A. M.; Ratliff, C. R.; Hipps, K. W. et al. Intrinsically disordered protein. J. Mol. Graphics & Modelling, 26–59.
(2) van der Lee, R.; Buljan, M.; Lang, B.; Weatheritt, R. J.; Daughdrill, G. W.; Dunker, A. K.; Fuxreiter, M.; Gough, J.; Gsponer, J.; Jones, D. T. et al. Classification of intrinsically disordered regions and proteins. Chem. Rev., 6589–6631.
(3) Uversky, V. N. Natively unfolded proteins: A point where biology waits for physics. Protein Sci., 739–756.
(4) Wright, P. E.; Dyson, H. J. Linking folding and binding. Curr. Opin. Struct. Biol., 31–38.
(5) Bah, A.; Vernon, R. M.; Siddiqui, Z.; Krzeminski, M.; Muhandiram, R.; Zhao, C.; Sonenberg, N.; Kay, L. E.; Forman-Kay, J. D. Folding of an intrinsically disordered protein by phosphorylation as a regulatory switch. Nature, 106–109.
(6) Marsh, J. A.; Teichmann, S. A.; Forman-Kay, J. D. Probing the diverse landscape of protein flexibility and binding. Curr. Opin. Struct. Biol., 643–650.
(7) Borg, M.; Mittag, T.; Pawson, T.; Tyers, M.; Forman-Kay, J. D.; Chan, H. S. Polyelectrostatic interactions of disordered ligands suggest a physical basis for ultrasensitivity. Proc. Natl. Acad. Sci. U. S. A., 9650–9655.
(8) Mittag, T.; Orlicky, S.; Choy, W.-Y.; Tang, X.; Lin, H.; Sicheri, F.; Kay, L. E.; Tyers, M.; Forman-Kay, J. D. Dynamic equilibrium engagement of a polyvalent ligand with a single-site receptor. Proc. Natl. Acad. Sci. U. S. A., 17772–17777.
(9) Tompa, P.; Fuxreiter, M.
Fuzzy complexes: polymorphism and structural disorder in protein-protein interactions. Trends Biochem. Sci., 2–8.
(10) Sharma, R.; Raduly, Z.; Miskei, M.; Fuxreiter, M. Fuzzy complexes: Specific binding without complete folding. FEBS Lett., 2533–2542.
(11) Miskei, M.; Antal, C.; Fuxreiter, M. FuzDB: Database of fuzzy complexes, a tool to develop stochastic structure-function relationships for protein complexes and higher-order assemblies. Nucl. Acids Res., D228–D235.
(12) Arbesú, M.; Iruela, G.; Fuentes, H.; Teixeira, J. M. C.; Pons, M. Intramolecular fuzzy interactions involving intrinsically disordered domains. Front. Mol. Biosci., 39.
(13) Csizmok, V.; Orlicky, S.; Cheng, J.; Song, J.; Bah, A.; Delgoshaie, N.; Lin, H.; Mittag, T.; Sicheri, F.; Chan, H. S. et al. An allosteric conduit facilitates dynamic multisite substrate recognition by the SCF Cdc4 ubiquitin ligase. Nat. Comm., 13943.
(14) Song, J.; Ng, S. C.; Tompa, P.; Lee, K. A. W.; Chan, H. S. Polycation-π interactions are a driving force for molecular recognition by an intrinsically disordered oncoprotein family. PLoS Comput. Biol., e1003239.
(15) Chen, T.; Song, J.; Chan, H. S. Theoretical perspectives on nonnative interactions and intrinsic disorder in protein folding and binding. Curr. Opin. Struct. Biol., 32–42.
(16) Lin, Y.-H.; Brady, J. P.; Forman-Kay, J. D.; Chan, H. S. Charge pattern matching as a ‘fuzzy’ mode of molecular recognition for the functional phase separations of intrinsically disordered proteins. New J. Phys., 115003.
(17) Perry, S. L.; Sing, C. E. 100th Anniversary of macromolecular science viewpoint: Opportunities in the physics of sequence-defined polymers. ACS Macro Lett., 216–225.
(18) Sigalov, A. B. Structural biology of intrinsically disordered proteins: Revisiting unsolved mysteries. Biochimie, 112–118.
(19) Schuler, B.; Borgia, A.; Borgia, M. B.; Heidarsson, P. O.; Holmstrom, E. D.; Nettels, D.; Sottini, A. Binding without folding – the biomolecular function of disordered polyelectrolyte complexes. Curr. Opin. Struct. Biol., 66–76.
(20) Sigalov, A. B.; Zhuravleva, A. V.; Orekhov, V. Y. Binding of intrinsically disordered proteins is not necessarily accompanied by a structural transition to a folded form. Biochimie, 419–421.
(21) Danielsson, J.; Liljedahl, L.; Bárány-Wallje, L.; Sønderby, P.; Kristensen, L. H.; Martinez-Yamout, M. A.; Dyson, H. J.; Wright, P. E.; Poulsen, F. M.; Mäler, L. et al. The intrinsically disordered RNR inhibitor Sml1 is a dynamic dimer. Biochemistry, 13428–13437.
(22) Nourse, A.; Mittag, T. The cytoplasmic domain of the T-cell receptor zeta subunit does not form disordered dimers. J. Mol. Biol., 62–70.
(23) Borgia, A.; Borgia, M. B.; Bugge, K.; Kissling, V. M.; Heidarsson, P. O.; Fernandes, C. B.; Sottini, A.; Soranno, A.; Buholzer, K. J.; Nettels, D. et al.
Extreme disorder in an ultrahigh-affinity protein complex. Nature, 61–66.
(24) Feng, H.; Zhou, B.-R.; Bai, Y. Binding affinity and function of the extremely disordered protein complex containing human linker histone H1.0 and its chaperone ProTα. Biochemistry, 6645–6648.
(25) Yang, J.; Zeng, Y.; Liu, Y.; Gao, M.; Liu, S.; Su, Z.; Huang, Y. Electrostatic interactions in molecular recognition of intrinsically disordered proteins. J. Biomol. Struct. Dyn., 1–12.
(26) Wang, J.; Choi, J.-M.; Holehouse, A. S.; Lee, H. O.; Zhang, X.; Jahnel, M.; Maharana, S.; Lemaitre, R.; Pozniakovsky, A.; Drechsel, D. et al. A molecular grammar governing the driving forces for phase separation of prion-like RNA binding proteins. Cell, 688–699.e16.
(27) Tsang, B.; Arsenault, J.; Vernon, R. M.; Lin, H.; Sonenberg, N.; Wang, L.-Y.; Bah, A.; Forman-Kay, J. D. Phosphoregulated FMRP phase separation models activity-dependent translation through bidirectional control of mRNA granule formation. Proc. Natl. Acad. Sci. U. S. A., 4218–4227.
(28) Das, R. K.; Pappu, R. V. Conformations of intrinsically disordered proteins are influenced by linear sequence distributions of oppositely charged residues. Proc. Natl. Acad. Sci. U. S. A., 13392–13397.
(29) Sawle, L.; Ghosh, K. A theoretical method to compute sequence dependent configurational properties in charged polymers and proteins. J. Chem. Phys., 085101.
(30) Lin, Y.-H.; Chan, H. S. Phase separation and single-chain compactness of charged disordered proteins are strongly correlated. Biophys. J., 2043–2046.
(31) Zheng, W.; Dignon, G.; Brown, M.; Kim, Y. C.; Mittal, J. Hydropathy patterning complements charge patterning to describe conformational preferences of disordered proteins. J. Phys. Chem. Lett., 3408–3415.
(32) Nott, T. J.; Petsalaki, E.; Farber, P.; Jervis, D.; Fussner, E.; Plochowietz, A.; Craggs, T. D.; Bazett-Jones, D. P.; Pawson, T.; Forman-Kay, J. D. et al. Phase transition of a disordered nuage protein generates environmentally responsive membraneless organelles. Mol. Cell, 936–947.
(33) Lin, Y.-H.; Forman-Kay, J. D.; Chan, H. S. Sequence-specific polyampholyte phase separation in membraneless organelles. Phys. Rev. Lett., 178101.
(34) Pak, C. W.; Kosno, M.; Holehouse, A. S.; Padrick, S. B.; Mittal, A.; Ali, R.; Yunus, A. A.; Liu, D. R.; Pappu, R. V.; Rosen, M. K. Sequence determinants of intracellular phase separation by complex coacervation of a disordered protein. Mol. Cell, 72–85.
(35) Lin, Y.-H.; Song, J.; Forman-Kay, J. D.; Chan, H. S. Random-phase-approximation theory for sequence-dependent, biologically functional liquid-liquid phase separation of intrinsically disordered proteins. J. Mol. Liq., 176–193.
(36) Lytle, T. K.; Sing, C. E. Transfer matrix theory of polymer complex coacervation. Soft Matter, 7001–7012.
(37) Chang, L.-W.; Lytle, T. K.; Radhakrishna, M.; Madinya, J. J.; Vélez, J.; Sing, C. E.; Perry, S. L. Sequence and entropy-based control of complex coacervates. Nat. Comm., 1273.
(38) Das, S.; Amin, A. N.; Lin, Y.-H.; Chan, H. S. Coarse-grained residue-based models of disordered protein condensates: Utility and limitations of simple charge pattern parameters. Phys.
Chem. Chem. Phys., 28558–28574.
(39) McCarty, J.; Delaney, K. T.; Danielsen, S. P. O.; Fredrickson, G. H.; Shea, J.-E. Complete phase diagram for liquid–liquid phase separation of intrinsically disordered proteins. J. Phys. Chem. Lett., 1644–1652.
(40) Statt, A.; Casademunt, H.; Brangwynne, C. P.; Panagiotopoulos, A. Z. Model for disordered proteins with strongly sequence-dependent liquid phase behavior. J. Chem. Phys., 075101.
(41) Schuster, B. S.; Dignon, G. L.; Tang, W. S.; Kelley, F. M.; Ranganath, A. K.; Jahnke, C. N.; Simplins, A. G.; Regy, R. M.; Hammer, D. A.; Good, M. C. et al. Identifying sequence perturbations to an intrinsically disordered protein that determine its phase separation behavior. Proc. Natl. Acad. Sci. U. S. A., 11421–11431.
(42) Das, S.; Eisen, A.; Lin, Y.-H.; Chan, H. S. A lattice model of charge-pattern-dependent polyampholyte phase separation. J. Phys. Chem. B, 5418–5431.
(43) Zarin, T.; Strome, B.; Nguyen Ba, A. N.; Alberti, S.; Forman-Kay, J. D.; Moses, A. M. Proteome-wide signatures of function in highly diverged intrinsically disordered regions. eLife, e46883.
(44) Panagiotopoulos, A. Z.; Wong, V.; Floriano, M. A. Phase equilibria of lattice polymers from histogram reweighting Monte Carlo simulations. Macromolecules, 912–918.
(45) Wang, R.; Wang, Z.-G. Theory of polymer chains in poor solvent: Single-chain structure, solution thermodynamics, and θ point. Macromolecules, 4094–4102.
(46) Dignon, G. L.; Zheng, W.; Best, R. B.; Kim, Y. C.; Mittal, J. Relation between single-molecule properties and phase behavior of intrinsically disordered proteins. Proc. Natl. Acad. Sci. U. S. A., 9929–9934.
(47) Lin, Y.-H.; Brady, J. P.; Chan, H. S.; Ghosh, K. A unified analytical theory of heteropolymers for sequence-specific phase behaviors of polyelectrolytes and polyampholytes. J. Chem. Phys., 045102.
(48) Pathria, R. K. Statistical Mechanics, 2nd Ed.; Elsevier, 2006.
(49) Smith, A. M.; Lee, A. A.; Perkin, S. The electrostatic screening length in concentrated electrolytes increases with concentration. J. Phys. Chem. Lett., 2157–2163.
(50) Chowdhury, A.; Sottini, A.; Borgia, A.; Borgia, M. B.; Nettels, D.; Schuler, B. Thermodynamics of the interaction between biological polyelectrolyte-like disordered proteins: From binary complexes to oligomers. Biophys. J., 215A.
(51) Muthukumar, M. 50th anniversary perspective: A perspective on polyelectrolyte solutions. Macromolecules, 9528–9560.
(52) McCammon, J. A.; Wolynes, P. G.; Karplus, M. Picosecond dynamics of tyrosine side chains in proteins. Biochemistry, 927–942.
(53) Jha, A. K.; Freed, K. F. Solvation effect on conformations of 1,2:dimethoxyethane: Charge-dependent nonlinear response in implicit solvent models. J. Chem. Phys., 034501.
(54) Ng, E. W.; Geller, M. A table of integrals of the error functions. J. Res. Natl. Inst. Stand.–B. Math. Sci., 1–20.
(55) Ermoshkin, A. V.; Olvera de la Cruz, M. Polyelectrolytes in the presence of multivalent ions: gelation versus segregation. Phys. Rev. Lett., 125504.
(56) Danielsen, S. P. O.; McCarty, J.; Shea, J.-E.; Delaney, K. T.; Fredrickson, G. H. Molecular design of self-coacervation phenomena in block polyampholytes. Proc. Natl. Acad. Sci. U. S. A., 8224–8232.
(57) Shen, K.; Wang, Z.-G. Polyelectrolyte chain structure and solution phase behavior. Macromolecules, 1706–1717.
(58) Huihui, J.; Ghosh, K. An analytical theory to describe sequence-specific inter-residue distance profiles for polyampholytes and intrinsically disordered proteins. J. Chem. Phys., 161102.
(59) Riback, J. A.; Katanski, C. D.; Kear-Scott, J. L.; Pilipenko, E. V.; Rojek, A. E.; Sosnick, T. R.; Drummond, D. A. Stress-triggered phase separation is an adaptive, evolutionarily tuned response. Cell, 1028–1040.
(60) Chou, H.-Y.; Aksimentiev, A. Single-protein collapse determines phase equilibria of a biological condensate. J. Phys. Chem. Lett., 4923–4929.
(61) Zeng, X.; Holehouse, A. S.; Chilkoti, A.; Mittag, T.; Pappu, R. V.
Connecting coil-to-globule transitions to full phase diagrams for intrinsically disordered proteins. Biophys. J., 1–17.
(62) Chan, H. S.; Dill, K. A. Sequence space soup of proteins and copolymers. J. Chem. Phys., 3775–3787.
(63) O'Toole, E. M.; Panagiotopoulos, A. Z. Monte Carlo simulation of folding transitions of simple model proteins using a chain growth algorithm. J. Chem. Phys., 8644–8652.
(64) Weeks, J. D.; Chandler, D.; Andersen, H. C. Role of repulsive forces in determining the equilibrium structure of simple liquids. J. Chem. Phys., 5237–5247.
(65) Silmore, K. S.; Howard, M. P.; Panagiotopoulos, A. Z. Vapour-liquid phase equilibrium and surface tension of fully flexible Lennard-Jones chains. Mol. Phys., 320–327.
(66) Mundy, C. J.; Siepmann, J. I.; Klein, M. L. Calculation of the shear viscosity of decane using a reversible multiple timestep algorithm. J. Chem. Phys., 3376–3380.
(67) Martin, M. G.; Siepmann, J. I. Transferable potentials for phase equilibria. 1. United-atom description of n-alkanes. J. Phys. Chem. B, 2569–2577.
(68) Nicolas, J. P.; Smit, B. Molecular dynamics simulations of the surface tension of n-hexane, n-decane and n-hexadecane. Mol. Phys., 2471–2475.
(69) Pàmies, J. C.; McCabe, C.; Cummings, P. T.; Vega, L. F. Coexistence densities of methane and propane by canonical molecular dynamics and Gibbs ensemble Monte Carlo simulations. Mol. Simul., 463–470.
(70) Anderson, J. A.; Lorenz, C. D.; Travesset, A. General purpose molecular dynamics simulations fully implemented on graphics processing units. J. Comput. Phys., 5342–5359.
(71) Glaser, J.; Nguyen, T. D.; Anderson, J. A.; Lui, P.; Spiga, F.; Millan, J. A.; Morse, D. C.; Glotzer, S. C. Strong scaling of general-purpose molecular dynamics simulations on GPUs. Comput. Phys. Comm., 97–107.
(72) LeBard, D. N.; Levine, B. G.; Mertmann, P.; Barr, S. A.; Jusufi, A.; Sanders, S.; Klein, M. L.; Panagiotopoulos, A. Z. Self-assembly of coarse-grained ionic surfactants accelerated by graphics processing units. Soft Matter, 8