[PDF] Coarse-Grained Residue-Based Models of Disordered Protein Condensates: Utility and Limitations of Simple Charge Pattern Parameters

Abstract

Biomolecular condensates undergirded by phase separations of proteins and nucleic acids serve crucial biological functions. To gain physical insights into their genetic basis, we study how liquid-liquid phase separation (LLPS) of intrinsically disordered proteins (IDPs) depends on their sequence charge patterns using a continuum Langevin chain model wherein each amino acid residue is represented by a single bead. Charge patterns are characterized by the `blockiness' measure κ and the `sequence charge decoration' (SCD) parameter. Consistent with random phase approximation (RPA) theory and lattice simulations, LLPS propensity as characterized by critical temperature T ∗ cr increases with increasingly negative SCD for a set of sequences showing a positive correlation between κ and − SCD. Relative to RPA, the simulated sequence-dependent variation in T ∗ cr is often---though not always---smaller, whereas the simulated critical volume fractions are higher. However, for a set of sequences exhibiting an anti-correlation between κ and − SCD, the simulated T ∗ cr 's are quite insensitive to either parameters. Additionally, we find that blocky sequences that allow for strong electrostatic repulsion can lead to coexistence curves with upward concavity as stipulated by RPA, but the LLPS propensity of a strictly alternating charge sequence was likely overestimated by RPA and lattice models because interchain stabilization of this sequence requires spatial alignments that are difficult to achieve in real space. These results help delineate the utility and limitations of the charge pattern parameters and of RPA, pointing to further efforts necessary for rationalizing the newly observed subtleties.

Full PDF

11October 30, 2018

Coarse-Grained Residue-Based Models of DisorderedProtein Condensates: Utility and Limitations ofSimple Charge Pattern Parameters

Suman D AS , Alan A

MIN , Yi-Hsuan L IN , , and Hue Sun C

HAN , , ∗ Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada; Molecular Medicine, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada Department of Molecular Genetics, University of Toronto,Toronto, Ontario M5S 1A8, Canada; ∗ Corresponding authorE-mail: [email protected]; Tel: (416)978-2697; Fax: (416)978-8548Mailing address:Department of Biochemistry, University of Toronto, Medical Sciences Building – 5th Fl.,1 King’s College Circle, Toronto, Ontario M5S 1A8, Canada. a r X i v : . [ q - b i o . B M ] O c t Abstract

Biomolecular condensates undergirded by phase separations of proteins and nucleic acidsserve crucial biological functions. To gain physical insights into their genetic basis, westudy how liquid-liquid phase separation (LLPS) of intrinsically disordered proteins (IDPs)depends on their sequence charge patterns using a continuum Langevin chain model whereineach amino acid residue is represented by a single bead. Charge patterns are characterizedby the “blockiness” measure κ and the “sequence charge decoration” (SCD) parameter.Consistent with random phase approximation (RPA) theory and lattice simulations, LLPSpropensity as characterized by critical temperature T ∗ cr increases with increasingly negativeSCD for a set of sequences showing a positive correlation between κ and − SCD. Relativeto RPA, the simulated sequence-dependent variation in T ∗ cr is often—though not always—smaller, whereas the simulated critical volume fractions are higher. However, for a set ofsequences exhibiting an anti-correlation between κ and − SCD, the simulated T ∗ cr ’s are quiteinsensitive to either parameters. Additionally, we ﬁnd that blocky sequences that allowfor strong electrostatic repulsion can lead to coexistence curves with upward concavityas stipulated by RPA, but the LLPS propensity of a strictly alternating charge sequencewas likely overestimated by RPA and lattice models because interchain stabilization ofthis sequence requires spatial alignments that are diﬃcult to achieve in real space. Theseresults help delineate the utility and limitations of the charge pattern parameters and ofRPA, pointing to further eﬀorts necessary for rationalizing the newly observed subtleties. Functional biomolecular condensates of proteins and nucleic acids—some of which arereferred to as membraneless organelles—have been garnering intense interest since the re-cent discoveries of liquid-like behaviors of germline P-granules in

Caenorhabditis elegans and observations of phase transitions from solution to condensed liquid and/or to gel statesin cell-free systems containing proteins with signiﬁcant conformational disorder. In hind-sight, the possibility that certain cellular compartments were condensed liquid droplets hasalready been raised more than a century ago when the protoplasm of echinoderm (e.g.star-ﬁsh and sea-urchin) eggs was seen as an emulsion with granules or microsomes as itsbasic components. Subsequently, in two studies nearly half a century apart, the nucleoluswas hypothesized to be a “coacervate”, a “separated phase out of a saturated solution” and, more generally, phase separation in the cytoplasm was proposed to be “the basis formicrocompartmentation”. Now, burgeoning investigative eﬀorts on biomolecular conden-sates in the past few years have yielded many advances (refs and references therein). Toname a few, phase separations of intrinsically disordered proteins (IDPs) or folded proteindomains connected by disordered linkers are critical in the formation and organization ofthe nucleolus (as anticipated seventy years earlier ), the nuclear pore complex , post-synaptic densities , P-granules , and stress granules. They are also responsible for theability of tardigrades (“water bears”) to survive desiccation and the synthesis of squidbeaks as well as byssuses for anchoring mussels onto sea rocks. More speculatively, thecompartmentalization aﬀorded by IDP phase separation might even be important in the ori-gin of life as envisioned in the Oparin theory and its modern derivatives. Becauseof the crucial roles of biomolecular condensates in physiological functions, their dysfunctioncan lead to diseases such as pathological protein ﬁbrillization and neurological disorders. Properties of IDPs and their phase separations are dependent, as physically expected,upon the amino acid sequences of the IDPs.

However, deciphering genetically encodedsequence eﬀects on biologically functional biomolecular condensates is diﬃcult in generalbecause the interactions within such condensates can be extremely complex, often involvingmany species of proteins and nucleic acids and the condensates are sometimes maintainedby non-equilibrium processes.

For instance, some crucial interactions in biomolec-ular condensates can be ATP-modulated as in stress granules , others can be tuned bypost-translational modiﬁcations as exempliﬁed by phosphorylations of Fused in Sarcoma(FUS). Biomolecular condensates are “active liquids” in this regard.

Moreover, somebiomolecular condensates are not entirely liquid-like but rather exhibit gel- or solid-likecharacters.

By comparison, experimental biophysical studies often focus, for tractabil-ity, on equilibrium properties of simple condensates consisting of only a few biomolecularcomponents. Nonetheless, although these constructs are highly simpliﬁed models of in vivo biomolecular condensates, knowledge gained from their study is extremely useful not onlyas a scientiﬁc stepping stone to understanding the workings of complex in vivo biomolecularcondensates but also as an engineering tool for designing bioinspired materials.

Currently, theoretical approaches to sequence-dependent biophysical properties ofbiomolecular condensates are only in their initial stages of development. These eﬀorts—which include analytical theories and explicit-chain simulations—have been focusing ongeneral principles and rationalization of experimental data on simple systems. Analyti-cal theories are an eﬃcient investigative tool despite their limited, approximate treatmentof structural and energetic details. For example, predictions of mean-ﬁeld Flory-Huggins(FH)-type theories are sensitive to IDP amino acid compositions but FH theoriesdo not distinguish diﬀerent IDP sequences sharing the same composition. Nevertheless,such theories can be very useful, as demonstrated by a recent formulation that rationalizeshow FUS phase behaviors depend on tyrosine and arginine compositions. By comparison,more energetic details of multiple-chain IDP interactions are captured by random phaseapproximation (RPA), which is an analytical formulation that oﬀers a rudimentary ac-count of sequence-dependent electrostatic eﬀects on IDP phase behaviors. Because itallows for the treatment of any arbitrary sequence of charges, RPA has proven useful inaccounting for the experimental eﬀects of sequence charge pattern on the phase proper-ties of RNA helicase Ddx4 (refs ). It is also instrumental in proposing a novel correlationbetween sequence-dependent single-chain properties and multiple-chain phase behaviors —which was recently veriﬁed by explicit-chain simulations —and in suggesting a new formof “fuzzy” molecular recognition based on charge pattern matching. In this connection,another recent approach that combines transfer matrix theory and simulation has also beenuseful in accounting for complex coacervation involving polypeptides with simple repeatingsequence charge patterns.

Building on these advances, further work will be needed todevelop theories that can account for sequence-dependent non-electrostatic eﬀects, includ-ing hydrophobicity, cation- π interactions—which play signiﬁcant roles in functional anddisease-causing IDP interactions and in the formation of biomolecular condensates —aswell as aromatic and non-aromatic π – π interactions which are likely of importance inthe assembly of biomolecular condensates. Explicit-chain models and analytical theories are complementary. Compared to analyti-cal theories, explicit-chain simulations of IDP phase separation are computationally expen-sive because they require tracking the conﬁgurations of a multiple-chain model system thatis suﬃciently large to represent phase-separated states. Yet explicit-chain simulations arenecessary for a realistic representation of chain geometry and thus indispensable also forevaluating the approximations invoked by analytical theories.

Phase separation of IDPand/or folded protein domains connected by disordered linkers have been simulated usinghighly coarse-grained models consisting of basic units each designed to represent groups ofamino acid residues.

These constructs have yielded physical insights into the phase be-haviors of a four-component system, for example. Also utilized recently for explicit-chainmodeling of biomolecular condensates are coarse-grained approaches that capture morestructural and energetic details by representing each amino acid residue of an IDP as a sin-gle bead on a chain. Because we are interested in biomolecular condensates in whichthe protein chains are signiﬁcantly disordered, analytical theories and simulationtechniques developed for the phase separation of folded proteins (e.g. γ -crystallin and lysozyme ) are not directly applicable. Our group has previously employed latticemodels to study sequence-dependent electrostatic eﬀects on IDP phase separation. To as-sess the extent to which predictions from these models are aﬀected by lattice artifacts andto broaden our eﬀort to model biomolecular condensates in general, here we apply morerealistic coarse-grained models wherein IDP chains are conﬁgured in the continuum.

Coarse-grained explicit-chain models are well-suited to address general physical princi-ples. The rapidly expanding experimental eﬀorts have provided an increasing rich set ofdata on overall physical properties of biomolecular condensates that awaits theoretical anal-ysis. For instance, although solutions with temperature-independent eﬀective solute-soluteinteractions are expected to phase separate when temperature is reduced below a certainupper critical solution temperature (UCST)—in which case phase separation propensityat a given temperature increases with increasing critical temperature T ∗ cr = UCST, somebiomolecular condensates are formed at raised temperatures (i.e., they possess a lower criti-cal solution temperature, LCST)—in which case phase separation propensity at a given tem-perature decreases with increasing T ∗ cr = LCST. Examples of the latter include elastin, the Alzheimer-disease-related tau protein, and the Poly(A)-binding protein Pab1 associ-ated with stress granules in yeast. Recent experiments on elastin indicate that formationof biomolecular condensates can also be dependent upon hydrostatic pressure. As hasbeen suggested, these phenomena may be accounted for, at least semi-quantitatively,by temperature and pressure -dependent sidechain and backbone IDP interactions. Building on our recent lattice simulation, we focus here on sequence-dependentelectrostatic eﬀects on IDP phase separation. Previous studies by analytical theories and explicit-chain lattice simulations of IDPs with diﬀerent charge patterns suggest thattheir propensities to phase separate are well correlated with two parameters for charac-terizing sequence charge pattern: the intuitive κ parameter for “blockiness” of the chargearrangement along a sequence and the “sequence charge decoration” SCD parameterthat arose from a theory for the conformational dimensions of polyampholytes. If suchparameters (and even simpler properties such as the net charge of a sequence) can predictcertain aspects of IDP phase separation, they may shed light on the relevant “holistic”physical properties underpinning certain shared biological functions among IDP sequencesthat are otherwise highly diverse on a residue-by-residue basis.

These parameterscould be useful for designing artiﬁcial protein polymers as well.

Remarkably, althoughboth κ and SCD originated from studies of single-chain IDP properties, they appear tocapture also the propensities of multiple IDP chains to phase separate. In view of theprospective broad utility of this putative relationship, its generality deserves closer scrutiny.

With the above consideration in mind, the present study compares polyampholytes phaseproperties predicted by RPA theory against those simulated by explicit-chain models, andassesses the ability of κ and SCD to capture the theoretical/simulated trends. The inter-play between the eﬀects of charge-dependent electrostatic and charge-independent Lennard-Jones-type interactions on polyampholyte phase behaviors is also explored.Insofar as explicit-chain modeling of biomolecular systems is concerned, atomic mod-els with detailed structural and energetic representations and coarse-grained models arecomplementary when both approaches are viable for the system in question. Despite theirrelative lack of structural and energetic details—and in some cases precisely because of thislack of details—coarse-grained models have contributed signiﬁcantly to theoretical advancessince they are computationally eﬃcient tools for conceptual development and for discoveryof universality across a large class of seemingly unrelated phenomena. For instance, earlyexact enumerations of conformational statistics of lattice polymers was instrumental inthe subsequent fundamental development of scaling and renormalization group the-ories in polymer physics. Other examples include lattice investigations of protein foldingkinetics and DNA topology that led to more sophisticated models conﬁrming insightsoriginally gained from earlier lattice studies. Lattice models are a powerful tool for thestudy of homopolymer phase separation as well, although their applicability to long het-eropolymeric chains might be limited as has been noted. Moving beyond the conﬁnesof lattices, here we consider model chains conﬁgured in continuum space.The determination of phase diagrams of IDP liquid-liquid phase separation (LLPS) iscomputationally intensive. Currently, all-atom explicit-water molecular dynamics is notfeasible for this task. Even a recent state-of-the-art molecular dynamics study of the liquidstructure of elastin that clocked a total simulated time of 165 µ s could only model a dropletof twenty seven 35-residue elastin-like peptides and did not provide a phase diagram. Besides issues of computational eﬃciency, common molecular dynamics force ﬁelds are wellknown to be problematic for IDPs.

Developing a force ﬁeld that is suitable for bothIDPs and globular proteins has been a major ongoing challenge.

In this context, we adapt the coarse-grained model of Dignon et al., which in turnis partly based upon simulation algorithms developed for vapor-liquid transitions.

Thisapproach is promising because it is computationally eﬃcient and has already providedqualitative and semi-quantitative account of experimental data, a notable example of whichis a rationalization of the experimentally observed variation in phase behavior amongphosphomimetic mutants of FUS. In contrast to Monte Carlo sampling of lattice models,this modeling setup can provide dynamic information readily. An analysis of mean squareddisplacements has indicated that the condensed liquid phases in this coarse-grained modelcan indeed be liquid-like rather than solid-like aggregates.While the goal of the present work is to lay the necessary foundation for extensivecomparison between theory and experiment, our primary focus here is on comparing explicit-chain results against analytical theories and assessing the eﬀectiveness of sequence chargepattern parameters κ and SCD as predictors for IDP LLPS. In view of the rationalizationsaﬀorded by analytical theory for experiment and the potential utility of analytical theoriesand charge pattern parameters for materials design, it is important to ascertain the partsplayed by the physical assumptions and mathematical approximations in the success orfailure of these analytical formulations. For this purpose, we deem it best to ﬁrst considersimple “toy-model” sequences for the conceptual clarity they oﬀer. One advantage of usingsimple coarse-grained models is that the general principles gleaned from our exercise mayhave applications beyond IDPs, including, e.g., protein mimetic peptoids. As detailed in subsequent sections of this article, our investigation indicates thatalthough both κ and − SCD correlate positively with RPA-predicted LLPS propensities forpolyampholytes having zero net charge but possessing diﬀerent sequence charge patterns,the corresponding correlations with LLPS propensities simulated by coarse-grained modelsare less general. These ﬁndings help delineate the utility/limitation of RPA as well as that ofthe sequence charge parameters κ and SCD as LLPS predictors. Comparisons of our resultsfrom lattice and continuum explicit-model simulations suggest further that the spatial orderimposed by lattice models would likely result in overestimated LLPS propensities for IDPconﬁgured in real space. Ramiﬁcations of these observations for ongoing development oftheoretical and computational techniques for biomolecular condensates are discussed below. Similar to ref , we adopt the recent algorithm in ref for simulating vapor-liquid equilib-rium of ﬂexible Lennard-Jones (LJ) chains to study IDP LLPS. The interactions between LJspheres are now identiﬁed as eﬀective interactions (potentials of mean force) between aminoacid residues in a liquid solvent. Consequently, the vapor and liquid phases in the origi-nal formulation correspond, respectively, to the dilute and condensed liquid phases of anIDP solution. Molecular dynamic simulations are performed with the HOOMD-blue simulation package with IDP chains (polymers) conﬁgured in a cubic box with periodicboundary conditions. The long-spatial-range electrostatic interaction among the chargedresidues (monomers) is treated by PPPM method implemented in the package .Using the notation in our previous lattice study, for any two diﬀerent residues labeled µ, i and ν, j ( µ, ν = 1 , , . . . , n label the IDP chains where n is the total number of chainsin the simulation, i, j = 1 , , . . . , N label the N residues in each chain) with charges σ µi , σ νj in units of elementary electronic charge e , their electrostatic interaction is given by( U el ) µi,νj = σ µi σ νj e π(cid:15) (cid:15) r r µi,νj , (1)where (cid:15) is vacuum permittivity, (cid:15) r is relative permittivity (dielectric constant), and r µi,νj is the distance separating the two residues. Unlike refs , the electrostatic interactionsare not screened in the present study. (Note that the expression for ( U el ) µi,νj in ref isin units of k B T where k B is Boltzmann constant and T is absolute temperature). Besideselectrostatics, all non-bonded residue pairs also interact via the LJ potential( U LJ ) µi,νj = 4 (cid:15) (cid:34)(cid:18) ar µi,νj (cid:19) − (cid:18) ar µi,νj (cid:19) (cid:35) , (2)where (cid:15) is the LJ well depth (not to be confused with the permittivities) and a speciﬁesthe LJ interaction range. The electrostatic and LJ interactions in Eqs. (1) and (2) applyto all intra- and interchain residue pairs that are not sequential neighbors along a chain,i.e., for all µ, i and ν, j without exception when µ (cid:54) = ν and for all µ, i and µ, j satisfying | i − j | > µ = ν . For simplicity and to facilitate a more direct comparison with ourprevious theoretical and lattice studies, we use the same a for the two types of residuesconsidered below (unlike ref which uses diﬀerent a values for diﬀerent residue types).As suggested by previous simulations of phase coexistance, we expect a LJ cutoﬀdistance of 6 a is adequate and thus it is adopted for our simulations. For computationaleﬃciency, the same cutoﬀ is applied also to the electrostatic interaction in Eq. (1). We set (cid:15) = e / (4 π(cid:15) (cid:15) r a ) and use (cid:15) to deﬁne the energy scale throughout the present study, includingcases when the LJ potential is reduced to ( U LJ ) / T ∗ ≡ k B T /(cid:15) . (Thus T ∗ can be converted to T for any givenrelative permittivity (cid:15) r , although the present theoretical analysis largely does not focus onspeciﬁc (cid:15) r values.) The strong interactions maintaining chain connectivity are modeled bya harmonic potential between successive residues along a chain: U bond ( r µi,µi +1 ) = K bond ( r µi,µi +1 − a ) / K bond = 75 , (cid:15)/a is similar to corresponding values used forbond-length energies in the TraPPE force ﬁeld. Kinetic properties of the simulatedsystem is modeled by Langevin dynamics using the velocity-Verlet algorithm with a timestepof 0 . τ , where τ ≡ (cid:112) ma /(cid:15) and m is the mass of a residue (for simplicity all residues areassumed to have the same mass). As in ref , we use a weakly coupled Langevin thermostatwith a friction factor of 0 . m/τ (ref ).We begin each simulation by randomly placing n = 500 IDP chains in a periodic cubicsimulation box of length 70 a . Subsequently, the chain conﬁgurations are energy-minimizedand then heated to a high T ∗ = 4 . , τ . This is followed by a compression of theperiodic simulation box (by isotropic rescaling of all chain coordinates) at a constant rateunder the same high T ∗ = 4 . , τ to arrive at a much smaller periodic cubic boxof length 33 a , resulting in a ﬁnal IDP density ρ ≈ . m/a . The simulation box is thenexpanded along the direction (labeled as z ) of one of the three axes of the box by a factorof eight with the temperature kept at a low T ∗ = 1 .

0, resulting in a simulation box withdimensions 33 a × a × a containing a concentration of chain population (a “slab”)somewhere along the z -axis whereas chain population is zero or extremely sparse for otherparts of the elongated simulation box. Any conformation that is originally wrapped in the z -direction in the compressed 33 a × a × a box because of the periodic boundary conditionsis unwrapped in this expansion process by placing the chain conformation entirely on theside of the “slab” with larger z values (see Fig. 1 for a visualization of this procedure).After this initial preparation, the periodic boundary conditions along the z -axis arere-instated. The temperature of the expanded simulation box is changed from T ∗ = 1 . , τ . The production run is thencarried out for 100 , τ during which snapshots of the chain conﬁgurations are saved every10 τ for detailed analyses. The position of the simulation box is continuously adjusted suchthat the center of mass of the chains is always at z = 0. Density distributions along the z axis are determined by averaging subpopulations of 264 bins of equal width (= a ) over thesimulated trajectories. Polyampholytes densities are reported in units of m/a . It followsthat the numerical value of ρ is equal to the average number of residues (monomers) in avolume of a . An example of the results from such a calculation is given in Fig. 1. Following Das and Pappu, the blockiness parameter κ is deﬁned to quantify the devi-ations of the charge asymmetries of local sequence segments from the overall charge asym-metry of a given sequence. For a sequence segment of length g that starts at monomer k (on any one of the n identical chains labeled by µ ), the charge asymmetry is de-ﬁned as s ( g ; k ) = [ f + ( g ; k ) − f − ( g ; k )] / [ f + ( g ; k ) + f − ( g ; k )] where f + ( g ; k ) and f − ( g ; k )are the ratios, respectively, of positively and negatively charged monomers (residues)0among the g monomers of the sequence segment; i.e., f ± = (cid:80) k + g − i = k ( | σ µi | ± σ µi ) / g where the summation is over the sequence segment that starts at monomer k and endsat monomer k + g −

1. It follows that the overall charge asymmetry for the entire se-quence with N monomers is s ( N ; 1). The average deviation of local charge asymmetryfrom the overall charge asymmetry for all g -monomer segments (sliding windows) is givenby δ g ≡ (cid:80) N − g +1 k =1 [ s ( g ; k ) − s ( N ; 1)] / ( N − g + 1). A g -speciﬁc quantity κ g ∈ [0 ,

1] is thendeﬁned as κ g ≡ δ g / max( δ g ) where max( δ g ) is the maximum δ g value of the set of sequenceswith a given composition that is being considered. In the present case, max( δ g ) corre-sponds to the δ g of the fully charged N -monomer diblock polyampholyte. As in ref , the κ we have used for the present work, which takes the form κ ≡ δ + δ max( δ ) + max( δ ) , (4)is an average over results for local segment lengths g = 5 and g = 6. Note that Eq. (4)diﬀers slightly from the κ = ( κ + κ ) / but the diﬀerence is practicallynegligible ( <

1% for low- κ sequences and < .

01% for large- κ sequences).Following Sawle and Ghosh, SCD ≡ N (cid:88) i =2 i − (cid:88) j =1 σ µi σ µj (cid:112) i − j/N (5)is the weighted summation over all pairs of charges along a given sequence. We study seven fully charged polyampholyte sequences of length N = 50. The sequenceshave equal number of positive and negative residues (charge σ = ± we designate the positive and negativeresidues as “lysine” (K) and “glutamic acid” (E), respectively. The sequences are referredto as “KE” sequences (Fig. 2). Sequences labeled as sv1, sv15, and sv30 were originallyintroduced in ref and have been studied previously by theory and explicit-chainsimulations. These sequences are chosen again for the present study because theyspan a wide range of values for the sequence charge pattern parameters κ and SCD. Toprovide a context for our simulation study, we have examined the distributions of SCD and κ among all possible KE sequences with zero net charge by using simple Monte Carlo as wellas Wang-Landau sampling. The results in Fig. 3a,b indicate that the distributionsare concentrated in relatively small κ and − SCD values. Sequences with large κ or large − SCD values are extremely rare. A reasonable positive correlation exists between κ and1 − SCD; but there is also considerable scatter (Fig. 3c, blue circles), underscoring that thetwo parameters address similar as well as signiﬁcantly diﬀerent sequence properties. Fig. 3cindicates that sv1, sv15, and sv30 lie in a region where κ and − SCD are well correlated.RPA theory for sv1, sv15, sv30 and 27 other sv sequences stipulates that LLPSpropensity is well correlated with κ and − SCD. This prediction is supported to a limiteddegree by explicit-chain simulation. In view of these ﬁndings, it would be instructive toprobe the eﬀectiveness of these charge pattern parameters as LLPS predictors by extendingour analysis to outlier sequences that do not exhibit a positive correlation between κ and − SCD. Because such sequences likely reside in sparely populated regions of sequence space,we use a biased sampling procedure to locate them by maximizing the scoring function E ≡ A [ − SCD / ( − SCD) max − κ ] + h SCD [ − SCD / ( − SCD) max ] + h κ κ (6)for KE sequences, where A , h SCD , and h κ are tunable parameters. When E is maximized,the ﬁrst term in Eq. (6) maximizes the diﬀerence between a rescaled − SCD and κ ( − SCD / ( − SCD) max , κ ∈ [0 , − SCD or a high κ value is preferred. Starting with an initial KE sequence, an exchangebetween a randomly chosen pair of K and E is attempted at each Monte Carlo step. Theattempted exchange is accepted if it results in an increase in E . Otherwise it is rejected.Partially optimized sequences are generated in this manner by 1,000 Monte Carlo steps.By tuning the A , h SCD , and h κ parameters, we have generated four sequences—labeled byas1, as2, as3, and as4 (Fig. 2)—that collectively exhibit an anti-correlation trend between κ and − SCD (Fig. 3c, orange circles). The κ and SCD values for these sequences and thosefor the sv1, sv15, and sv30 sequences are summarized in Table 1. For reasons to be expounded below, we consider three diﬀerent combinations of theelectrostatic [ U el in Eq. (1)] and LJ [ U LJ in Eq. (2)] potentials as the total residue-residueinteraction energy U (Fig. 4): (i) simple sum of the two terms, viz., U = U el + U LJ (Fig. 4a);(ii) sum of the electrostatics term and a LJ term reduced to 1/3 of its strength, viz., U = U el + (1 / U LJ (Fig. 4b); and (iii) sum of the electrostatics term and a LJ term thatapplies only to r ≤ a , where r is the residue-residue distance, viz., U = U el + U LJ for r ≤ a and U = U el for r > a . We are interested in various combinations of U el and U LJ becausethey bear on one of the formulations used in a general explicit-chain simulation approach tostudy LLPS of IDPs. Here, the “with LJ” model (Fig. 4a) represents a somewhat extreme2case in which the LJ attraction is suﬃciently strong such that the total interaction remainsattractive when the two like charges are in close proximity (for r ≈ / a ). To address therole of the background LJ interactions on LLPS, the “with 1/3 LJ” model (Fig. 4b) reducesLJ attraction but the overall repulsion between like charges is still considerably weakerthan the attraction between opposite charges. In contrast, the “with hard-core repulsion”model (Fig. 4c) retains only the repulsive part of the LJ potential up to the residue-residueseparation at which U LJ = 0 (when r = a ), such that the strength of repulsion betweenlike charges is equal to that of attraction between opposite charges at r = a . This modelrepresents an extreme case in which attractive van der Waals interactions play no role inLLPS. Notably, the symmetry between repulsive and attractive interaction strengths andthe treatment of hard-core excluded volume aﬀorded by this model resemble those in RPAtheory (at least conceptually) and in explicit-chain lattice simulations. It follows thatthe model potential in Fig. 4c is useful for assessing RPA and lattice results.The phase diagrams for sequences sv1, sv15, and sv30 are calculated using both the “withLJ” and “with 1/3 LJ” models (Fig. 5). All simulated data points in the phase diagramsin this ﬁgure and subsequent ﬁgures are obtained directly from the density distributions ofexpanded simulation boxes except the critical points (at the top of each of the coexistencecurves) are estimated using the scaling relation speciﬁed by Silmore et al. Representa-tive chain conﬁgurations above and below the critical temperature are provided by thesnapshots in Fig. 5. As expected, the model chains exist in a single phase above the criti-cal temperature with essentially uniform polyampholyte density throughout the simulationbox (Fig. 5, bottom left ). In contrast, a condensed phase (well-deﬁned localized slab in thesimulation box) persists below the critical temperature (Fig. 5, bottom right ). Consistentwith RPA theory and lattice simulations, the critical temperatures [ T ∗ cr (sv1), T ∗ cr (sv15),and T ∗ cr (sv30)] of the three sequences exhibit a clear increasing trend with increasing κ (= 0 . . . − SCD (= 0 . . .

84, respectively, see Table 1) for both the “with LJ” (Fig. 5a) and “with 1/3 LJ”(Fig. 5b) models. More speciﬁcally, T ∗ cr (sv1), T ∗ cr (sv15), and T ∗ cr (sv30) equals, respectively,3 .

52, 3 .

86, and 4 .

97 in Fig. 5a and 1 .

20, 1 .

52, and 3 .

44 in Fig. 5b.We expect LLPS propensities to be generally higher in the “with LJ” model (Fig. 5a)than in the “with 1/3 LJ” model (Fig. 5b) because the former model provides a strongeroverall residue-residue attraction. This expectation is conﬁrmed by the results in Fig. 5showing that the T ∗ cr ’s in Fig. 5a are substantially higher than the T ∗ cr ’s for the correspondingsequences in Fig. 5b. However, the diﬀerences in LLPS properties among the three sequencesare more pronounced in the “with 1/3 LJ” model than in the “with LJ” model. Whereasthe diﬀerence T ∗ cr (sv15) − T ∗ cr (sv1) in the “with LJ” model (= 0 .

34) is nearly equal to thatin the “with 1/3 LJ” model (= 0 . T ∗ cr (sv30) − T ∗ cr (sv15) is substantialsmaller in the “with LJ” model (= 1 .

11) than in the “with 1/3 LJ” model (= 1 . T ∗ cr ’s of diﬀerent sequences are compared: T ∗ cr (sv15) /T ∗ cr (sv1) = 1 .

097 for the “with LJ” model, which is smaller than the correspondingratio of 1 .

267 for the “with 1/3 LJ” model; and T ∗ cr (sv30) /T ∗ cr (sv15) = 1 .

288 for the “withLJ” model, which is substantially smaller than the corresponding ratio of 2 .

263 for the“with 1/3 LJ” model. These results illustrate that variations in LLPS propensity inducedby diﬀerent sequence charge patterns can be partially suppressed by background residue-residue attraction that pushes the chain molecules to behave more like homopolymers.Interestingly, the coexistence curve for sv30 in Fig. 5b exhibits clearly an inﬂection pointon the condensed (right-hand) side such that part of the coexistence curve on this side isconcave upward. A hint of upward concavity exists also—though barely discernible–forthe coexistence curve for sv30 in Fig. 5a as well as the coexistence curves for sv15 andsv1 in Fig. 5b. In contrast, the entire coexistence curves for sv15 and sv1 in Fig. 5a isconvex upward. This observation from explicit-chain simulations are consistent with RPAtheory of polyampholytes with zero or near-zero net charge.

Indeed, a systematic RPAstudy of 30 KE sequences indicates that upward concavity of the condensed side of thecoexistence curve decreases with decreasing − SCD and decreasing κ (Figure 1a of ref ).Whereas the RPA-predicted concavity is prominent for sv30, it is barely discernible forsv15 and sv1 (Figure 10 of ref ). This upward concavity of coexistence curves is knownto be related to the long spatial range of electrostatic interactions and has been predictedby RPA theory for polyelectrolytes. Apparently—and not inconsistent with intuition,LLPS properties of polyampholytes with more blocky sequence charge patterns are in somerespect akin to those of polyelectrolytes. Comparison of the coexistence curves in Fig. 5aagainst those in Fig. 5b suggests further that upward concavity of the coexistence curveis likely associated with the presence of strong long-range repulsive interactions in thesystem as well. In this regard, it is instructive to note that none of the coexistence curvessimulated recently in refs for various intrinsically disordered proteins or protein regionsexhibit upward concavity. The only coexistence curve in these references that shows a clearupward-concave trend is the one for a model folded helicase domain in Figure S14 of ref . κ and SCD are good predictors ofLLPS propensity for some but not all polyampholytes As a group, the as1–4 sequences exhibits anti-correlation between κ and − SCD. Incontrast to sequences sv1, sv15, and sv30 in Fig. 5 with T ∗ cr increasing with both increasing κ and increasing − SCD, the phase diagrams for sequences as1, as2, as3, and as4 in Fig. 6are quite similar despite their very diverse κ values ranging from 0 . . T ∗ cr ’s are 2.25, 2.31, 2.28, and 2.41, respectively. Although T ∗ cr generally increases with κ (except for as2 and as3), the increase of T ∗ cr with respect to κ T ∗ cr (as4) − T ∗ cr (as1) = 0 .

16 and a ratio of T ∗ cr (as4) /T ∗ cr (as1) = 1 .

071 are registered for an increase in κ of 0 . κ is much smaller at 0 . T ∗ cr diﬀerence and ratio simulated using the same “with 1/3 LJ” model (Fig. 5b), 0 .

32 and1 .

267 respectively, are much larger than those between as1 and as4 in Fig. 6.Because of the anti-correlation between κ and − SCD among sequences as1–4 (Fig. 3c),the T ∗ cr ’s of the as1–4 sequences in Fig. 6 anti-correlate with their − SCD values—rather thancorrelating with − SCD as in the case of the sv1, sv15, and sv30 sequences. Speciﬁcally,the increase of the critical temperature from as1 to as4, T ∗ cr (as4) − T ∗ cr (as1) = 0 .

16, isaccompanied by a decrease in the value of − SCD from 12 .

79 for as1 to 6 .

11 for as4 (adiﬀerence of 6 . T ∗ cr with respect to SCD is onlyabout a third of that between sequences sv1 and sv15 and is in the opposite direction (0 . T ∗ cr from sv1 to sv15 is concomitant with a − SCD increase of 3 . κ and SCDare sensitive predictors of the LLPS of a certain class of polyampholytes (such as the sv1,sv15, and sv30 sequences) but not others (such as the as1, as2, as3, and as4 sequences).This limitation of the κ and SCD parameters is not entirely surprising in view of theirorigins as intuitive and theoretial predictors of single-chain conformational dimensionsof polyampholytes, not as predictors for LLPS. By construction, κ quantiﬁes the degreeto which the sequence charge distribution is locally blocky, whereas SCD addressescomplementarily sequence-nonlocal eﬀects from charges that are separated by a longsegment of the chain. For the original set of 30 polyampholytes introduced in ref (whichincludes sv1, sv15, and sv30), SCD correlates better with explicit-chain simulated radiusof gyration and RPA-predicted T ∗ cr ’s. Now, the T ∗ cr ’s weak positive correlation with κ and weak negative correlation with − SCD for the as1–4 sequences in Fig. 6 suggest thatthe eﬀect of local charge pattern on LLPS—which is a multiple-chain phenomenon—maybe stronger than that of nonlocal charge pattern. Nonetheless, the fact that κ as a LLPSpredictor is much less sensitive when it anti-correlates with − SCD suggests at the sametime that nonlocal charge pattern eﬀect does have a non-negligible role in LLPS. We willreturn to this issue below when we present an extensive study of these sequence chargeparameters in the context of RPA theory in Sec. 4.5.

To examine further the eﬀect of background LJ interactions on the sensitivity of LLPSto sequence charge pattern, simulations of sv1, sv15, and sv30 are conducted using the“with hard-core repulsion” model potential in Fig. 4c. Results from this model (Fig. 7)5should be more directly comparable with those from pure RPA theory (without any Flory χ parameter ) and lattice simulations because there is no non-electrostatic attraction inpure RPA theory and our recent explicit-chain lattice model. Aside from chain connectivityand lattice constraints, the only non-electrostatic interactions in those formulations are excluded-volume repulsions. Despite the sharp repulsive forces entailed by this modelpotential, no erratic dynamics was observed in our Langevin simulations. Nonetheless, itwould be instructive in future investigations to assess more broadly the eﬀects of strongintra- and interchain repulsion on phase properties by using Monte Carlo sampling.Figures 7a,b show the equilibrium density distributions simulated at an extremely lowtemperature of T ∗ = 0 .

001 for sequences sv1 and sv15. This temperature is approachingthe lowest that can be practically simulated in the current model, because it is close to theminimum temperature ﬂuctuation that can be maintained by the model thermostat. For ex-ample, we have attempted to set the thermostat to T ∗ = 0 . T ∗ = 0 . namely a localized,well-deﬁned slab of essentially uniform density (Fig. 1, right ). Because a temperature aslow as T ∗ = 0 .

001 is very unlikely to be physically realizable for a liquid aqueous solution( T ∗ = 0 .

001 corresponds to T ≈ . (cid:15) r = 80 and T ≈

44 K for (cid:15) r = 1), we mayconclude from Fig. 7a,b that for practical purposes sv1 and sv15 do not undergo LLPS inaqueous solutions in the absence of substantial non-electrostatic attractive interactions.In contrast, a clear signature of phase separation is indicated for sequence sv30 at asuﬃciently low temperature of T ∗ = 0 . T ∗ cr = 1 .

65 for sv30 here is lower than the T ∗ cr values of4 .

97 and 3 .

44 for sv30 in Fig. 5a and Fig. 5b. The upward concavity of the condensed sideof the coexistence curve for sv30 is remarkably more prominent in Fig. 7d than in Fig. 5aand Fig. 5b, buttressing our contention above that this hallmark feature is closely relatedto the presence of strong repulsive electrostatic interactions in the system.The dramatic diﬀerences in LLPS propensity among the three systems studied in Fig. 7are illustrated by two extreme cases of a particular energetically favorable conﬁgurationfor a pair of sv1 chains (Fig. 8a) and one for a pair of sv30 chains (Fig. 8b). In theseconﬁgurations, inter-chain distances between contacting beads are constant at r = a and thus repulsive LJ energies do not contribute to the total interaction energies plottedin Fig. 8c, which is given by (cid:80) i,j ; r µi,νj ≤ r max ( U el ) µi,νj , where µ, ν are the labels for thetwo chains in each pair. Figure 8c shows that the sv30 pair is energetically much more6favorable than the sv1 pair. At the large cutoﬀ limit ( r max → ∞ ), the interaction energyfor the sv1 pair limits to − . (cid:15) , whereas that for the sv30 pair limits to − . (cid:15) . Thisdiﬀerence helps rationalize the lack of LLPS for sv1 and the possibility of LLPS forsv30. The strictly alternating charge pattern of sv1 leads to a very weak net favorableinteraction between a sv1 pair even when the pair is in the highly special—and thusunlikely—conﬁguration in Fig. 8a. This is because of numerous partial cancellations ofattractions between a pair of opposite charges and repulsions between a pair of like chargessince such pairs are positioned next to each other. The weakness of the net favorableinter-chain interaction means that the inter-chain attraction in a highly constrainedconﬁguration that can readily be overwhelmed by increased conﬁgurational entropy in anensemble of more open chains. By comparison, for sv30, because of the much strongernet favorable inter-chain interaction, a condensed phase can ensue at a suﬃciently lowtemperauture when the free-energy eﬀect of conﬁgurational entropy is relatively diminished. To gain further insight into low-temperature LLPS properties of sv1, a snapshot of sv1conﬁgurations simulated at T ∗ = 0 .

001 is shown in Fig. 9. The chains are loosely associatedbut they do not coalesce into a droplet or a slab in the simulation box. Even the more denselypopulated region of the simulation box contains region of substantial pure solvent volumes(solvent-ﬁlled cavities or “voids” in the model) with no sv1 chains, indicating that theassociated state has a very weak eﬀective surface tension and is not liquid-like. The snapshotshows that some individual chain conformations are elongated, presumably to achieve morefavorable inter-chain contacts by near parallel alignment (similar to Fig. 8a), but othersappear more globular (Fig. 9, bottom ). The geometric/conﬁgurational diﬀerence betweenthe type of associated states in Fig. 9 and unambiguously phase-separated condensed phasessuch as the one depicted in Fig. 5 ( bottom right ) may be quantiﬁed by the analysis of cavitydistributions in Fig. 10, which shows by two diﬀerent rudimentary measures of cavity sizethat there are substantially more large solvent-ﬁlled cavities in the peculiar associated statein Fig. 9 than in a condensed phase that has clearly undergone phase separation.In contrast to the present continuum simulation results, both sv1 and sv15 were ob-served to coalesce into a condensed phase in our previous explicit-chain lattice simulation (Fig. 11). Thus, by comparing explicit-chain simulation results from the present continuummodel against those from our previous lattice model, it is clear that the spatial orderimposed by the lattice can have a very signiﬁcant eﬀect in favoring phase separation inlattice model systems. Lattice constraints represent a signiﬁcant restriction on conﬁgura-tional freedom, allowing opposite charges along polyampholytes to align more optimally.7This eﬀect is illustrated by the snaphots for condensed phases in Fig. 11. The aboveobservation implies that lattice models of phase separation can drastically overstimatephase separation propensity in real space. However, in some applications, it may be arguedthat the lattice order can serve to mimic certain physically realistic local conﬁgurationalorder—such as that induced by hydrogen bonding in protein secondary structure—that isnot taken into account in a coarse-grained continuum chain model. Chains conﬁguredon lattices may also capture certain eﬀects of steric constraints such as those embodiedin the tube model of proteins (see footnote 2 on p. S309 of ref ). The degree towhich these subtle ramiﬁcations of lattice features can be exploited in the study of IDPLLPS remains to be explored. Taken together, these considerations indicate that latticemodels can be useful in exploring general principles (Sec. 2) and deserve further atten-tion in future studies; but their predictions should always be interpreted with extra caution.

We utilize the simulated phase properties of the several polyampholyte sequences com-puted using diﬀerent model potentials to assess predictions oﬀered by RPA theory. To setthe stage, we ﬁrst establish a broader context of RPA predictions than is currently available.Applying the salt-free RPA formulation for IDP LLPS that we adapted and detailed recently, we numerically calculate the critical temperature T ∗ cr and critical volume fraction φ cr of all 10,000 randomly sampled sequences in Fig. 3 and examine their relationship withthe sequence charge parameters κ and − SCD (Fig. 12). Consistent with previous observa-tions based on more limited datasets,

RPA-predicted T ∗ cr of polyampholytes with zeronet charge exhibits a very good correlation with − SCD (tight scatter in Fig. 12a) but a lesserthough still substantial correlation with κ (broader scatter in Fig. 12b). The RPA-predictedspread of the φ cr versus − SCD scatter for the same set of polyampholytes (Fig. 12c) is alsonarrower than the corresponding spread of the φ cr versus κ scatter (Fig. 12d); but this dif-ference in scatter between − SCD and κ is not as pronounced as the corresponding diﬀerencein the scatter for T ∗ cr (Fig. 12a,b).The RPA-predicted dependence of T ∗ cr ’s and φ cr ’s of the sv1, sv15, sv30 sequences on − SCD and κ (red squares in Fig. 12) is well within the general, most probable trend expectedfrom the 10,000 randomly sampled sequences (blue circles Fig. 12). However, the as1, as2,as3, and as4 sequences (orange circles in Fig. 12) appear to be outliers. These sequences’deviation from the most probable trend is mild for the T ∗ cr versus − SCD (Fig. 12a), the φ cr versus − SCD (Fig. 12c), and the φ cr versus κ (Fig. 12d) scatter plots, but is severe for the T ∗ cr versus κ scatter plot (Fig. 12b). It is clear from Fig. 12b that the sign of correlationof the T ∗ cr ’s of the as1, as2, as3, and as4 sequences is opposite to the overall trend for the810,000 randomly sampled sequences (Fig. 12b).To compare RPA predictions with explicit-chain simulation results, we ﬁrst summarizethe simulation data, by themselves, in Fig. 13, which is the simulation equivalent of thetheoretical data in Fig. 12a,b. It provides the dependence of simulated T ∗ cr on the twosequence charge pattern parameters. Figure 13 recapitulates the positive correlation of thesimulated T ∗ cr ’s of the sv1, sv15, and sv30 sequences with − SCD and κ (squares in Fig. 13).The trends for − SCD and κ are quite similar. However, in relative terms, the T ∗ cr ’s ofthe as1, as2, as3, as4 sequences are almost independent of either − SCD and κ (circles inFig. 13). As noted above, the correlation of the T ∗ cr of these four sequences as a set is slightlynegative with − SCD and only slightly positive with κ . Only one data point is available ineach panel of Fig. 13 for simulated T ∗ cr in the “with hard-core repulsion” model (diamond forsv30) because sv1 and sv15 fail to phase separate unequivocally in this model (see above).The T ∗ cr of sv30 in this model is similar to that of the less-blocky sv15 sequence in the latticemodel (red-ﬁlled black squares). As stated previously, no simulated T ∗ cr is available for sv30in our recent lattice model because the favorable interactions in sv30 were too strong foreﬃcient equilibration in that model. We now contrast our simulation data with theoretical predictions. Depending on the sim-ulation conditions, diﬀerent matching theoretical formulations are used for the comparison:(i) Pure RPA theory for electrostatic and excluded-volume interactions only, as described inref , is utilized to compare with present simulations using the “with hard-core repulsion”potential that does not include any non-electrostatic attraction (Fig. 4c). (ii) The RPA+FHtheory prescribed by Equation 10 in ref with a Flory parameter χ = (2 √ π/ /T ∗ isadopted to compare with simulations using the “with LJ” potential (Fig. 4a). Here the χ pa-rameter is purely enthalpic. It is introduced to mimic the background enthalpic LJ interac-tion in the simulations, viz., χ = (pairwise LJ energy) × (pairwise contact volume) / (2 k B T ).We approximate pairwise LJ energy by the well depth (cid:15) , and the pairwise contact volumeby that of a sphere with radius 2 / a which is the residue-residue separation at which theLJ energy is (cid:15) . These approximations lead to χ = 2 (cid:15) √ π/ (3 k B T ) = (2 √ π/ /T ∗ because T ∗ = k B T /(cid:15) and the volume of the conceptual lattice unit for the Flory-Huggins consider-ation is a . (iii) The same RPA+FH theory but with χ = (2 √ π/ /T ∗ , i.e., 1/3 of thebackground interaction strength, is applied accordingly to compare with simulations usingthe “with 1/3 LJ” potential (Fig. 4b). (iv) RPA theory for a screened Coulomb potential,as speciﬁed by Equations 2 and 3 of ref , is used to compare against lattice simulationresults for sv1 and sv15 we computed previously using screened electrostatics. Predictions by these theoretical formulations are summarized in Table 2 together withtheir corresponding simulation results. The theoretical and simulated critical temperaturesand critical volume fractions are plotted in Fig. 14. In this theory-simulation comparison,we stipulate that the polyampholyte volume fraction φ in the simulations may be identiﬁed,9roughly, to the simulated residue density ρ in Table 2 (hence φ cr ≈ ρ cr ). By deﬁnition, ρ isthe average number of residues in a volume of a , thus φ ∝ ρ , and the volume of a residue (abead in the polyampholyte chain model) is ≈ . a (volume of a sphere of radius 2 / a/ a on average (i.e., when ρ = 1), approximately0 .

74 of the system volume is occupied by van der Waals spheres. With this in mind, sincethe maximum achievable packing fraction of equal-sized spheres is π/ √ ≈ .

74 also, themaximum packing possible in our simulation system is characterized by ρ ≈

1. Thus, ρ isalready given in a unit such that it corresponds approximately to the volume fraction φ (0 ≤ φ ≤

1) in Flory-Huggins theory.Figure 14 shows that the simulated T ∗ cr and φ cr are reasonably correlated with their the-oretical counterparts for the sv1, sv15, and sv30 sequences under various simulation condi-tions. The scatter plots in Fig. 14 suggest two rough scaling relations between simulated(“sim”) and theoretical (“thr”) quantities: T ∗ cr , sim ∼ ( T ∗ cr , thr ) . and φ cr , sim ∼ ( φ cr , thr ) . .The data points for diﬀerent models in Fig. 14 indicate clearly that these relations holdquite well for the simulated T ∗ cr ’s and simulated φ cr ’s of sv1, sv15, and sv30 for the “withLJ” and “with 1/3 LJ” potentials but they ﬁt poorly with the simulation data of as1, as2,as3, and as4 (for the “with 1/3 LJ” potential) and those of sv30 for the “with hard-corerepulsion” potential. That the approximate exponents 0 .

39 and 0 .

19 in the above scaling re-lations are both signiﬁcantly smaller than unity implies that explicit-chain simulated LLPSproperties with background LJ, at least as far as T ∗ cr and φ cr are concerned, are less sensitiveto sequence charge pattern than that predicted by RPA+FH theories. However, our ﬁndingthat sv30 (an outlier in Fig. 14a) can—but sv1 and sv15 cannot—phase separate in the“with hard-core repulsion” model (Fig. 7) suggests that in this case LLPS in explicit-chainmodels can be even more sensitive to sequence charge pattern than that in RPA. In otherwords, our results suggest that for polyampholytes that interact only via electrostatics andhard-core excluded-volume repulsion, pure RPA can overstimate LLPS propensity. The casein point here is that while RPA predicts LLPS for sv1 and sv15 with T ∗ cr ’s that are 0.0104and 0.149, respectively, of the T ∗ cr of the sv30 sequence, the sv1 and sv15 sequences donot phase separate at temperature much lower—as low as 0 . / .

65 = 6 . × − that ofsv30’s T ∗ cr —when simulated using the explicit-chain model in Fig. 7.Figure 14b and Table 2 show that the simulated φ cr ’s are larger than their theoreticalcounterparts for the systems we studied. However, the range of variation is much smallerfor the simulated φ cr ’s (from 0 .

082 to 0 . φ cr ’s (from 0 . . φ cr , sim /φ cr , thr of simu-lated to theoretical critical volume fraction, from 0 . / .

124 = 1 .

07 for sv1 in the “with1/3 LJ” model to 0 . / . .

67 for sv30 in the “with hard-core repulsion” model.For sequences sv1, sv15, and sv30 simulated using the same interaction scheme, this ra-tio increases from sv1 to sv15 to sv30, as manifested already by the approximate φ cr , sim ∼ φ cr , thr ) . scaling noted above. The large φ cr , sim /φ cr , thr ratio for sv30 in the “with hard-corerepulsion” model is quantitatively in line with our previous comparison of lattice-simulatedand RPA-predicted φ cr ’s (Figure 12c of ref ). In contrast, the smaller φ cr , sim /φ cr , thr ratioslikely arise from the FH contribution to some of the present theoretical φ cr ’s. Pure FHpredicts φ cr = ( √ N + 1) − [critical χ cr = ( √ N + 1) / (2 N )]. For the current systems with N = 50, this formula translates to φ cr = 0 . T ∗ cr in Fig. 14a becomes r = − . − .

204 that isopposite in sign to that for all the plotted data points (slope = +0 . φ cr in Fig. 14b( r = 0 . In summary, we have taken a step to improve the currently limited understanding ofthe sequence-dependent physical interactions that underlie LLPS of IDPs by extensive sim-ulations of explicit-chain models that allow for a coarse-grained representation of IDP atthe residue level, using multiple-chain systems each consisting of 500 individual chains. Byanalyzing results for 50-residue sequences with diverse charge patterns using model inter-action potentials consisting of diﬀerent combinations of sequence-dependent electrostatics,hard-core excluded-volume repulsion, and LJ attractions, we ﬁnd that while a general inter-residue LJ attraction—which has a short spatial range—favors LLPS, such a backgroundshort-range attraction diminishes sequence speciﬁcity of LLPS. Interestingly, and consistentwith RPA theory, the condensed side of the coexistence curve of one of the polyampholyteswe simulated exhibits a pronounced upward concavity in the absence of background LJattraction. Such upward concavity is not observed in the presence of strong backgroundLJ interaction or in classical FH theory. This ﬁnding suggests that long-range electrostaticrepulsion likely allows for condensed phases that are more dilute than when short-range at-traction is prominent. This observation should contribute insights into the physical forcesthat maintain condensed-phase volume fractions of ≈ . It should berelevant as well for future development of computational and theoretical studies of IDPLLPS that address other sequence-dependent energies beyond electrostatics and LJ-like1hydrophobic interactions.

A main goal of the present study is to use explicit-chain simulations to assess the accu-racy of analytical theories and the utility of simple sequence charge pattern parameters κ and SCD in capturing LLPS properties of IDPs. The calculation of a pattern parameter fora sequence is virtually instantaneous and numerical calculations for analytical theories arefar less computationally intensive than explicit-chain simulations. Therefore, in additionto being tools for elucidating LLPS physics, sequence charge pattern parameters andanalytical theories can contribute to eﬃcient high-throughput bioinformatics studies andthe screening of candidates in IDP sequence design. Here we have compared and contrastedresults simulated for several polyampholyte sequences using the present explicit-chainmodel against the corresponding analytical theory predictions. A broader context for thisevaluation is provided by RPA-predicted critical temperatures and volume fractions wecalculated for 10,000 randomly sampled sequences. For three sequences belonging to aprevious studied set, the simulated critical temperatures, T ∗ cr ’s, correlate reasonably wellwith theoretical predictions and also with the κ and SCD parameters. We ﬁnd that thesimulated T ∗ cr ’s are less sensitive to sequence charge pattern than their theory-predictedcounterparts when a substantial background LJ interaction is in play. However, simulated T ∗ cr ’s can be more sensitive than RPA-predicted T ∗ cr ’s in the absence of background LJinteraction. In this regard, our results suggest that LLPS propensity can be overestimatedby RPA in such cases for sequences with small κ and small − SCD values. Most notably,for four sequences intentionally generated as outliers in the κ -SCD relationship, neither κ nor SCD is a LLPS predictor with a reliable discriminatory power. This discoverysuggests that the eﬀect of blockiness of sequence-local charge pattern on LLPS may beoverestimated by κ , whereas the nonlocal eﬀect of sequence charge pattern on LLPS maybe overestimated by SCD. Therefore, a more generally applicable sequence charge patternparameter for LLPS propensity should be developed to overcome this limitation. All inall, in view of the new questions posed by our ﬁndings, there is no shortage of productiveavenues of further investigation into the physical basis of biomolecular condensates. Conﬂicts of Interest

There are no conﬂicts of interest to declare.

Acknowledgments

We thank Julie Forman-Kay for helpful discussions. This work was supported by Cana-dian Institutes of Health Research grants MOP-84281 and NJT-155930, Natural Sciencesand Engineering Research Council of Canada Discovery Grant RGPIN-2018-04351, andcomputational resources provided by SciNet of Compute/Calcul Canada.2

References C. P. Brangwynne, C. R. Eckmann, D. S. Courson, A. Rybarska, C. Hoege, J. Gharakhani,F. J¨ulicher and A. A. Hyman,

Science , 2009, , 1729–1732. P. Li, S. Banjade, H.-C. Cheng, S. Kim, B. Chen, L. Guo, M. Llaguno, J. V. Hollingsworth,D. S. King, S. F. Banani, P. S. Russo, Q.-X. Jiang, B. T. Nixon and M. K. Rosen,

Nature ,2012, , 336–340. M. Kato, T. W. Han, S. Xie, K. Shi, X. Du, L. C. Wu, H. Mirzaei, E. J. Goldsmith, J.Longgood, J. Pei, N. V. Grishin, D. E. Frantz, J. W. Schneider, S. Chen, L. Li, M. R. Sawaya,D. Eisenberg, R. Tycko, and S. L. McKnight,

Cell , 2012, , 753–767. T. J. Nott, E. Petsalaki, P. Farber, D. Jervis, E. Fussner, A. Plochowietz, T. D. Craggs, D. P.Bazett-Jones, T. Pawson, J. D. Forman-Kay and A. J. Baldwin,

Mol. Cell , 2015, , 936–947. A. Molliex, J. Temirov, J. Lee, M. Coughlin, A. P. Kanagaraj, H. J. Kim, T. Mittag and J.P. Taylor,

Cell , 2015, , 123–133. Y. Lin, D. S. W. Protter, M. K. Rosen and R. Parker,

Mol. Cell , 2015, , 208–219. E. B. Wilson,

Science , 1899, , 33–45. L. Ehrenberg,

Hereditas , 1946, , 407–418. H. Walter and D. E. Brooks,

FEBS Lett. , , 135–139. S. F. Banani, H. O. Lee, A. A. Hyman and M. K. Rosen,

Nat. Rev. Mol. Cell Biol. , 2017, ,285–298. Y. Shin and C. P. Brangwynne,

Science , 2017, , eaaf4382. U. S. Eggert,

Biochemistry , 2018, , 2403–2404. C. L. Cuevas-Velazquez and J. R. Dinneny,

Curr. Opin. Plant Biol. , 2018, , 68–74. M. Feric, N. Vaidya, T. S. Harmon, D. M. Mitrea, L. Zhu, T. M. Richardson, R. W. Kriwacki,R. V. Pappu and C. P. Brangwynne,

Cell , 2016, , 1686–1697. A. Vovk, C. Gu, M. G. Opferman, L. E. Kapinos, R. Y. H. Lim, R. D. Coalson, D. Jasnowand A. Zilman, eLife , 2016, , e10785. A. Zilman,

J. Mol. Biol. , 2018, DOI:10.1016/j.jmb.2018.07.011. M. Zeng, Y. Shang, Y. Araki, T. Guo, R. L. Huganir and M. Zhang,

Cell , 2016, , 1163–1175. Z. Feng, M. Zeng, X. Chen and M. Zhang,

Biochemistry , 2018, , 2530–2539. J. A. Riback, C. D. Katanski, J. L. Kear-Scott, E. V. Pilipenko, A. E. Rojek, T. R. Sosnickand D. A. Drummond,

Cell , 2017, , 1028–1040. T. C. Boothby, H. Tapia, A. H. Brozena, S. Piszkiewicz, A. E. Smith, I. Giovannini, L. Rebecchi, G. J. Pielak, D. Koshland and B. Goldstein,

Mol. Cell , 2017, , 975–984. H. Cai, B. Gabryelczyk, M. S. S. Manimekalai, G. Gr¨uber, S. Salentinig and A. Miserez,

SoftMatter , 2017, , 7740–7752. S. Kim, H. Y. Yoo, J. Huang, Y. Lee, S. Park, Y. Park, S. Jin, Y. M. Jung, H. Zeng, D. S.Hwang and Y. Jho,

ACS Nano , 2017, , 6764–6772. C. D. Keating,

Acc. Chem. Res. , 2012, , 2114–2124. R. R. Poudyal, F. P. Cakmak, C. D. Keating and P. C. Bevilacqua,

Biochemistry , 2018, ,2509–2519. A. I. Oparin,

The Origin of Life , MacMillan Co., New York, 1938. F. J. Dyson,

J. Mol. Evol. , 1982, , 344–350. F. Dyson,

Orgins of Life , Cambridge University Press, New York, 1985. D. Srivastava and M. Muthukumar,

Macromolecules , 1996, , 2324–2326. R. K. Das and R. V. Pappu,

Proc. Natl. Acad. Sci. U.S.A. , 2013, , 13392–13397. R. K. Das, K. M. Ruﬀ and R. V. Pappu,

Curr. Opin. Struct. Biol. , 2015, , 102–112. P. R. Banerjee, A. N. Milin, M. M. Moosa, P. L. Onuchic and A. A. Deniz,

Angew. Chem.Int. Ed. Engl. , 2017, , 11354–11359. X.-H. Li, P. L. Chavali, R. Pancsa, S. Chavali and M. M. Babu,

Biochemistry , 2018, ,2452–2461. A. S. Holehouse and R. V. Pappu,

Biochemistry , 2018, , 2415–2423. S. Jain, J. R. Wheeler, R. W. Walters, A. Agrawal, A. Barsic, and R. Parker,

Cell , 2016, ,487–498. I. Kwon, M. Kato, S. Xiang, L. Wu, P. Theodoropoulos, H. Mirzaei, T. Han, S. Xie, J. L.Corden and S. L. McKnight,

Cell , 2013, , 1049–1060. Z. Monahan, V. H. Ryan, A. M. Janke, K. A. Burke, S. N. Rhoads, G. H. Zerze, R. O’Meally,G. L. Dignon, A. E. Conicella, W. Zheng, R. B. Best, R. N. Cole, J. Mittal, F. Shewmakerand N. L. Fawzi,

EMBO J. , 2017, , 2951–2967. C. P. Brangwynne, T. J. Mitchison and A. A. Hyman,

Proc. Natl. Acad. Sci. U.S.A. , 2011, , 4334–4339. D. Zwicker, R. Seyboldt, C. A. Weber, A. A., Hyman and F. J¨ulicher,

Nat. Phys. , 2017, ,408–413. J. Berry, C. P. Brangwynne and M. Haataja,

Rep. Prog. Phys. , 2018, , 046601. J. D. Wurtz and C. F. Lee,

New J. Phys. , 2018, , 045008. T. S. Harmon, A. S. Holehouse, M. K. Rosen and R. V. Pappu, eLife , 2017, , e30294. Y.-H. Lin, J. D. Forman-Kay and H. S. Chan,

Biochemistry , 2018, , 2499–2508. S. C. Weber,

Curr. Opin. Cell Biol. , 2017, , 62–71. F. G. Quiroz and A. Chilkoti,

Nat. Mater. , 2015, , 1164–1171. L.-W. Chang, T. K. Lytle, M. Radhakrishna, J. J. Madinya, J. V´elez, C. E. Sing and S. L. Perry,

Nat. Comm. , 2017, , 1273. J. R. Simon, N. J. Carrol, M. Rubinstein, A. Chilkoti and G. P. L´opez,

Nat. Chem. , 2017, ,509–515. K. M. Ruﬀ, S. Roberts, A. Chilkoti and R. V. Pappu,

J. Mol. Biol. , 2018,DOI:10.1016/j.jmb.2018.06.031. A. A. Hyman, C. A. Weber and F. J¨ulicher,

Annu. Rev. Cell Dev. Biol. , 2014, , 39–58. C. P. Brangwynne, P. Tompa and R. V. Pappu,

Nat. Phys. , 2015, , 899–904. J. Wang, J. M. Choi, A. S. Holehouse, X. Zhang, M. Jahnel, R. Lemaitre, S. Maharana, A.Pozniakovsky, D. Drechsel, I. Poser, R. V. Pappu, S. Alberti and A. A. Hyman,

Cell , 2018,DOI:10.1016/j.cell.2018.06.006. A. V. Ermoshkin and M. Olvera de la Cruz,

Macromolecules , 2003, , 7824–7832. Y.-H. Lin, J. D. Forman-Kay and H. S. Chan,

Phys. Rev. Lett. , 2016, , 178101. Y.-H. Lin, J. Song, J. D. Forman-Kay and H. S. Chan,

J. Mol. Liq. , 2017, , 176–193. Y.-H. Lin and H. S. Chan,

Biophys, J. , 2017, , 2043–2046. Y.-H. Lin, J. P. Brady, J. D. Forman-Kay and H. S. Chan,

New J. Phys. , 2017, , 115003. G. L. Dignon, W. Zheng, R. B. Best, Y. C. Kim and J. Mittal,

Proc. Natl. Acad. Sci. U.S.A. ,2018 , 9929-9934. T. K. Lytle and C. E. Sing,

Soft Matter , 2017, , 7001–7012. G. Desjardins, C. A. Meeker, N. Bhachech, S. L. Currie, M. Okon, B. J. Graves and L. P.McIntosh,

Proc. Natl. Acad. Sci. U.S.A. , 2014, , 11019–11024. J. Song, S. C. Ng, P. Tompa, K. A. W. Lee and H. S. Chan,

PLoS Comput. Biol. , 2013, ,e1003239. L. M. Salonen, M. Ellermann amd F. Diederich,

Angew. Chem. Int. Ed. , 2011, , 4808–4842. R. M. Vernon, P. A. Chong, B. Tsang, T. H. Kim, A. Blah, P. Farber, H. Lin and J. D.Forman-Kay, eLife , 2018, , e31486. S. Das, A. Eisen, Y.-H. Lin and H. S. Chan,

J. Phys. Chem. B , 2018, , 5418–5431. S. Das, A. Eisen, Y.-H. Lin and H. S. Chan,

J. Phys. Chem. B , 2018, , 8111. K. M. Ruﬀ, T. S. Harmon and R. V. Pappu,

J. Chem. Phys. , 2015, , 243123. Y. S. Harmon, A. S. Holehouse and R. V. Pappu,

New J. Phys. , 2018, , 045002. G. L. Dignon, W. Zheng, Y. C. Kim, R. B. Best and J. Mittal,

PLoS Comput. Biol. , 2018, ,e1005941. K. A. Burke, A. M. Janke, C. L. Rhine and N. L. Fawzi,

Mol. Cell , 2015, , 231–241. J. P. Brady, P. J. Farber, A. Sekhar, Y.-H. Lin, R. Huang, A. Bah, T. J. Nott, H. S. Chan,A. J. Baldwin, J. D. Forman-Kay and L. E. Kay,

Proc. Natl. Acad. Sci. USA , 2017, ,E8194–E8203. N. Dorsaz, G. M. Thurston, A. Stradner, P. Schurtenberger and G. Foﬃ,

J. Phys. Chem. B ,2009, , 1693–1709. M. Kastelic, Y. V. Kalyuzhnyi, B. Hribar-Lee, K. A. Dill and V. Vlachy,

Proc. Natl. Acad.Sci. USA , 2015, , 6766–6770. M. Kastelic, Y. V. Kalyuzhnyi and V. Vlachy,

Soft Matter , 2016, , 7289–7298. A. Stradner, G. Foﬃ, N. Dorsaz, G. M. Thurston and P. Schurtenberger,

Phys. Rev. Lett. ,2007, , 198103. N. Dorsaz, G. M. Thurston, A. Stradner, P. Schurtenberger and G. Foﬃ,

Soft Matter , 2011, , 1763–1776. V. Nguemaha and H.-X. Zhou,

Sci. Rep. , 2018, , 6728. S. Qin and H.-X. Zhou,

J. Phys. Chem. B , 2016, , 8164–8174. S. Qin and H.-X. Zhou,

Curr. Opin. Struct. Biol. , 2017, , 28–37. C. Liu, N. Asherie, A. Lomakin, J. Pande, O. Ogun and G. B. Benedek,

Proc. Natl. Acad.Sci. U.S.A. , 1996, , 377–382. M. Muschol and F. Rosenberger,

J. Chem. Phys. , 1997, , 1953–1962. J. M¨oller, S. Grobelny, J. Schulze, S. Bieder, A. Steﬀen, M. Erlkamp, M. Paulus, M. Tolanand R. Winter,

Phys. Rev. Lett. , 2014, , 028101. J. Chen,

Arch. Biochem. Biophys. , 2012, , 123–131. T. Chen, J. Song and H. S. Chan,

Curr. Opin. Struct. Biol. , 2015, , 32–42. R. B. Best,

Curr. Opin. Struct. Biol. , 2017, , 147–154. Z. A. Levine and J.-E. Shea,

Curr. Opin. Struct. Biol. , 2017, , 95–103. F. J. Blas, L. G. MacDowell, E. de Miguel and G. Jackson,

J. Chem. Phys. , 2008, , 144703. K. S. Silmore, M. P. Howard and A. Z. Panagiotopoulos,

Mol. Phys. , 2017, , 320–327. G. C. Yeo, F. W. Keeley and A. S. Weiss,

Adv. Colloid Interface Sci. , 2011, , 94–103. L. D. Muiznieks and F. W. Keeley,

Biochim. Biophys. Acta , 2013, , 866–875. E. W. Martin and T. Mittag,

Biochemistry , 2018, , 2478–2487. S. Ambadipudi, J. Biernat, D. Riedel E. Mandelkow and M. Zweckstetter,

Nat. Commun. ,2017, , 275. H. Cinar, S. Cinar, H. S. Chan and R. Winter,

Chem. Eur. J. , 2018, , 8286–8291. M. S. Moghaddam, S. Shimizu and H. S. Chan,

J. Am. Chem. Soc. , 2005, , 303–316. C. L. Dias and H. S. Chan,

J. Phys. Chem. B , 2014, , 7488–7509. H. Krobath, T. Chen and H. S. Chan,

Biochemistry , 2016, , 6269–6281. A. S. Holehouse and R. V. Pappu,

Annu. Rev. Biophys. , 2018, , 19–39. L. Sawle and K. Ghosh,

J. Chem. Phys. , 2015, , 085101. L. Sawle, J. Huihui and K. Ghosh,

J. Chem. Theor. Comput. , 2017, , 5065–5075. T. Firman and K. Ghosh,

J. Chem. Phys. , 2018, , 123305. T. Zarin, C. N. Tsai, A. N. Nguyen Ba and A. M. Moses,

Proc. Natl. Acad. Sci. U.S.A. , 2017, , E1450–E1459. K. P. Sherry, R. K. Das, R. V. Pappu and D. Barrick,

Proc. Natl. Acad. Sci. U.S.A. , 2017, , E9243–E9252. M. Dzuricky, S. Roberts and A. Chilkoti,

Biochemistry , 2018, , 2405–2414. C. Domb,

Adv. Chem. Phys. , 1969, , 229–259. P. G. de Gennes,

Scaling Concepts in Polymer Physics ; Cornell University Press, Ithaca, 1979.

K. F. Freed,

Renormalization Group Theory of Macromolecules ; Wiley, New York, 1987.

H. Kaya and H. S. Chan,

Phys. Rev. Lett. , 2003, , 258104. Z. Liu, J. K. Mann, E. L. Zechiedrich and H. S. Chan,

J. Mol. Biol. , 2006, , 268–285.

T. Chen and H. S. Chan,

Phys. Chem. Chem. Phys. , 2014, , 6460–6479. Z. Liu and H. S. Chan,

J. Phys. Condens. Matt. . 2015, , 354103. A. Z. Panagiotopoulos, V. Wong and M. A. Floriano,

Macromolecules , 1998, , 912–918. G. Orkoulas, S. K. Kumar and A. Z. Panagiotopoulos,

Phys. Rev. Lett. , 2003, , 048303. D. W. Cheong and A. Z. Panagiotopoulos,

Mol. Phys. , 2005, , 3031–3044.

S. Rauscher and R. Pom`es, eLife , 2017, , e26526. S. Piana, J. L. Klepeis and D. E. Shaw,

Curr. Opin. Struct. Biol. , 2014, , 98–105. S. Rauscher, V. Gapsys, M. J. Gajda, M. Zweckstetter, B. L. de Groot and H. Grubm¨uller,

J.Chem. Theor. Comput. , 2015, , 5513–5524. R. B. Best, W. Zheng and J. Mittal,

J. Chem. Theory Comput. , 2014, , 5113–5124. J. Huang, S. Rauscher, G. Nawrocki, T. Ran, M. Feig, B. L. de Groot, H. Grubm¨uller and A.D. MacKerell, Jr.,

Nat. Methods , 2017, , 71–73. P. Robustelli, S. Piana and D. E. Shaw,

Proc. Natl. Acad. Sci. U.S.A. , 2018, , E4758–E4766.

G. L. Butterfoss, B. Yoo, J. N. Jaworski, I. Chorny, K. A. Dill, R. N. Zuckermann, R. Bonneau,K. Kirshenbaum, V. A. Voelz,

Proc. Natl. Acad. Sci. U.S.A. , 2012, , 14320–14325.

J. Sun and R. N. Zuckermann,

ACS Nano , 2013, , 4715–4732. J. A. Anderson, C. D. Lorenz and A. Travesset,

J. Comput. Phys. , 2008, , 5342–5359.

J. Glaser, T. D. Nguyen, J. A. Anderson, P. Liu, F. Spiga, J. A. Millan, D. C. Morse and S.C. Glotzer,

Comput. Phys. Commun. , 2015, , 97–107.

D. N. LeBard, B. G. Levine, P. Mertmann, S. A. Barr, A. Jusuﬁ, S. Sanders, M. L. Klein andA. Z. Panagiotopoulos,

Soft Matter , 2012, , 2385–2397. A. Trokhymchuk and J. Alejandre,

J. Chem. Phys. , 1999, , 8510–8523.

D. Duque and L. F. Vega,

J. Chem. Phys. , 2004, , 8611.

C. J. Mundy, J. I. Siepmann and M. L. Klein,

J. Chem. Phys. , 1995, , 3376–3380.

M. G. Martin and J. I. Siepmann,

J. Chem. Phys. B , 1998, , 2569–2577.

J. P. Nicolas and B. Smit,

Mol. Phys. , 2009, , 2471–2475.

J. C. Pamies, C. McCabe, P. T. Cummings and L. F. Vega,

Mol. Simul. , 2010, , 463–470. M. P. Allen and D. J. Tildesley,

Computer Simulation of Liquids , Oxford University Press,New York, 1991. W. Humphrey, A. Dalke and K. Schulten,

J. Molec. Graphics , 1996, , 33–38. F. Wang and D. P. Landau,

Phys. Rev. E , 2001, , 056101. D. P. Landau, S.-H. Tsai and M. Exler,

Am. J. Phys. , 2004, , 1294–1302. H. S. Chan and K. A. Dill,

J. Chem. Phys. , 1990, , 3118–3135. N. G. Hunt, L. M. Gregoret and F. E. Cohen,

J. Mol. Biol. , 1994, , 312–326.

D. P. Yee, H. S. Chan, T. F. Havel and K. A. Dill,

J. Mol. Biol. , 1994, , 557–573.

A. Maritan, C. Micheletti, A. Trovato and J. R. Banavar,

Nature , 2000, , 287–290.

S. Wallin and H. S. Chan,

J. Phys. Condens. Matt. , 2006, , S307–S328. M.-T. Wei, S. Elbaum-Garﬁnkle, A. S. Holehouse, C. C.-H. Chen, M. Feric, C. B. Arnod, R.D. Priestley, R. V. Pappu and C. P. Brangwynne,

Nat. Chem. , 2017, , 1118–1125. Table 1.

Charge pattern parameters for the sequences studied in this work.sequence SCD κ sv1 − .

413 0 . − .

349 0 . − .

84 1 . − .

79 0 . − .

30 0 . − .

266 0 . − .

11 0 . Table 2.

Simulated and theoretical critical temperatures and critical densities or volumefractions considered in Fig. 12 and Fig. 13. Data for sequences sv1 and sv15 in the latticemodel and their theoretical counterparts (last two rows) are obtained from Das et al. Other data are from the present study.Potential type Sequence Simulation Theory T ∗ cr ρ cr T ∗ cr φ cr “with LJ” sv1 3.52 0.152 4.55 0.124sv15 3.86 0.130 4.93 0.098sv30 4.97 0.120 10.52 0.019“with 1/3 LJ” sv1 1.20 0.133 1.52 0.124sv15 1.52 0.127 2.14 0.040sv30 3.44 0.086 9.14 0.014as1 2.25 0.095 4.43 0.017as2 2.31 0.096 3.77 0.020as3 2.28 0.110 3.63 0.025as4 2.41 0.095 3.27 0.032“with hard-core sv30 1.65 0.082 8.57 0.0123repulsion”screened sv1 0.70 0.106 a a a This quantity equals the simulated critical volumefraction φ cr in the lattice model. z FIG. 1: Simulation methodology. The schematic ( left ) illustrates the computational technique we adopt for calculating phase diagrams of polyampholytes. After energy minimization of acollection of model polyampholytes (chains of red and blue beads) at T ∗ = 4 .

0, the cubic simulationbox (green frame) with periodic boundary conditions (visualized using VMD with chains atboundaries unwrapped) is compressed under the same high temperature (blue horizontal arrow).This is followed by an expansion (blue vertical arrow) of the simulation box at T ∗ = 1 . z ) of one of its edges, resulting in an enlargedsimulation box taking the shape of an elongated rectangular cuboid with a slab of polyampholytescentered at z = 0. The system is then equilibrated at diﬀerent temperatures. The plot ( right )shows an example of temperature-sensitive equilibrated distributions of polyampholyte density ρ as a function of z (in units of a ) for the strictly alternating sequence sv1 (refs , see below).The downward pointing arrow indicates increasing T ∗ . For a given temperature, the maximumof the distribution is identiﬁed as the polyampholyte density of the condensed phase whereasthe minimum as the polyampholyte density of the dilute phase. A phase diagram can thus beconstructed from these data. See text for further details. E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E KE E E E E E E E E E E E E E E E E E E E E E E E E K K K K K K K K K K K K K K K K K K K K K K K K K

EK K K K E K KK E K K E KK EEE K E K E KK E KKKK E K E KK E E E E E E EE K EEEEE KK

E KK K K K K K K K K K E K K K K E K K K E K K E K EEE K E E E E K E E E E K E E EE K E E E E E

E E E E E E E E K K E K K K K E E E E E E E E E E E K K K K E E E K E K E K K K K K K K K KK KKK

EK K K K E E EEE E E E E E E E E E E E E K K E K K K K K K KKKKKKKK E K KEEEEE K K K

E K KK KK K K K K KEEEEEEEEEEEEEE

K K K K K K KKKKKKKK EEEEE EEEEE K sv1sv15sv30as1as2as3as4

FIG. 2: The polyampholyte sequences studied in this work. Every sequence contains 25 K’s (bluebeads) and 25 E’s (red beads) with diﬀerent arrangements of K’s and E’s along the sequences.Sequences sv1, sv15, and sv30 are from ref ; sequences as1, as2, as3, and as4 are introduced bythe present work. -SCD -SCD d i s t r i b u t i o n l n P l n P (a) (b)(c) FIG. 3: Sequence-space statistics of charge pattern parameters. Normalized distributions of (a)SCD [Eq. (5)] and (b) κ [Eq. (4)] among 50-residue fully charged but overall neutral KE sequencesare computed from 10,000 sequences generated by repeat exchanges of sequence positions of ran-domly selected pairs of positive (K) and negative (E) residues. These distributions—histograms in(a,b)—cover only the readily sampled range of SCD and κ values, whereas the full distributions ofsequence population P over the entire range of all possible SCD and κ values are estimated usingthe Wang-Landau technique [semi-log plots in the insets of (a) and (b)]. (c) The − SCD and κ values of the sv1, sv15, sv30 sequences (red squares, bottom to top ) and the as1, as2, as3, andas4 sequences (orange circles, bottom to top ) are shown against the backdrop of the − SCD versus κ scatter plot for the 10,000 randomly sampled sequences (blue circles). Sequence sv30 is shownin the inset of (c) because it lies outside the range of the scatter plot. In the histograms in (a)and (b), each of the horizontal ranges between − SCD = 0 . . κ = 0 . . − SCD or κ . The insets of (a) and (b) are obtained by averaging over ten Wang-Landau processesinitialized by diﬀerent sequences; sampled sequences are binned into 20 equal intervals for thefull range of SCD and κ values using the same rule for inclusion/exclusion of bin boundaries asdescribed above for the other bins over more limited SCD and κ ranges. The scale for population P is such that the sum of all 20 binned populations is equal to the total number, 50! / (25!25!),of fully charged 50-residues KE sequences with zero net charge. For the same reason, P is setto unity for the maximum value of − SCD and for κ = 1 since both of these parameter valuesuniquely specify the diblock sv30 sequence. Assuringly, the trend in the inset of (b) is very similarto that exhibited by the previously estimated population distribution over κ in Fig S1 of ref . FIG. 4: Inter-residue interaction potentials used in our explicit-chain models. In each panel, totalinteraction energy U ( r ) in units of (cid:15) is shown as a function of inter-residue distance r . The upperand lower curves are for a pair of interacting residues with like and opposite charges, respectively,as illustrated by the red- and blue-bead representations of charged residues. The U ( r ) = 0 levelis marked by a horizontal dotted line. (a) Electrostatics + LJ model [Eq. (1) plus Eq. (2)]. (b)Electrostatics + 1/3 LJ model [Eq. (1) plus 1/3 of Eq. (2)]. (c) Electrostatics + hard-core repulsionmodel [Eq. (1) plus a modiﬁed form of Eq. (2) for which the entire LJ term is set to zero for r > a ]. (a) (b) FIG. 5: Charge-pattern-dependent phase separations. Phase diagrams are calculated for se-quences sv1 (triangles), sv15 (diamonds), and sv30 (circles) using (a) the electrostatics + LJmodel potential (Fig. 4a) and (b) the electrostatics + 1/3 LJ model potential (Fig. 4b). Fittedcoexistence curves simulated using the present coarse-grained continuum explicit-chain model hereand in subsequent ﬁgures are constructed as described and serve largely as guides to the eye.In (b), the vertical dashed line marks the critical density of sv30 to underscore that it is lowerthan the critical densities of sv1 and sv15. The horizontal dashed line marks the T ∗ = 2 . left ) and forsequence sv30 ( right ) in (b) are shown below the phase diagrams. Renditions of close-up images(dark-green boxes, bottom ) of selected parts of the simulation boxes (blue boxes with arrows) areprovided to illustrate key diﬀerences in local chain conﬁguration between the two systems. Eachof the images in the bottom dark-green boxes consists of one randomly chosen chain and everychain that has either 5 or more ( left for sv1) or 15 or more ( right for sv30) residues positionedwithin a distance of 6 a from a residue of the chosen chain. A pair of such chains are depicted inmore saturated color for the sole purpose of enhancing the visual eﬀect. FIG. 6: Phase diagrams for the four newly introduced polyampholyte sequences. Simulationresults are shown for sequences as1 (triangles), as2 (diamonds), as3 (circles), and as4 (squares),all computed using the electrostatics + 1/3 LJ potential (Fig. 4b). Critical temperature andcritical volume are quite insensitive to the variation of charge pattern among these sequences. (d)(a)(b)(c) FIG. 7: Phase behaviors in the “with hard-core repulsion” model. Polyampholyte density as afunction of z is calculated using the model potential in Fig. 4c for (a) sv1, (b) sv15, and (c) sv30at the temperatures indicated. (d) Phase diagram for sequence sv30 in the same model. E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K sv1sv30

E E E E E E E E E E E E E E E E E E E E E E E E E K K K K K K K K K K K K K K K K K K K K K K K K KE E E E E E E E E E E E E E E E E E E E E E E E EK K K K K K K K K K K K K K K K K K K K K K K K K EK E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K (a)(b)

FIG. 8: Inter-chain polyampholyte interactions are strongly sequence dependent. A pair of sv1sequences (a) and a pair of sv30 sequences are each shown in an energetically favorable alignedconﬁguration. (c) Total interaction energy for the conﬁgurations in (a) and (b) as functions ofthe maximum residue-residue distance, r max , that is taken into consideration in computing theelectrostatic energies. FIG. 9: Strictly alternating polyampholytes with hard-core repulsion at extremely low tempera-ture. A snapshot of sv1 chains in a simulation box at T ∗ = 0 .

01. The chains are seen as associatedand not scattered though there is no clear sign of phase separation (cf. Fig. 7a). The close-upimage (bottom) includes all of the chains that have at least one residue within 6 a of a residue ona randomly selected chain. (The actual number of such ≤ a inter-chain residue-residue distancesvaries from 1 to 24 in the chain cluster shown). Two chains inside the bottom box are depicted inmore saturated color for the sole purpose of enhancing visualization. (a)(b)(c) FIG. 10: Comparing distributions of cavity size in diﬀerent polyampholyte-rich states. Theassociated state of sv1 at T ∗ = 0 .

001 in the “electrostatic + hard-core repulsion” model (a) andthe condensed phase of sv30 at T ∗ = 0 . a ) with highest average polyampholyte density [indicated by dotted light-blue boxes in (a,b); z ∈ [ − a, a ] in (a) and z ∈ [ − . a, . a ] in (b)], with periodic boundary conditions maintainedwithin these volumes of (33 a ) along x and y but not in the z direction. We consider small cubicvolumes of l , where l = a, a, . . . , a , that are placed at intervals of a in all three spatial directionswithin these volumes and determine the number of such small cubic volumes that do not containthe center of any of the monomers that make up the polyampholytes. These small cubic volumesare termed empty. The conﬁgurational diﬀerence between the polyampholyte-rich states of thesv1 and sv30 systems here is characterized by two measures. First, a cavity is identiﬁed as a regioncovering one l = a empty volume (of a ) and all l = a empty volumes contiguous to it directlyor indirectly (i.e., a given l = a empty volume can only belong to one cavity). The size of cavityis given by the number of contiguous l = a empty volumes. The distribution of cavity volumeso deﬁned is given in (c) for sv1 (black open circles) and sv30 (red open squares). Second, thetotal number, P , of positions of empty volumes for various l are counted, and the ratio of P forsv1 to that for sv30 is taken for various l values. The inset in (c) shows P (sv1) /P (sv30) > l . Hence both measures indicate that there are substantiallymore cavities of larger volumes for the sv1 system than for the sv30 system analyzed in this ﬁgure. FIG. 11: Lattice chain conﬁgurations in dilute and condensed phases of phase-separated polyam-pholytes. The phase diagrams (center) for sv1 (triangles ﬁtted by dashed curve) and for sv15(diamonds ﬁtted by solid curve) were computed using our previous lattice model and adaptedusing data from Figure 8 of Das et al. T is absolute temperature and φ is polyampholyte volumefraction as described. Here we provide snapshots of the dilute and of the condensed phases inthe lattice model under the simulated (

T, φ ) conditions indicated by the red arrows. Snapshotson the dilute side consist of chains in a randomly selected volume within the low- φ region of thesimulation box. Snapshots on the condensed side show substantial fractions of the condensedphases as well as close-up images of parts of them (marked by light boxes with arrows). Theclose-up images are rendered using the same protocol as that for the bottom-right image in Fig. 5.Selected chains in the snapshots of the dilute state and in the close-up images are depicted in moresaturated colors than others for the sole purpose of enhancing visualization. T -SCD(a) (b)(d)(c) FIG. 12: Relationship between RPA-predicted phase properties and sequence charge patternparameters. RPA-predicted critical temperatures T ∗ cr (a,b) and critical volume fractions φ cr (c,d)of the 10,000 sampled polyampholyte sequences in Fig. 3 are computed using the salt-free RPAformulation (no Flory χ parameter) and plotted against their − SCD (a,c) and κ (b,d) values(blue circles). The sequences studied by the present explicit-chain simulations are marked as inFig. 3 (red squares for sv1, sv15, and sv30; orange circles for as1, as2, as3, and as4) with sequencesv30 shown in the insets. -SCD (a) (b) T FIG. 13: Dependence of explicit-chain-simulated LLPS propensity on sequence charge patternparameters. The critical temperatures, T ∗ cr ’s, of 50-residue polyampholytes simulated in the presentexplicit-chain continuum model are plotted against − SCD (a) and κ (b) for the “with LJ” potentialin Fig. 4a (open squares for sv1, sv15, and sv30), the “with 1/3 LJ” potential in Fig. 4b (ﬁlledsquares for the three “sv” sequences, ﬁlled circles for as1, as2, as3, and as4), and the “with hard-core repulsion” potential in Fig. 4c (diamonds for sv30). The T ∗ cr ’s of sv1 and sv15 from ourprevious explicit-chain lattice model simulation are also plotted for comparison (red-ﬁlled blacksquares). Dashed and dotted lines joining data points simulated using the same models for sv1,sv15, and sv30 as well as for sv1 and sv15 are merely guides for the eye. s i m u l a t e d l o g T log T l o g s i m u l a t e d theoretical logtheoretical (a) (b) − − − − − − − − − − FIG. 14: Comparing RPA-predicted and explicit-chain-simulated phase properties of polyam-pholytes. Simulated logarithmic (log = log ) critical temperature T ∗ cr (a) and critical volumefraction φ cr (b) are compared against their theoretical counterparts predicted using RPA orRPA+FH theories. In this ﬁgure, φ cr ’s for the results simulated in the present study are iden-tiﬁed with the simulated ρ cr ’s (see text and Table 2 for details). The same symbols as those inFig. 13 are used to specify the sequences and simulation models for the data points. The dashedleast-squares regression lines are ﬁtted using all the plotted data points and represent approxi-mate power-law correlations of T ∗ cr or φ cr between theory and simulation. The lines are given bylog( T ∗ cr , sim ) = 0 .

387 log( T ∗ cr , thr ) + 0 .

145 in (a) and log( φ cr , sim ) = 0 .

191 log( φ cr , thr ) − .

680 in (b)where simulated and theoretical quantities are indicated, respectively, by the subscripts “sim” and“thr”. The Pearson correlation coeﬃcients are 0 .

843 for (a) and 0 .

846 for (b). E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K