Comparative Roles of Charge, π and Hydrophobic Interactions in Sequence-Dependent Phase Separation of Intrinsically Disordered Proteins
Suman Das, Yi-Hsuan Lin, Robert M. Vernon, Julie D. Forman-Kay, Hue Sun Chan
11April 21, 2019
Comparative Roles of Charge, π , and HydrophobicInteractions in Sequence-Dependent PhaseSeparation of Intrinsically Disordered Proteins Suman D AS , Yi-Hsuan L IN , , Robert M. V
ERNON , Julie D. F
ORMAN -K AY , and Hue Sun C
HAN , ∗ Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada; Molecular Medicine, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada ∗ Corresponding authorE-mail: [email protected]; Tel: (416)978-2697; Fax: (416)978-8548Mailing address:Department of Biochemistry, University of Toronto, Medical Sciences Building – 5th Fl.,1 King’s College Circle, Toronto, Ontario M5S 1A8, Canada. a r X i v : . [ q - b i o . B M ] M a y Abstract
Biomolecular condensates underlain by liquid-liquid phase separation (LLPS) of proteinsand nucleic acids can serve important biological functions; yet current understanding ofthe effects of amino acid sequences on LLPS is limited. Endeavoring toward a transfer-able, predictive coarse-grained explicit-chain model for biomolecular LLPS, we used theN-terminal intrinsically disordered region (IDR) of the DEAD-box helicase Ddx4 as a testcase to conduct extensive multiple-chain simulations to assess the roles of electrostatic, hy-drophobic, cation- π , and aromatic interactions in sequence-specific phase behaviors. Threedifferent residue-residue interaction schemes sharing the same electrostatic potential wereevaluated. We found that neither a common scheme based on amino acid hydrophobic-ity nor one augmented with arginine/lysine-aromatic cation- π interactions can consistentlyaccount for the available experimental LLPS data on the wildtype, a charge-scrambled mu-tant, a phenylalanine-to-alanine (FtoA) mutant and an arginine-to-lysine (RtoK) mutant ofthe Ddx4 IDR. In contrast, an interaction scheme based on contact statistics among foldedglobular protein structures reproduces the overall experimental trend, including that theRtoK mutant has a much diminished LLPS propensity. This finding underscores the impor-tant role of π -related interactions in LLPS and that their effects are embodied to a degreein classical statistical potentials. Protein-protein electrostatic interactions are modulatedby relative permittivity, which in general depends on protein concentration in the aqueousmedium. Analytical theory suggests that this dependence entails enhanced inter-proteininteractions in the condensed phase but more favorable protein-solvent interactions in thedilute phase. The opposing trends lead to only a modest overall impact on LLPS. INTRODUCTION
A preponderance of recent advances demonstrate that liquid-liquid phase separation(LLPS) of intrinsically disordered proteins (IDPs), proteins containing intrinsically disor-dered regions (IDRs), folded proteins, and nucleic acids is a general biophysical mechanismto achieve functional spatial and temporal organization of biomolecules in both intra- andextra-cellular organismal space.
LLPS underpins formation of a variety of biomolecularcondensates, including intracellular bodies, such as nucleoli and stress granules, that areoften referred to as membraneless organelles, and precursor of extracellular materials asin the case of sandcastle worm adhesive and elastin in vertebrate tissues . These dynamic,phase-separated condensates perform versatile functions, as underscored by their recentlyelucidated roles in synapse formation and plasticity, organization of chromatin, regu-lation of translation, B cell response, and autophagosome formation. The pace ofdiscovery in this very active area of research has been accelerating.
While experimental progress has been tremendous, theory for the physico-chemical ba-sis of biomolecular condensates is still in its infancy. Biomolecular condensates in vivoare complex, involving many species of proteins and nucleic acids maintained often bynon-equilibrium processes, rendering atomistic modeling impractical. Facing thischallenge, promising initial theoretical steps using coarse-grained approaches were madeto tackle simpler in-vitro LLPS systems, as their elucidation is a prerequisite for physi-cal insights into more complex in vivo condensates. These recent efforts encompass an-alytical theories at various levels of approximation, field theory simulations, andlattice or continuum coarse-grained explicit-chain simulations that account foreither individual amino acid residues or, at lower structural resolution, groups ofresidues —including using patchy particle representations.
The different theoreti-cal/computational approaches are complementary, and were applied to address how aminoacid composition (number/fraction of hydrophobic, aromatic, or charged residues)and the sequence pattern of charge, hydrophobic, or aromatic residues affectLLPS propensity of heteropolymers as well as pertinent impact of temperature, hy-drostatic pressure, salt, and osmolyte, offering physical insights into the LLPSbehaviors of, for example, the DEAD-box RNA helicase Ddx4, RNA-binding proteinfused in sarcoma (FUS), prion-like domains, and postsynaptic densities. Developing LLPS models with transferable interaction potentials applicable to a widerange of amino acid sequences is essential for advancing fundamental physical understand-ing of natural biomolecular condensates and engineering of bio-inspired materials. In thisendeavor, the rapidly expanding repertoire of experimental data offers critical assessmentof theoretical and computational approaches. Building on aforementioned progress, the present study evaluates a variety of interaction schemes for coarse-grained residue-basedchain simulations of LLPS of intrinsically disordered proteins (IDPs) or regions (IDRs), in-cluding but not limited to schemes proposed in the literature. We do so by comparing theirsequence-specific predictions against experimental data on the RNA helicase Ddx4 for whichextensive experimental data on the wildtype (WT) and three mutant sequences are avaial-able to probe the contribution of hydrophobic, electrostatic, cation- π , and possibly other π -related interactions to LLPS. We use these data to benchmark the relative strengthsof different types of interaction in our model. Of particular interest are the aromatic andother π -related interactions, which have significant impact on folded protein structure,conformational distribution of IDPs and LLPS properties, but are often notadequately accounted for in model potentials. Interestingly, a simple statistical potentialbased upon folded protein structures consistently accounts for the LLPS properties of allfour Ddx4 IDR sequences, but a model potential that rely solely on hydrophobicity doesnot. This finding indicates that, at the coarse-grained level of residue-residue interactions,IDP/IDR LLPS is governed by the same forces—including the π -related ones—that driveprotein folding. Explicit-water simulation and new analytical theory suggest, at variancewith previous analyses, that the physically expected dependence of effective permittiv-ity on IDR concentration may have a modest instead of drastic impact on LLPS propensitybecause of a tradeoff between solvent-mediated electrostatic interchain interactions andself-interactions. These findings and their ramifications are discussed below. RESULTS AND DISCUSSION
Our coarse-grained modeling setup follows largely the Langevin dynamics formulationsin Refs. 51, 52 for IDP LLPS. The simulation protocol features an initial slab-like condensedconfiguration that allows for efficient equilibration. Model energy functions embodyingdifferent physical perspectives are considered; details are in the Supporting Information.We critically assess the models by comparing their predictions against the experimentaldata on the Ddx4 IDR (Fig. S1), which indicate that all three Ddx4 IDR mutants—thecharge scrambled (CS), phenylalanine-to-alanine (FtoA), and arginine-to-lysine (RtoK)variants—have significantly reduced LLPS propensities relative to the WT.
The CS,FtoA, and RtoK variants are useful probes for LLPS energetics. They were constructedspecifically to study the experimental effects of sequence charge pattern (the arrangementof charges along the chain sequence of CS is less blocky than that in WT while theamino acid composition is unchanged), the relative importance of aromatic/ π -related vshydrophobic/nonpolar interactions (all 14 Phe residues in WT Ddx4 IDR are mutated toAla in FtoA), and the role of Arg vs Lys (all 24 Arg residues in WT IDR are mutated toLys in RtoK) on the LLPS of Ddx4. (a) (b) Fig. 1.
Comparing two amino acid residue-based coarse-grained potentials. (a) Scatter plot of 210 pairwisecontact energies (in units of kcal mol − ) in the HPS (horizontal variable) versus those in the KH (verticalvariable) model. E ij ( r )s are the pairwise potential energies U aa | HPS ( r ) or U aa | KH ( r ) (see SupportingInformation), between two residues of types i, j separated by r ij = r where the Lennard-Jones componentof the potential is minimum (here i, j stand for labels for the 20 amino acid types). Energies of contactsinvolving Arg (red), Lys (green) and Phe (yellow) are colored differently from others (blue). (b) Contactenergies between residue pairs at positions i, j of the n = 236 sequence of WT Ddx4 IDR (Ddx4 N1 , Ref. 4)in the two potentials are color coded by the scales. The vertical and horizontal axes represent residue posi-tions i, j ≤ n . The i (cid:54) = j contact energies in the HPS and KH models are provided in the two-dimensionalplot, whereas the i = j contact energies are shown alongside the model potentials’ respective color scales. Assessing Biophysical Perspectives Embodied by Different Coarse-GrainedInteraction Schemes For Modeling Biomolecular Condensates.
We consider thepotential functions in the hydrophobicity-scale (HPS) and the Kim-Hummer (KH) modelsin Dignon et al. as well as the HPS potential with augmented cation- π terms, all of whichshare the same bond energy term, U bond , for chain connectivity and screened electrostaticterm, U el , for pairs of charged residues, as described in the Supporting Information. We focusfirst on the pairwise contact interactions between amino acid residues, which correspond tothe U aa energies of either the HPS or KH model (excluding U bond and U el ).The HPS model assumes that the dominant driving force for IDP LLPS is hydrophobicityas characterized by a scale for the 20 amino acid residues. Pairwise contact energy is takento be the sum of the hydrophobicities of the two individual residues of the pair. The HPSmodel adopts the scale of Kapcha and Rossky, in which the hydrophobicity of a residue isa composite quantity based on a binary hydrophobicity scale of the atoms in the residue. In contrast, the KH model relies on knowledge-based potentials derived from contactstatistics of folded protein structures in the Protein Data Bank (PDB). As such, it assumesthat the driving forces for IDP LLPS are essentially identical to those for protein folding ata coarse-grained residue-by-residue level, as obtained by Miyazawa and Jernigan, withoutsingling out a priori a particular interaction type as being dominant. (a) (b) (c) Distance(Å) - l n (r e l a ti v e fr e qu e n c y ) Fig. 2.
Possible cation- π interaction potentials. (a) Sum of the coarse-grained HPS potential and amodel cation- π interaction with a uniform ( (cid:15) c π ) ij = 3 . − as a function of residue-residue distancefor the residue pairs Arg-Tyr, Arg-Phe, Arg-Trp, Lys-Tyr, Lys-Phe and Lys-Trp, wherein Tyr/Phe/Trpare labeled as red/green/blue and Arg/Lys are represented by solid/dashed curves. (b) An aternatecation- π potential in which Arg-Tyr/Phe/Trp is significantly more favorable (solid curve, ( (cid:15) c π ) ij = 1 . − ) than Lys-Tyr/Phe/Trp (dashed curve, ( (cid:15) c π ) ij = 0 .
65 kcal mol − ). Note that the plotted curveshere—unlike those in (a)—do not contain the HPS potential. (c) Normalized C α –C α distance-dependentcontact frequencies for the aforementioned six cation- π pairs (color coded as in (a)) computed using a setof 6,943 high-resolution X-ray protein crystal structures (resolution ≤ . Contact pair statistics are collected from residues on different chains in a givenstructure and residues separated by ≥
50 amino acids along the same chain. C α –C α distances are dividedinto 0.2 ˚A bins. For each bin, the relative frequency is the number of instances of a cation- π -like contact(defined below) divided by the total number of residue pairs with C α –C α distances within the narrowrange of the bin. Thus, the shown curves quantify the tendency for a given pair of residues to engagein cation- π interaction provided that the pair is spatially separated by a given C α –C α distance. Here acation- π -like contact is recognized if either a Lys NZ or an Arg NH1 nitrogen atom is within 3.0 ˚A of anyone of the points 1.7 ˚A above or below a sp carbon atom along the normal of the aromatic ring in a Tyr,Phe, or Trp residue. This criterion is exemplified by the molecular drawing (inset) of a contact betweenan Arg (top) and a Phe (bottom). Colors of the chemical bonds indicate types of atom involved, withcarbon in black, oxygen in red, and nitrogen in blue. The red dots here are points on the exterior surfacesof the electronic orbitals farthest from the sp carbons in the aromatic ring. The blue, green, and red linesemanating from a corner of the aromatic ring constitute a local coordinate frame, with the blue line beingthe normal vector of the plane of the aromatic ring determined from the positions of its first three atoms.The yellow lines mark spatial separations used to define the cation- π -like contact. The HPS model has been applied successfully to study the FUS low-complexity-domain, the RNA-binding protein TDP-43, and the LAF-1 RGG domain as well as its chargeshuffled variants. A temperature-dependent version of HPS (HSP-T) was also able torationalize the experimental LLPS properties of artificial designed sequences. When boththe HPS and KH models were applied to FUS and LAF-1, the predicted phase diagramswere qualitatively similar for a given sequence though they exhibited significantly differ-ent critical temperatures, which should be attributable to the difference in effective en-ergy/temperature scale of the two models. Here we conduct a systematic assessment ofthe two models’ underlying biophysical assumptions by assessing their ability to provide aconsistent rationalization of the LLPS properties of a set of IDR sequences.The scatter plot in Fig. 1a of HPS and KH energies indicates that, despite an overallcorrelation, there are significant outliers. The most conspicuous outliers are interactionsinvolving Arg (red), which are much less favorable in HPS than in KH. By comparison,most of the interactions involving Pro, as depicted by the 16 outlying blue circles as well asthe single yellow and single green circles to the left of the main trend, are considerably morefavorable in HPS than in KH. Interactions involving Phe (yellow) and Lys (green) essentiallyfollow the main trend. Those involving Phe are favorable to various degrees in both models.However, some interactions involving Lys are attractive in HPS but repulsive in KH. Forexample, Lys-Lys interaction is attractive at ≈ − . − for HPS but is repulsiveat ≈ +0 . − for KH. Figure 1b underscores the difference in interaction patternof the two models for the WT Ddx4 IDR. The KH pattern is clearly more heterogeneouswith both attractions and repulsions, whereas the HPS pattern is more uniform with norepulsive interactions. These differences should lead to significantly different predictions insequence-dependent LLPS properties, as will be explored below.Because of the importance of cation- π interactions in protein folded structure as wellas conformational distribution of IDP and LLPS, we study another set of modelinteraction schemes—in addition to HPS and KH, referred to as HPS+cation- π —that aug-ment the HPS potentials with terms specific for cation- π interactions between Arg or Lysand the aromatic Tyr, Phe, or Trp (Fig. 2). As explained in the Supporting Information, weconsider two alternate scenarios: (i) the cation- π interaction strength is essentially uniform,irrespective of the cation-aromatic pair (Fig. 2a), as suggested by an earlier analysis; and(ii) the cation- π interaction strength is significantly stronger for Arg than for Lys (Fig. 2b).The latter scheme is motivated by recent experiments showing that Arg to Lys substitutionsreduce LLPS propensity, as in the cases of the RtoK mutant of Ddx4 IDR and variantsof FUS, as well as a recent theoretical investigation pointing to different roles of Argand Lys in cation- π interactions. Contact statistics of PDB structures, including thoseof Miyazawa and Jernigan on which the KH potential is based, may also suggest thatArg- π attractions are stronger than Lys- π ’s. Indeed, among a set of 6,943 high-resolutionX-ray protein structures, we find that an Arg-aromatic pair is at least 75% more likelythan a Lys-aromatic pair to be within a C α –C α distance of ≤ . On top of that, given an Arg-aromatic and a Lys-aromatic pair are separated by thesame C α –C α distance (Fig. 2c), the Arg-aromatic pair (solid curves) are more likely thanthe Lys-aromatic pair (dashed curves) to adopt configurations consistent with a cation- π interaction. However, we should also emphasize that although a significantly strongerArg- than Lys-associated cation- π interaction is explored here as an alternate scenario, itis probable, as argued by Gallivan and Dougherty using a comparison between Lys-likeammonium-benzene and Arg-like guanidinium-benzene interactions, that the strengthsof the “pure” cation- π parts of Arg- and Lys-aromatic interactions are similar despitethe relative abundance of Arg-aromatic contacts due to other factors such as π - π effects. Density(mg/ml) Density(mg/ml) Density(mg/ml)Density(mg/ml) M od e l T e m p e r a t u r e WTCSFtoARtoK
RtoKFtoACSWT (b)(a)
Fig. 3.
Simulated phase behaviors of Ddx4 IDR variants in a hydrophobicity-dominant potentialaugmented by cation- π interactions. (a) Sequence patterns of the wildtype (WT) and its charge-scrambled(CS), Phe to Ala (FtoA) and Arg to Lys (RtoK) variants. Select residue types are highlighted: Ala(orange), Asp and Glu (red), Phe (magenta), Lys (cyan), and Arg (dark blue); other residue types are notdistinguished. (b) Simulated phase diagrams of WT, CS, FtoA and RtoK Ddx4 IDR at various relativepermittivities ( (cid:15) r ) as indicated, using the HPS model only (leftmost panel) and the HPS model augmentedwith cation- π interactions (other panels on the right) with either a uniform ( (cid:15) c π ) ij as described in Fig. 2a(top) or with different ( (cid:15) c π ) ij values for Arg and Lys as given in Fig. 2b (bottom). Hydrophobicity, Electrostatics And Cation- π Interactions Are Insufficient ByThemselves To Rationalize Ddx4 LLPS Data In Their Entirety.
We begin our as-sessment of models by applying the HPS and HPS+cation- π potentials to simulate thephase diagrams of the four Ddx4 IDRs (Fig. S1), the sequence patterns of which are illus-trated in Fig. 3a. Consistent with experiments, the simulated phase diagrams (Fig. 3b)exhibit upper critical solution temperatures (a maximum temperature above which phaseseparation does not occur). We emphasize, however, that although the simulated criti-cal temperatures are assuringly in the same range as those deduced experimentally, themodel temperature (in K) of our simulated phase diagrams in Figs. 3b and 4 should not becompared directly with experimental temperature. This is because not only of uncertain-ties about the overall model energy scale but also because the models do not account forthe temperature dependence of effective residue-residue interactions. For simplicity,our models include only temperature-independent energies as they are purposed mainly forcomparing the LLPS propensities of different amino acid sequences on the same footingrather than for highly accurate prediction of LLPS behaviors of any individual sequence.The leftmost panel of Fig. 3b provides the HPS phase diagrams computed using relativepermittivity (cid:15) r = 80 (corresponding approximately to that of bulk water, as in Ref. 51).They show that the predicted behaviors of the CS and FtoA variants are consistent withexperiments—that their LLPS propensities are reduced relative to WT; but the predictedenhanced LLPS propensity of RtoK is opposite to the experimental finding of Vernon etal. that the LLPS propensity of RtoK is lower than that of WT. This shortcoming of theHPS model is a consequence of its assignment of much less favorable interactions for Argthan for Lys, as noted in Fig. 1a.The other panels of Fig. 3b provide the HPS+cation- π phase diagrams. They are com-puted for (cid:15) r = 80, 40, and 20 to gauge the effect of electrostatic interactions relative to othertypes of interactions. The (cid:15) r -dependent results serve also as a preparatory step for our subse-quent investigation of the impact of incorporating the physical effect of IDR-concentration-dependent permittivity on predicted LLPS properties. None of the HPS+cation- π phasediagrams is capable of avoiding mismatch with experiment as they all predict a higherLLPS propensity for the RtoK variant than for WT. Apparently, the bias of the HPS po-tential against Arg interactions is so strong that it cannot be overcome by additional Arg-aromatic interactions that are reasonably more favorable than Lys-aromatic interactions(Fig. 2b). The (cid:15) r = 80 results for both uniform and variable cation- π strength exhibit anadditional mismatch: Contrary to experiments, they predict similar LLPS propensitiesfor the CS variant and WT, suggesting that under this dielectric condition, electrostaticinteractions are unphysically overwhelmed by the presumed cation- π interactions. The (cid:15) r = 20 results for variable cation- π also indicate an additional mismatch, in this case theyfail to reproduce the experimental trend of a significantly lower LLPS propensity of theFtoA variant relative to that of WT, probably because the relatively weak cation- π con-tribution is overwhelmed by strong electrostatic interactions in this low- (cid:15) r situation. Takentogether, although a perspective involving only electrostatic and cation- π interactions wasadequate to account for sequence-specific LLPS trend of WT and CS (and possibly alsoFtoA) before the RtoK experiment was performed, such a perspective is incomplete whenRtoK enters the picture. Fig. 3b shows that the HPS+cation- π model, which takes into ac-count hydrophobic, charged, and cation- π interactions, cannot account for the general trendof available Ddx4 LLPS data. It follows that these interactions—at least when hydropho-bicity is accorded by the particular scale adopted by HPS—are insufficient by themselvesto account for LLPS of IDRs in general.0 M od e l T e m p e r a t u r e WTCSFtoARtoK
Fig. 4.
Simulated phase behaviors of Ddx4 IDR variants using an interaction scheme based largely onPDB-derived statistical potentials. Phase diagrams were computed using the KH model at three differentrelative permittivities ( (cid:15) r ). Structure-Based Statistical Potentials Provide An Approximate AccountOf π -Related And Other Driving Forces for LLPS. In contrast to the HPS andHPS+cation- π models, direct application of the KH model—without augmented cation- π terms—leads to an overal trend that is consistent with experiments for the (cid:15) r val-ues tested, i.e., all three Ddx4 IDR variants are predicted by the KH potential to havelower LLPS propensities than WT (Fig. 4). Illustrations of phase-separated and non-phase-separated configurations are provided in Fig. 5. Previous computation of time-dependentmean-square deviation of molecular coordinates have been used to verify liquid-like chaindynamics in the condensed phase of HPS and KH models. Examples of similar calculationare provided in Fig. S3 and Fig. S4 for the Ddx4 IDRs examined in the present study.This success of the KH model suggests that empirical, knowledge-based statistical po-tentials derived from the PDB capture key effects governing both protein folding and IDRLLPS without prejudging the dominance of, or lack thereof, particular types of energeticssuch as hydrophobicity in the HPS model. In this respect, it would not be surprising thatcation- π and other π -related interactions are reflected in these knowledge-based potentialsas well. After all, the importance of cation- π interactions in folded protein structure and π - π interactions in IDR LLPS is recognized largely by bioinformatics analyses of the PDB.As discussed above, a major cause of the HPS model’s shortcoming in accounting forthe LLPS of Ddx4 IDRs (Fig. 3b) is the high degree of unfavorability it ascribes to Arginteractions. Its hydrophobicity scale is based on the atomic partial charges in the OPLSforcefield. In that formulation, Arg is least hydrophobic with a hydrophobicity value of+14 .
5, the next-least hydrophobic is Asp with +7 .
5, whereas Lys has +5 .
0, and the mosthydrophobic is Phe with − . i , in the pairwise energy E ij ( r ) (Fig. 1a) is Arg, the average of E ij ( r ) over j for all amino acids except the charged residues Arg, Lys, Asp, and Glu is equal to − . − , whereas the corresponding average for Lys is much more favorable at − . − . − .
119 for Lys. In contrast, for the KH model, the trend is1opposite with Arg-associated interactions being much more favorable: the correspondingaverage is − .
123 for Arg and − .
041 for Lys when charged residues are excluded in theaveraging and − . − . (whichunderlie the KH potential) indicating that Arg has a significantly larger projection thanLys along the dominant eigenvector. (a)(b)(e)(d)(f) A C D E F G H I K L M N P Q R S T V W Y (c)
D,E R K F A Others
Fig. 5.
Illustrative snapshots of Ddx4 N1 CS phase behaviors simulated using the KH potential for (cid:15) r = 40.(a) A non-phase-separated snapshot at model temperature 375 K, wherein the amino acid residues arecolored using the default VMD scheme as provided by the key below the snapshot. (b) Same as(a) except the color scheme (as shown) is essentially identical to that in Fig. 3a. (c) Same as (a) and (b)except all residues along the same chain share the same color. Neighboring chains are colored differentlyto highlight the diversity of conformations in the system. (d–f) A phase-separated snapshot at modeltemperature 325 K. The color schemes are the same, respectively, as those in (a–c). Whereas correlation among hydrophobicity scales inferred from different methods islimited with significant variations especially for the nonhydrophobic polar and chargedresidues, the extremely low hydrophobicity assigned by HPS to Arg relative to Lys isunusual. For instance, Lys is substantially less hydrophobic than Arg in two of the threescales tabulated and compared in Ref. 85. In a commonly-utilized scale based on the freeenergies of transfer of amino acid derivatives from water to octanol measured by Fauch`ereand Pliˇska (the third scale tabulated in Ref. 85), Arg is only slightly less hydrophobic2(+5 .
72 kJ mol − ) than Lys (+5 .
61 kJ mol − ) and thus, essentially, Arg and Lys are deemedto possess equally low hydrophobicities. Accordingly, this scale affords a better correlationwith the Miyazawa-Jernigan energies (Fig. 3b of Ref. 85) than that exhibited in Fig. 1a.It is reasonable to expect the 210 (or more) residue-residue contact energy parametersin PDB-structures-based potentials to contain more comprehensive energetic informationthan merely the hydrophobicities of the 20 types of amino acid residues. In this regard, it isnotable that a higher propensity for Arg than Lys to engage favorably with another residueappears to be a robust feature of PDB statistics. It holds for the cation-aromatic pairswe analyze in Fig. S2, for the KH potential, and also for the original Miyazawa-Jerniganenergies put forth in 1985. According to Table V of Ref. 72, the contact energies e ij between Arg and aromatic or negatively charged residues are − . − . − . − . − . k B T , respectively, for Arg-Phe, Arg-Trp, Arg-Tyr, Arg-Glu, and Arg-Asp ( k B is Boltzmann constant, T is absolute temperature), whereas the corresponding contactenergies are weaker for Lys at − . − . − . − .
60, and − . k B T , respectively, forLys-Phe, Lys-Trp, Lys-Tyr, Lys-Glu, and Lys-Asp. All twenty Arg interactions are morefavorable than the corresponding Lys interactions. The average e ij over all Arg-associatedpairs is − . k B T , which is substantially more favorable than the corresponding averageof − . k B T for the Lys-associated pairs. It is apparent from the present application ofKH to the Ddx4 IDRs that this feature is crucial, at least at a coarse-grained level, for anadequate accounting of the π -related energetics of biomolecular LLPS. IDR Concentration Can Significantly Affects The Dielectric Environment OfCondensed Droplets But Its Impact On LLPS Propensity Can Be Modest.
Inrecent and the above coarse-grained, implicit-solvent simulations of LLPS of IDRs,electrostatic interactions are assumed, for simplicity, to operate in a uniform dielectricmedium with a position-independent (cid:15) r . However, the dielectric environment is certainlynonuniform upon LLPS: The electrostatic interaction between two charges are affected toa larger extent by the intervening IDR materials in the condensed phase—where there isa higher IDR concentration—than in the dilute phase. Protein materials have lower (cid:15) r ’sthan bulk water. Analytical treatments with effective medium theories suggest that adecrease in effective (cid:15) r with increasing IDR concentration enhances polyampholytes LLPSin a cooperative manner because the formation of condensed phase lowers (cid:15) r and that inturn induces stronger electrostatic attractions that favor the condensed phase. In principle, LLPS of IDR chains in polarizable aqueous media can be directly simulatedusing explicit-water atomic models wherein partial charges are assigned to appropriatesites of the water and protein molecules; but such simulations are computationallyextremely costly because a large number of IDR chains are needed to model LLPS.Here we use explicit-water atomic simulation involving only a few IDR molecules, notto model LLPS but to estimate how the effective (cid:15) r depends on IDR concentration.We will then combine this information with analytical formulations to provide a more3complete account of the electrostatic driving forces for LLPS. The dielectric propertiesof folded proteins, their solutions, and related biomolecular and cellular set-tings have long been of interest. For the current interest in biomolecular condensates,their interior dielectric environments are expected to be of functional significance, e.g., asdrivers for various ions and charged molecules to preferentially partition into a condensate. M o d e l T e m p e r a t u r e Density (mg/ml) Density (mg/ml)(a) (b) With "Self-Energy" (c) No "Self-Energy"
SPC/E 100mM NaClTIP3P 100mM NaClTIP3P WT Salt-freeTIP3P No charge Salt-free
Fig. 6.
Effects of IDR-concentration-dependent relative permittivity on phase behaviors. (a) Relativepermittivity (cid:15) r ( φ ) values obtained by atomic simulations (symbols) using various explicit-water models (asindicated, bottom) are shown as functions of Ddx4 volume fraction φ ( φ = 1 corresponds to pure Ddx4).The blue curve is a theoretical fit of the SPC/E, [NaCl] = 100 mM explicit-water simulated data basedon the Slab (Bragg and Pippard ) model [Eq. (34) of Ref. 37], viz., 1 /(cid:15) r ( φ ) = φ/(cid:15) p + (1 − φ ) /(cid:15) w withthe fitted (cid:15) p = 18 . (cid:15) w = 84 . (cid:15) p and (cid:15) w are, respectively, the relative permittivity of pureprotein and pure water. The black solid, dashed, and dashed-dotted lines are approximate linear modelsof (cid:15) r ( φ ) = (cid:15) p φ + (cid:15) w (1 − φ ) with the same (cid:15) w but different (cid:15) p values as indicated (top-right), resulting in d(cid:15) r ( φ ) /dφ slopes, respectively, of − . − .
9, and − .
2. (b, c) Theoretical phase diagrams of the fourDdx4 IDR variants were obtained by a RPA theory that incorporates an (cid:15) r ( φ ) linear in φ . Solid, dashed,and dashed-dotted curves correspond, as in (a), to the three different (cid:15) p values used in the theory. Theelectrostatic contribution to the phase behaviors is calculated here using either (b) the expression for f el given by Eq. (S51) in Supporting Information [i.e., Eq. (68) of Ref. 35 with its self-interaction term G (˜ k ) excluded] or (c) the full expression for f el [Eq. (68) in the same reference, or equivalently Eq. (S2)].Further details are provided in Supporting Information. The simulations are conducted on five WT Ddx4 IDRs using GROMACS and theamber99sb-ildn forcefield with TIP3P or SPC/E waters in boxes of different sizes forsix IDR concentrations. Relative permittivities are estimated by fluctuations of the systemdipole moment. Simulations are also performed on artificially constructed Ddx4 (aDdx4)in which the sidechain charges of Arg, Lys, Asp, and Glu are neutralized for possibleapplications when sidechain charges are treated separately from that of the backgrounddielectric medium. Methodological details are provided in the Supporting Information.4Some of the simulated (cid:15) r values are plotted in Fig. 6a to illustrate their dependence onIDR volume fraction φ (the φ ∝ concentration relation and an extended set of simulated (cid:15) r ’s are provided by Fig. S5, Table S1 and the text in the Supporting Information). Thedifference in simulated (cid:15) r ( φ ) for Ddx4 and aDdx4 is negligible except at very low IDRconcentration (Fig. 6a and Fig. S5), likely because the main contribution to the dielectriceffect of IDR in the atomic model is from the partial charges on the protein backbone.Consistent with expectation, simulated (cid:15) r ( φ ) in Fig. 6a decreases with increasing φ forall solvent conditions considered. Permittivity is known to decrease with salt. Herethis expected effect is observed for TIP3P solution of IDR at low but not at high IDRconcentration. Interestingly, the (cid:15) r ( φ ) simulated with SPC/E water and 100 mM NaClexhibits nonlinear decrease with increasing φ , which is akin to that predicted by the Bragg-Pippard and Clausius-Mossotti models; but the TIP3P-simulated (cid:15) r ( φ ) appears to belinear in φ , which is more in line with the Maxwell Garnett and Bruggeman models. We utilize the salient features of the coarse-grained KH chain model for Ddx4 (Fig. 4) andthe IDR-concentration-dependent permittivities from explicit-water simulations (Fig. 6a)to inform an analytical theory for IDR LLPS, referred to as RPA+FH, that combines arandom-phase-approximation (RPA) of charge-sequence-specific electrostatics and Flory-Huggins (FH) mean-field treatment for the other interactions.
An in-depth analysis ofour previous RPA formulation for IDR-concentration-dependent (cid:15) r (Ref. 35) indicates thatonly an (cid:15) r ( φ ) linear in φ can be consistently treated by RPA (Supporting Information).With this in mind, and considering the uncertainties of simulated (cid:15) r ( φ ) for different watermodels (Fig. 6a), three alternative linear forms of (cid:15) r ( φ ) (straight lines in Fig. 6a) are usedfor the present RPA formulation to cover reasonable variations in (cid:15) r ( φ ).The mean-field FH interaction parameters in our RPA+FH models for the four Ddx4IDRs are obtained from the four sequences’ average pairwise non-electrostatic KH contactenergies. For each of the 236-residue sequences, we calculate the average of the E ij ( r )[KH] quantity (Fig. 1a), for a given pair of residue types, over all pairs of residues on thesequence, including a residue with itself (236 × / − , − . − . − . φ -independent (cid:15) r forthree different fixed (cid:15) r = 80, 40, and 20. The computed RPA+FH phase diagrams are thenfitted to the corresponding phase diagrams simulated by coarse-grained KH chain modelsin Fig. 4 to determine a single energy scaling factor from the best possible fit (Fig. S6).The product of this factor and the sequence-dependent averages of E ij ( r ) [KH] definedabove is now used as the enthalpic FH χ parameters in the final RPA+FH theories withIDR-concentration-dependent (cid:15) r ( φ ). Details of unit conversion between our explicit-chainsimulation and our analytical RPA+FH formulation are in the Supporting Information.5In this connection, it is instructive to note that the corresponding averages of E ij ( r )[HPS] for the HPS model are − . − . − . (cid:15) r ( φ ) functions andKH-derived mean-field FH parameters as prescribed above. In all cases considered, theWT sequence (red curves) exhibit a higher propensity to LLPS than the three variants,indicating that this general agreement with the experimental trend seen in Fig. 4 holds upnot only under the simplifying assumption of a constant (cid:15) r but also when the dielectriceffect of the IDRs is taken into account. As discussed in the Supporting Information, wehave previously subtracted the self-energy term in the RPA formulation for numericalefficiency because the term has no impact on the predicted phase diagram when (cid:15) r isa constant independent of φ because the self-energy contribution is identical for thedilute and condensed phases. However, with an IDR-concentration-dependent (cid:15) r ( φ ), asfor the cases considered here, the self energy—with the short-distance cutoff of Coulombinteraction in the RPA formulation corresponding roughly to a finite Born radius —isphysically relevant as it decreases with increasing (cid:15) r , and therefore it affects the predictedLLPS properties as manifested by the difference between Fig. 6b and 6c. It followsthat the self-energy term quantifies a tendency for an individual polyampholyte chainto prefer the dilute phase with a higher (cid:15) r —because of its more favorable electrostaticinteractions with the more polarizable environment—over the condensed phase with alower (cid:15) r . This tendency disfavors LLPS. At the same time, the lower (cid:15) r in the condensedphase entails a stronger inter-chain attractive electrostatic force that drives the associationof polyampholyte chains. Therefore, taken together, relative to the assumption of aconstant (cid:15) r , the overall impact of an IDR-concentration-dependent (cid:15) r ( φ ) is expected tobe modest because it likely entails a partial tradeoff between these two opposing effects.This consideration is borne out in Fig. 6b and c. When self energy is neglected in Fig. 6c,LLPS propensities predicted using IDR-concentration-dependent (cid:15) r ( φ )’s are relatively high(as characterized by the critical temperatures), comparable to those for a fixed (cid:15) r = 40in Fig. 4b. When the physical effect of self energy is accounted for in Fig. 6b, LLPSpropensities predicted using IDR-concentration-dependent (cid:15) r ( φ )’s are significantly lower:overall they are comparable but slightly lower than those for a fixed (cid:15) r = 80 in Fig. 4a.Consistent with this physical picture, whereas the (cid:15) r ( φ ) with a sharper decrease withincreasing φ leads to a higher LLPS propensity when self energy is neglected (dashedcurves have higher critical temperatures than dashed-dotted curves of the same color inFig. 6c), for the physically appropriate formulation that takes self energy into account, asharper decrease in (cid:15) r ( φ ) with increasing φ is associated with a lower LLPS propensity(dashed curves have lower critical temperatures than dashed-dotted curves of the samecolor in Fig. 6b).6 CONCLUSION
In summary, we have gained new insights into the physical forces that drive the forma-tion of biomolecular condensates by systematically evaluating coarse-grained, residue-basedprotein chain models embodying different outlooks as to the types of interactions that areimportant for LLPS of IDRs by comparing model predictions against experimental dataon WT Ddx4 IDR and its three variants. By requiring a model potential to account forall observed relative LLPS propensities of the four sequences, we find that hydrophobicity,electrostatic, and cation- π interactions are insufficient by themselves. Consistent ratio-nalization of the experimental data entails significantly more favorable arginine-associatedover lysine-associated contacts, an effect that is most likely underpinned by π - π interactions.This perspective is in line with bioinformatics analysis of LLPS propensities and recentexperiments on other IDRs. And it is reassuring that the balance of forces for LLPSof IDRs appears to be approximately captured by common PDB-derived statistical poten-tials developed to study protein folding and binding. We have also highlighted the reducedelectric permittivity inside condensed IDR phases. Although this effect’s overall influenceon LLPS propensity may be modest because of a tradeoff between its consequences on IDRself energies and on inter-IDR interactions, the effect of IDR-concentration-dependent per-mittivity by itself should be of functional importance in biology because of its potentialimpact on biochemical reactions and preferential partition of certain molecules into a givenbiomolecular condensate. All told, the present study serves not only to clarify the afore-mentioned issues of general principles, it also represents a useful step toward a transferablecoarse-grained model for sequence-specific biomolecular LLPS. Many questions remain tobe further investigated in this regard. These include—and are not limited to—a properbalance between attractive and repulsive interactions, devising temperature-dependenteffective interactions, an accurate account of small ion effects, and incorporationof nucleic acids into LLPS simulations. Progress in these directions will deepen our un-derstanding of fundamental molecular biological processes and will advance the design ofnovel IDR-like materials as well.
Acknowledgments
This work was supported by Canadian Institutes of Health Research grants MOP-84281and NJT-155930 to H.S.C., Natural Sciences and Engineering Research Council of CanadaDiscovery grants RGPIN-2016-06718 to J.D.F.-K. and RGPIN-2018-04351 to H.S.C., andand computational resources provided generously by Compute/Calcul Canada.7
Supporting Information for
Comparative Roles of Charge, π , and HydrophobicInteractions in Sequence-Dependent PhaseSeparation of Intrinsically Disordered Proteins Suman D AS , Yi-Hsuan L IN , , Robert M. V
ERNON , Julie D. F
ORMAN -K AY , and Hue Sun C
HAN , ∗ Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada; Molecular Medicine, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada ∗ Corresponding authorE-mail: [email protected]; Tel: (416)978-2697; Fax: (416)978-8548Mailing address:Department of Biochemistry, University of Toronto, Medical Sciences Building – 5th Fl.,1 King’s College Circle, Toronto, Ontario M5S 1A8, Canada.8
Models and Methods C OARSE- G RAINED C HAIN M ODELS
The coarse-grained protein chain models in the present study basically follow those inRefs. 51, 52, but with modified and additional features. In accordance with our previousnotation for explicit-chain simulation studies, let µ, ν = 1 , , . . . , n be the labels for the n IDP chains in the system, and i, j = 1 , , . . . , N be the labels of the N residues in eachIDP chain. The total potential energy U T is a function of the residue positions, denotedhere as { R µi } . Writing U T = U bond + U el + U aa , (S1)where U bond is the bond-length term for chain connectivity: U bond = K bond n (cid:88) µ =1 N − (cid:88) i =1 ( r µi,µi +1 − l ) (S2)with r µi,νj ≡ | R µi − R νj | , l = 3 . α -C α virtual bond length [ l is equivalent to a in Eq. (3) of Ref. 52], K bond = 10 kJ mol − ˚A − [this value would be identical to that usedin Ref. 51 if the 10 kJ/˚A value quoted above Eq. (1) in this reference is a typographicalerror, i.e., it misses a “/mol”; by comparison, the much stiffer K bond value used in Eq. (3) inRef. 52, which follows Ref. 75 with the aim of comparing with fixed-bond-length Monte Carlosimulations, is equivalent to 23 . − ˚A − ], and U el is the electrostatic interaction: U el = n (cid:88) µ,ν =1 N (cid:88) i,j =1( µ,i ) (cid:54) =( ν,j ) σ µi σ νj e π(cid:15) (cid:15) r r µi,νj exp (cid:16) − κr µi,νj (cid:17) , (S3)wherein σ µi is the charge of the i th residue in units of elementary electronic charge e , ( σ µi isindependent of µ ), (cid:15) is vacuum permittivity, (cid:15) r is relative permittivity (dielectric constant),and κ is the reciprocal of the Debye screening length, which is taken to be 10 . κ = 0 . − ). Following Table S1 of Ref. 51, σ values for Arg and Lys are assignedto be +1, those of Asp and Glu are −
1, and that of His is +0 .
5. All other residues aretaken to be neutral, i.e., with σ = 0.The U aa in Eq. (S1) is the sum of pairwise interaction energies among the residues, viz., U aa = n (cid:88) µ,ν =1 N (cid:88) i,j =1( µ,i ) (cid:54) =( ν,j ) ( U aa ) µi,νj , (S4)where ( U aa ) µi,νj is the interaction between the i th residue of the µ th chain with the j thresidue of the ν th chain. We investigate several physically plausible U aa functions, as follows:9 The HPS model
The hydrophobicity scale (HPS) model is identical to the one introduced by Dignon etal. based on an atomic-level hydrophobicity scale devised by Kapcha and Rossky. Theinteraction between amino-acid pairs in this model is given by( U aa ) µi,νj = ( U aa | HPS ) µi,νj ≡ (cid:40) ( U LJ ) µi,νj + (1 − λ HPS ij ) (cid:15) , if r ≤ / a ij λ HPS ij ( U LJ ) µi,νj otherwise (S5)where λ HPS ij ≡ ( λ i + λ j ) / a ij ≡ ( a i + a j ) /
2, with λ i and a i being the hydrophobicity anddiameter, respectively, of the model amino acid residue at sequence position i , as given,respectively, by the λ and σ values in Table S1 of Ref. 51; ( U LJ ) µi,νj is the Lennard-Jones(LJ) potential, ( U LJ ) µi,νj = 4 (cid:15) ij (cid:34)(cid:18) a ij r µi,νj (cid:19) − (cid:18) a ij r µi,νj (cid:19) (cid:35) , (S6)where the LJ well depth (cid:15) ij (not to be confused with the permittivities) is set to be (cid:15) ij = 0 . − irrespective of i, j for the HPS model, as in Ref. 51. The HPS+cation- π models In view of the importance of cation- π interactions in protein structure (see discussion inthe main text), we consider also a class of model potentials, U aa | HPS+c π ’s, that augment theHPS potential with cation- π terms for Arg-Phe, Arg-Trp, Arg-Tyr, Lys-Phe, Lys-Trp, andLys-Tyr residue pairs. In these interaction schemes,( U aa ) µi,νj = ( U aa | HPS+c π ) µi,νj ≡ ( U aa | HPS ) µi,νj + ( U aa | c π ) µi,νj , (S7)where ( U aa | c π ) µi,νj = ( (cid:15) c π ) ij (cid:34)(cid:18) a ij r µi,νj (cid:19) − (cid:18) a ij r µi,νj (cid:19) (cid:35) , (S8)and ( (cid:15) c π ) ij is the cation- π interaction strength, ( (cid:15) c π ) ij > µi, νj is oneof the aforementioned six cation- π pairs, otherwise ( (cid:15) c π ) ij = 0. This simple form is adoptedfrom the cation- π term in Eq. (S1) of Ref. 70.Two sets of ( (cid:15) c π ) ij values are analyzed in the present study:(i) ( (cid:15) c π ) ij = 3 . − for all six cation- π pairs. The rationale for using a single( (cid:15) c π ) ij value is the suggestion by statistical and other inferences that the variations ofinteraction strengths among the six cation- π amino acid residue pairs could be relativelysmall, though subsequently we will also explore scenarios in which significant varia-tions in cation- π interaction strengths exist among the pairs. When combined with the0( U aa | HPS ) µi,νj contribution in Eq. (S7), ( (cid:15) c π ) ij = 3 . − leads to well depths for( U aa ) µi,νj = ( U aa | HPS+c π ) µi,νj of ≈ .
85 kcal mol − for Arg-Phe, Arg-Trp, Arg-Tyr, andcorresponding well depths of ≈ .
90 kcal mol − for Lys-Phe, Lys-Trp, and Lys-Tyr (seeFig. 2a of the main text). It should be noted here that we have chosen an ( (cid:15) c π ) ij valuesignificantly smaller than those used in Ref. 70 in order for the model cation- π interactionsto be more compatiable with the shallow well depths of the ( U aa | HPS ) µi,νj potentialsin the HPS model, which has a maximum well depth of 0 . − . Nonetheless,the ( (cid:15) c π ) ij = 3 . − value still entails a cation- π interaction strength which isabout double that of electrostatic interaction when (cid:15) r in Eq. (S3) corresponds to that ofbulk water ( (cid:15) r ≈ π and electrostatic in-teractions in an aqueous environment conforms to a similar ratio deduced computationally. (ii) Different ( (cid:15) c π ) ij values for cation- π pairs involving Arg and pairs involving Lys, with( (cid:15) c π ) ij = 1 .
85 kcal mol − for Arg-Phe, Arg-Trp, Arg-Tyr and ( (cid:15) c π ) ij = 0 .
65 kcal mol − for Lys-Phe, Lys-Trp, and Lys-Tyr. This alternate model cation- π interaction scheme ismotivated by observed trends of statistical potentials derived from PDB protein structuressuch as the Miyazawa-Jernigan energies used in the KH/MJ model (described below)and the new analysis presented in the main text as well as recent experimental evidence, all of which suggest that cation- π interactions involving Arg is more favorable than thoseinvolving Lys. The ( (cid:15) c π ) ij values in this scheme are chosen such that the combined welldepth of ( U aa | HPS+c π ) µi,νj for cation-aromatic pairs are comparable to the deepest welldepth of ≈ . − in the KH/MJ model. In particular, ( (cid:15) c π ) ij = 1 .
85 kcal mol − leads to a combined well depth of ≈ .
55 kcal mol − for terms in ( U aa | HPS+c π ) µi,νj involvingArg-aromatic pairs, whereas ( (cid:15) c π ) ij = 0 .
65 kcal mol − leads to a corresponding combinedwell depth of ≈ . − for Lys-aromatic pairs (Fig. 2b of the main text). The KH (KH/MJ) model
The Kim-Hummer/Miyazawa-Jernigan (KH/MJ) model corresponds to the KH modelused by Dignon et al., and is based on the statistical potentials of Miyazawa and Jernigan(MJ). Following Ref. 51, we refer to this model as KH in the main text and hereafter. Thebasic functional form of the KH potential, U aa | KH , is similar to that for the HPS potentialin Eq. (S5). For the KH model,( U aa ) µi,νj = ( U aa | KH ) µi,νj ≡ (cid:40) ( U LJ ) µi,νj + (1 − λ KH ij ) (cid:15) , if r ≤ / a ij λ KH ij ( U LJ ) µi,νj otherwise (S9)where ( U LJ ) µi,νj is given by Eq. (S6), but now (cid:15) ij depends on i, j . Specifically, for the KHmodel (cid:15) ij = | α ( e MJ ,ij − e ) | , (S10)where e MJ ,ij is the MJ statistical potential between the residue type at position i and the1residue type at position j , e is a constant shift of the energies, and λ KH ij = (cid:40) e MJ ,ij ≤ e − . (S11)We use α = 0 .
228 and e = − . − in the present study. The resulting pairwiseenergies e MJ correspond to the KH-D parameter set for IDRs in Table S3 of Ref. 51. Simulation method
Molecular (Langevin) dynamics simulations are carried out using the protocol outlinedin the “Simulation framework” section of Ref. 51, with parameters modified for thepresent applications. For each simulation, we consider 100 copies of one of the four Ddx4IDR sequences (Fig. S1), governed by one of the above coarse-grained model potentialfunctions. At the initial step, all the IDR chains are randomly placed in a relatively large,300 × ×
300 ˚A simulation box. Energy minimization is then applied to minimizeunfavorable steric clashes among the amino acid residues. Equilibrating NPT simulationis then performed for 50 ns at a temperature of 100 K and pressure of 1 bar, maintainedby Martyna-Tobias-Klein (MTK) thermostat and barostat with a coupling constantof 1 ps. It should be noted that the simulation pressure does not correspond to physicalpressure because solvent (water) pressure is not accounted for in the present coarse-grained,implicit-solvent model setup. In this regard, pressure is used entirely as an efficient com-putational device for achieving condensed configurations as starting point of subsequentsimulations. Throughout the dynamics simulation, equations of motion are integratedwith a timestep of 10 fs and periodic boundary conditions are applied to all three spatialdimensions. After the initial
NPT step, the simulation box is compressed again for 50ns along all three spatial dimensions at 100 K as successive
NVT ensembles ( P changesduring the process) using Langevin thermostat with friction coefficient 1 ps − . The extentof compression varies for different systems. Then the dimension along one of the threeCartesian axes of the simulation box is expanded 20 times relative to its initial value for aperiod of 50 ns while maintaing the temperature at 100 K. Equilibration NVT simulationis then performed at the chosen temperature for 2 µ s. Finally, production NVT runs arecarried out for 4 µ s and the chain configurations are saved every 0.5 ns for subsequentanalysis. During the production run, the friction coefficient of the Langevin thermostatis decreased to 0.01 ps − for sampling efficiency. All simulations are performed by theHOOMD-blue software package. After the snapshots of simulated chain configurationsare collected, the procedure for constructing phase diagrams from the configurations followsthat described in the “Simulation framework” section of Ref. 51 and the “Results anddiscussion” section of Ref. 52.2 E XPLICIT- W ATER S IMULATION OF
IDR-C
ONCENTRATION -D EPENDENT P ERMITTIVITY
Computational procedure
We estimate the IDR-concentration-dependent relative permittivity by atomisticexplicit-water molecular dynamic simulations performed at six Ddx4 IDR (wildtype, WT)concentrations using GROMACS, version 2016.5. The simulation proceeds as follows.Initially, a fully extended configuration of a Ddx4 IDR is prepared by PYMOL, to beused as input for Packmol to place five Ddx4 IDRs at random locations in a cubicsimulation box. The size of the box is varied to achieve different Ddx4 IDR concentrations.The Ddx4 IDRs are solvated by explicit water models in the simulation box. Each ofthe systems so constructed is then charge neutralized by adding appropriate number ofNa + ions. This is followed by energy minimization by steepest descent to minimize stericclashes. Hydrogen bonds are constrained with the LINCS algorithm. Equation of motionis integrated using a time step of 2 fs with the leap-frog integrator and cubic periodicboundary conditions. Long spatial-range electrostatic interaction is treated with particlemesh Ewald (PME) method with a grid spacing of 0.16 nm and an interpolation orderof 4. A cut-off of 1 nm is used for short-range van der Waals and electrostatic interactions.Initial equilibration is carried out for 2 ns under
N V T conditions at 300 K. Temperatureis maintained by Velocity-rescale thermostat with a time constant of 0.1 ps for allsimulations. This is followed by equilibration for 2 ns at 300 K under
N P T conditionsunder 1 atm pressure, which is maintained by a Berendsen barostat with a couplingconstant of 2 ps. Since the Berendsen barostat does not always yield an
N P T ensemblewith high accuracy, the resulting system is equilibrated again for 1 ns as an
N P T ensembleusing the Parrinello-Rahman barostat with the same coupling constant, after whichthe production
NPT run is carried out for 20 ns using the same Parrinello-Rahmanbarostat. Configurations are saved every 1.0 ps during the production run for subsequentanalysis. In addition to simulations of Ddx4 IDR in essentially pure water (except a fewNa + ions), we also conduct simulations with Na + and Cl − ions at [NaCl] = 100 mM.In order to enable a potentially more direct comparison with analytical theory that doesnot include the charges of amino acid residues in the estimation of effective permittivityof the aqueoue medium, we carry out another set of simulations with Ddx4 IDRconcentrations similar to the ones for which the above protocol is applied but with thecharges of the sidechains of the charged amino acids Arg, Lys, Asp, and Glu artificallyturned off. This set of simulation data is referred to as artificial Ddx4 or aDdx4. Thesame aforementioned procedure for equilibration and production is applied for this set ofsimulations. The amber99sb-ildn force field and the TIP3P water model are used forboth sets of simulations. To assess the robustness of the computed (cid:15) r values, all simulationsare also repeated using the SPC/E water model. Relative permittivity analysis
Static relative permittivity (cid:15) r (dielectric constant) is determined by the fluctuation ofthe total dipole moment vector, M T , of the system via the relation (cid:15) r = (cid:104) M (cid:105) − (cid:104) M T (cid:105) V (cid:15) k B T + 1 , (S12)where M T ≡ ( M T · M T ) / is the magnitude of the system dipole moment, (cid:104) . . . (cid:105) denotesaveraging over system configurations under equilibrium conditions, and V is the volume ofthe simulation box. This relation, Eq. (S12), has been used to compute the static dielectricconstant of several biological systems. Following the formulation in Ref. 91, M T is obtained as sum of dipole moments of individual water molecules and individual Ddx4IDR chain molecules. Irrespective of the net charge of the molecule (water has net charge 0whereas Ddx4 IDR has net charge ≈ − e ), the dipole moment, m , of a molecule comprisingof N m atoms with masses m s ( s = 1 , , . . . , N m ) at positions r s with point charges q s isgiven by m = (cid:80) N m s q s ( r s − r cm ), where r cm ≡ (cid:80) N m s m s r s / (cid:80) N m s m s is the center-of-massposition of the molecule. Accordingly, atomic ions, Na + ’s and Cl − ’s in our case, havezero dipole moment in this formulation. Once the dipole moments of the water and Ddx4molecules are determined in this manner, they are combined to yield M T which in turnprovides the system relative permittivity through Eq. (S12). Our computed (cid:15) r for variousconcentrations of Ddx4 IDR at different salt concentrations using both the TIP3P andSPC/E water models are given in Table S1.4 Random-Phase-Approximation (RPA) Theory of Phase Separationwith IDR-Concentration-Dependent Permittivity B ACKGROUND
Our group has previously considered, within our RPA theory of liquid-liquid phase sep-aration (LLPS), the effects of relative permittivity (cid:15) r being dependent upon local proteinconcentration; i.e., (cid:15) r = (cid:15) r ( φ m ) where φ m is polymer (IDR) volume fraction. An (cid:15) r ( φ m )necessitates changes to our earlier RPA expressions for electrostatic interaction for a con-stant, position-independent (cid:15) r , viz. [Eq. (33) of Ref. 35], f el = 12 (cid:90) d ( ka ) (2 π ) (cid:110) ln[det(1 + ˆ G k ˆ U k )] − Tr( ˆ ρ ˆ U k ) (cid:111) . (S13)Here, as in Ref. 35, a is unit volume, ˆ G k is the position correlation matrix, ˆ ρ is the densityoperator that provides the densities of various molecular species in the system (accountingfor matter, not electric charge), and ˆ U k accounts for sequence-dependent Coulumb interac-tions [the expression for ˆ U k is provided by Eq. (35) of Ref. 35]. For the simple illustrativecase here, which is a system of only IDR polymers without salt or counterions, ˆ G k reduces tothe monomer-monomer correlation ( ˆ G k ) ij = ( ρ m /N )( ˆ G M ( kl )) ij = ρ m exp[ − ( kl ) | i − j | / /N ,where ρ m is monomer density, l is the length of a polymer link (virtual bond length, denotedas b in Ref. 35), i, j = 1 , , . . . N are monomer labels along the polymer chain with N beingthe length of a chain, and ˆ ρ ˆ U k = ρ m ˆ U k /N [Eq. (4) of Ref. 34].When (cid:15) r = (cid:15) r ( φ m ), we applied the following modified version of f el [Eq. (68) of Ref. 35]: f el = (cid:90) d ˜ k ˜ k π (cid:26) η ln (cid:104) η G (˜ k ) (cid:105) − G (˜ k ) (cid:27) , (S14)where ˜ k = kb , η = ( b/a ) and, in the absence of salt and counterions, Eqs. (69a) and (69b)of Ref. 35 become G (˜ k ) = 4 π ˜ k [1 + ˜ k ] T ∗ (cid:15) r ( φ m ) (cid:18) φ m N (cid:104) σ | ˆ G M (˜ k ) | σ (cid:105) (cid:19) , (S15a) G (˜ k ) = 4 π ˜ k [1 + ˜ k ] T ∗ (cid:15) r ( φ m ) (cid:32) φ m N N (cid:88) i =1 | σ i | (cid:33) . (S15b)As in Refs. 35 and 37, column vector | σ (cid:105) is the charge sequence—its i th element, σ i , beingthe charge of the i th monomer (residue) of the IDR in units of the electronic charge e , and (cid:104) σ | ≡ | σ (cid:105) T ; T ∗ ≡ π(cid:15) k B T b/e is a reduced temperature. As noted above, (cid:15) is vacuumpermittivity, k B is Boltzmann constant, and T is absolute temperature. Previously, expressions such as above Eqs. (S14) and (S15) for (cid:15) r ( φ m ) were obtained heuristically byreplacing every instance of (cid:15) r in the corresponding constant- (cid:15) r expressions by (cid:15) r ( φ m ).5 C ONCENTRATION -D EPENDENT P ERMITTIVITY IN THE
RPA C
ONTEXT
We now examine whether—and if so what—restrictive conditions have to be satisfied forthe heuristic prescription Eqs. (S14) and (S15) to be valid.When (cid:15) r is position-independent, the electrostatic interaction energy (potential), in unitsof k B T , between two unit point charges e at positions r and r (cid:48) is given by U ( r , r (cid:48) ) = U ( r − r (cid:48) ) = e / (4 π(cid:15) (cid:15) r k B T | r − r (cid:48) | ). However, when (cid:15) r is position-dependent, i.e., (cid:15) r = (cid:15) r ( r ),in general the electrostatic potential U is not expressible in a simple close form because itis the solution to the generalized Poisson equation − ∇ r · [ (cid:15) r ( r ) ∇ r U ( r − r (cid:48) )] = 4 πl B δ ( r − r (cid:48) ) , (S16)as noted by Wang, where l B = e / (4 π(cid:15) k B T ) is vacuum Bjerrum length (unlike Ref. 35,here l B does not include (cid:15) r ). Thus, position dependence of (cid:15) r can entail rather complexmodifications of the charge-charge interactions. It cannot be analytically treated, in general,by simply replacing the constant (cid:15) r in U ( r , r (cid:48) ) = e / (4 π(cid:15) (cid:15) r k B T | r − r (cid:48) | ) by (cid:15) r ( r ) or (cid:15) r ( r (cid:48) ).Another concern is that, by construction, RPA theory accounts only for the lowest-orderpolymer density fluctuations beyond the mean-field homogeneous density. In contrast, someof the proposed IDR-concentration-dependent form of (cid:15) r = (cid:15) r ( φ m ), such as the “slab” andClasusius-Mossotti models and the effective medium approximations of Maxwell Garnettand of Bruggeman considered in Refs. 35, 37 involve higher-order dependence on φ m ,raising questions as to whether application of these (cid:15) r ( φ m ) formula in the context of RPAis consistent with the basic premises of RPA. We address these issues below. D ERIVATION OF
RPA
WITH C ONCENTRATION -D EPENDENT P ERMITTIVITY
Unless specified otherwise, the notation in this subsection follows that of Ref. 41, as thefollowing formal development is, one one hand, a restricted case of the theory in Ref. 41in that here we do not consider salt, counterions or Kuhn-length renormalization. On theother hand, the present analysis is an extension of the theory in Ref. 41, which is limitedto constant (cid:15) r ’s, to case with a position-dependent (cid:15) r ( r ). Accordingly, we note that thenumber of chains in the system, which is symbolized by n in the main text and elsewhere inthis Supporting Information, is denoted by n p (following Ref. 41) in the derivation below.In general, the Boltzmann factor for the electrostatic interaction energy of a systemwith charge density ρ ( r ) is given by exp[ − (1 / (cid:82) d r d r (cid:48) ρ ( r ) U ( r , r (cid:48) ) ρ ( r (cid:48) )]. (Note that theelectric charge density ρ ( r ) here and in subsequent development in this section shouldnot be confused with the matter density operator ˆ ρ or its matrix elements.) We focusfirst on obtaining an equivalent mathematical form of this factor that is amenable to RPAanalyses. By standard field-theoretic Hubbard-Stratonovich transformation, this factor may6be expressed as a functional integral over a conjugate field ψ ( r ):1(det ˆ U ) / (cid:26)(cid:89) r (cid:90) dψ ( r ) √ π (cid:27) exp (cid:20) − (cid:90) d r (cid:48) d r (cid:48)(cid:48) ψ ( r (cid:48) ) U − ( r (cid:48) , r (cid:48)(cid:48) ) ψ ( r (cid:48)(cid:48) ) − i (cid:90) d r (cid:48) ρ ( r (cid:48) ) ψ ( r (cid:48) ) (cid:21) , (S17)where ˆ U denote, in matrix notation, the operator U ( r , r (cid:48) ) [i.e., the matrix elementˆ U r , r (cid:48) = U ( r , r (cid:48) )], U − ( r (cid:48) , r (cid:48)(cid:48) ) is the r (cid:48) , r (cid:48)(cid:48) matrix element of the inverse operator ˆ U − of ˆ U . By definition, (cid:82) d r (cid:48)(cid:48) U − ( r , r (cid:48)(cid:48) ) U ( r (cid:48)(cid:48) , r (cid:48) ) = δ ( r − r (cid:48) ). Consider now the operator −∇ r (cid:48)(cid:48) · [ (cid:15) r ( r (cid:48)(cid:48) ) ∇ r (cid:48)(cid:48) δ ( r − r (cid:48)(cid:48) )] / (4 πl B ). Since (cid:90) d r (cid:48)(cid:48) {∇ r (cid:48)(cid:48) · [ (cid:15) r ( r (cid:48)(cid:48) ) ∇ r (cid:48)(cid:48) δ ( r − r (cid:48)(cid:48) )] }U ( r (cid:48)(cid:48) , r (cid:48) ) = (cid:90) d r (cid:48)(cid:48) [ (cid:15) r ( r (cid:48)(cid:48) ) ∇ r (cid:48)(cid:48) δ ( r − r (cid:48)(cid:48) )] · ∇ r (cid:48)(cid:48) U ( r (cid:48)(cid:48) , r (cid:48) )= (cid:90) d r (cid:48)(cid:48) δ ( r − r (cid:48)(cid:48) ) {∇ r (cid:48)(cid:48) · [ (cid:15) r ( r (cid:48)(cid:48) ) ∇ r (cid:48)(cid:48) U ( r (cid:48)(cid:48) , r (cid:48) )] } (S18)follows from repeated applications of integration by parts under the reasonable assumptionthat the values of the integrand cancel or vanish at the pertinent boundaries of integration,and by Eq. (S16) the quantity in curly brackets in the last term in Eq. (S18) is − πl B δ ( r (cid:48)(cid:48) − r (cid:48) ), Eq. (S18) is evaluated as − πl B (cid:82) d r (cid:48)(cid:48) δ ( r − r (cid:48)(cid:48) ) δ ( r (cid:48)(cid:48) − r (cid:48) ) = − πl B δ ( r − r (cid:48) ) and therefore −∇ r (cid:48)(cid:48) · [ (cid:15) r ( r (cid:48)(cid:48) ) ∇ r (cid:48)(cid:48) δ ( r − r (cid:48)(cid:48) )] / (4 πl B ) is the r , r (cid:48)(cid:48) matrix element of the inverse of ˆ U , viz., U − ( r , r (cid:48)(cid:48) ) = − πl B ∇ r (cid:48)(cid:48) · [ (cid:15) r ( r (cid:48)(cid:48) ) ∇ r (cid:48)(cid:48) δ ( r − r (cid:48)(cid:48) )] . (S19)Equivalently, the r (cid:48)(cid:48) , r matrix element of ˆ U − takes the form U − ( r (cid:48)(cid:48) , r ) = − πl B ∇ r · [ (cid:15) r ( r ) ∇ r δ ( r (cid:48)(cid:48) − r )] . (S20)It follows that the − (1 / (cid:82) d r (cid:48) d r (cid:48)(cid:48) ψ ( r (cid:48) ) U − ( r (cid:48) , r (cid:48)(cid:48) ) ψ ( r (cid:48)(cid:48) ) factor in Eq. (S17) is given by − (cid:90) d r (cid:48) d r (cid:48)(cid:48) ψ ( r (cid:48) ) U − ( r (cid:48) , r (cid:48)(cid:48) ) ψ ( r (cid:48)(cid:48) ) = 18 πl B (cid:90) d r d r (cid:48) ψ ( r ) {∇ r (cid:48) · [ (cid:15) r ( r (cid:48) ) ∇ r (cid:48) δ ( r − r (cid:48) )] } ψ ( r (cid:48) )= − πl B (cid:90) d r d r (cid:48) ψ ( r )[ (cid:15) r ( r (cid:48) ) ∇ r (cid:48) δ ( r − r (cid:48) )] · [ ∇ r (cid:48) ψ ( r (cid:48) )]= 18 πl B (cid:90) d r d r (cid:48) ψ ( r )[ (cid:15) r ( r (cid:48) ) ∇ r δ ( r − r (cid:48) )] · [ ∇ r (cid:48) ψ ( r (cid:48) )]= − πl B (cid:90) d r d r (cid:48) (cid:15) r ( r (cid:48) ) δ ( r − r (cid:48) )[ ∇ r ψ ( r )] · [ ∇ r (cid:48) ψ ( r (cid:48) )]= − πl B (cid:90) d r (cid:15) r ( r )[ ∇ r ψ ( r )] · [ ∇ r ψ ( r )]= − πl B (cid:90) d r (cid:15) r ( r ) |∇ ψ ( r ) | , (S21)7where the first equality follows from a mere change in the integration variable, the sec-ond and fourth equalities from integration by parts assuming that boundary contributionvanishes, the third equality from ∇ r (cid:48) δ ( r − r (cid:48) ) = −∇ r δ ( r − r (cid:48) ), and the r subscript of ∇ r is dropped in the final expression because there is little danger of notational ambiguity.Equation (S21) is identical to the corresponding terms in the Hamiltonians in Eq. (3) ofRef. 124 and Eq. (2.7) of Ref. 105 for systems with an inhomogeneous dielectric medium.We turn next to the (det ˆ U ) − / factor in Eq. (S17). For any matrices A and B ,(det A ) − = (det A − ) and (det AB ) = (det A )(det B ), we write (det ˆ U ) − / = (det ˆ U − ) / = (det ˆ (cid:15) r ) / (det ˆ U − ) / , where ˆ U − ’s matrix elements U − rr (cid:48) ≡ U − ( r , r (cid:48) ) is given byEq. (S19), the r , r (cid:48) matrix elements of the operators ˆ (cid:15) r and ˆ U − are defined, respectively, by( ˆ (cid:15) r ) rr (cid:48) ≡ (cid:15) r ( r ) δ ( r − r (cid:48) ) , (S22)( ˆ U − ) rr (cid:48) ≡ − πl B ∇ r δ ( r − r (cid:48) ) . (S23)Then, ˆ (cid:15) r ˆ U − = ˆ U − can be ready verified using integration by parts:( ˆ (cid:15) r ˆ U − ) rr (cid:48) = (cid:90) d r (cid:48)(cid:48) ( ˆ (cid:15) r ) rr (cid:48)(cid:48) ( ˆ U − ) r (cid:48)(cid:48) r (cid:48) = − πl B (cid:90) d r (cid:48)(cid:48) (cid:15) r ( r ) δ ( r − r (cid:48)(cid:48) ) ∇ r (cid:48)(cid:48) δ ( r (cid:48)(cid:48) − r (cid:48) )= 14 πl B (cid:90) d r (cid:48)(cid:48) (cid:15) r ( r (cid:48)(cid:48) )[ ∇ r (cid:48)(cid:48) δ ( r − r (cid:48)(cid:48) )] · [ ∇ r (cid:48)(cid:48) δ ( r (cid:48)(cid:48) − r (cid:48) )]= − πl B (cid:90) d r (cid:48)(cid:48) ∇ r (cid:48)(cid:48) · { (cid:15) r ( r (cid:48)(cid:48) )[ ∇ r (cid:48)(cid:48) δ ( r − r (cid:48)(cid:48) )] } δ ( r (cid:48)(cid:48) − r (cid:48) )= − πl B ∇ r (cid:48) · [ (cid:15) r ( r (cid:48) ) ∇ r (cid:48) δ ( r − r (cid:48) )]= ˆ U − rr (cid:48) , Q . E . D . (S24)Because ˆ (cid:15) r in Eq. (S22) is a diagonal matrix,(det ˆ (cid:15) r ) = (cid:89) r (cid:15) r ( r ) . (S25)Using Fourier transformation from r to k space, (det ˆ U − ) = (cid:89) k (cid:54) = k πl B , (S26)where k ≡ | k | . Note that the k = term is excluded in the above and subsequentconsiderations because it does not contribute to the exponential factor in Eq. (S17) for ourelectrically neutral system of overall neutral polyampholytes.8The free energy per unit volume l in units of k B T of our system is given by f = φ m N ln φ m + (1 − φ m ) ln(1 − φ m ) − l Ω ln Z el , (S27)where N is the chain length (number of monomers) of the polyampholyte, Ω is solution(system) volume, φ m ≡ l n p N/ Ω is monomer volume fraction with n p being the totalnumber of identical polyampholyte chains in the solution [ n p corresponds to the varible n used above in the formulation for explicit-chain simulations; it should also be notedhere that the alternately defined φ m = a n p N/ Ω in Eq. (3) of Ref. 35—which applies toEqs. (S14) and (S15) in the present work—is equal to polyampholyte volume fraction onlywhen the size of a monomer ≈ l is equal to the model volume unit a of the model, i.e.,when r m = 1; whereas polyampholyte volume fraction is given by r m φ m in general ; forsimplicity, r m = 1 is assumed below unless specified otherwise], and Z el is the electrostaticpartition function, which may be viewed as a special case of Z (cid:48) in Eq. (A9) of Ref. 41 withno salt, no counterion, and v = 0, but now extended to (cid:15) r = (cid:15) r ( r ). Z el is given by integralsover monomer coordinates, Z el = (cid:90) n p (cid:89) α =1 N (cid:89) τ =1 d R α,τ e − H [ R ] , (S28)where R α,τ denotes the coordinate of the τ th monomer in the α th polyampholyte [ R α,τ corresponds to the position variable R µi defined before Eq. (S1) in the formulation forexplicit-chain simulations; the monomer label τ corresponds also to the label i in Eq. (S15b)],and H [ R ] = 32 l n p (cid:88) α =1 N − (cid:88) τ =1 ( R α,τ +1 − R α,τ ) + 12 n p (cid:88) α,β =1 N (cid:88) τ,µ =1 V τµαβ ( R α,τ , R β,µ ) . (S29)The first term of H [ R ] is for Gaussian-chain connectivity of the polyampholyte chains and V τµαβ in the second term is the interaction potential energy between the τ th monomer in the α th chain and the µ th monomer in the β th chain, viz., V τµαβ ( r , r (cid:48) ) = l B σ τ σ µ U ( r , r (cid:48) ) , (S30)where σ τ , σ µ are the charges, respectively, of monomers τ , µ along each of the n p polyam-pholyte chains. We may now rewrite Eq. (S28) as a functional integral over the chargedensity ρ ( r ) by including in the integrand a δ -functional for ρ ( r ): Z el = (cid:90) (cid:89) r dρ ( r ) (cid:90) n p (cid:89) α =1 N (cid:89) τ =1 d R α,τ e − H [ ρ, R ] δ [ ρ ( r ) − n p (cid:88) α =1 N (cid:88) τ =1 σ τ δ ( r − R α,τ )] , (S31)9which follows from ρ ( r ) = (cid:80) n p α =1 (cid:80) Nτ =1 σ τ δ ( r − R α,τ ), whereas H [ ρ, R ] is defined as H [ ρ, R ] = 32 l n p (cid:88) α =1 N − (cid:88) τ =1 ( R α,τ +1 − R α,τ ) + 12 (cid:90) d r d r (cid:48) ρ ( r ) U ( r , r (cid:48) ) ρ ( r (cid:48) ) . (S32)Now, by applying Eqs. (S17) and (S21), the partition function Z el in Eq. (S31) may beexpressed as a functional integral over ρ ( r ), R α,τ , and the conjugate fields ψ ( r ): Z el = (cid:90) (cid:89) r dρ ( r ) (cid:90) n p (cid:89) α =1 N (cid:89) τ =1 d R α,τ exp (cid:20) − l n p (cid:88) α =1 N − (cid:88) τ =1 ( R α,τ +1 − R α,τ ) (cid:21) × U ) / (cid:26)(cid:89) r (cid:90) dψ ( r ) √ π (cid:27) exp (cid:20) − πl B (cid:90) d r (cid:15) r ( r ) |∇ ψ ( r ) | − i (cid:90) d r (cid:48) ρ ( r (cid:48) ) ψ ( r (cid:48) ) (cid:21) × δ [ ρ ( r ) − n p (cid:88) α =1 N (cid:88) τ =1 σ τ δ ( r − R α,τ )] . (S33)After performing the (cid:81) r dρ ( r ) functional integrals in the above expression, Z el becomes Z el = (cid:90) n p (cid:89) α =1 N (cid:89) τ =1 d R α,τ U ) / (cid:26)(cid:89) r (cid:90) dψ ( r ) √ π (cid:27) e − H [ ψ, R ] , (S34)where H [ ψ, R ] = 32 l n p (cid:88) α =1 N − (cid:88) τ =1 ( R α,τ +1 − R α,τ ) + 18 πl B (cid:90) d r (cid:15) r ( r ) |∇ ψ ( r ) | + i n p (cid:88) α =1 N (cid:88) τ =1 σ τ ψ ( R α,τ ) . (S35)We now proceed to evaluate the (det ˆ U ) − / factor in Eq. (S34) via the aforementionedrelations (det ˆ U ) − / = (det ˆ U − ) / and ˆ U − = ˆ (cid:15) r ˆ U − . Using Eq. (S25) and applying thecorrespondence (cid:88) r → N r Ω (cid:90) d r (S36)where N r is formally the number of r positions in the system, we may write (cid:112) det ˆ (cid:15) r = (cid:89) r (cid:112) (cid:15) r ( r ) = exp (cid:26) (cid:88) r ln[ (cid:15) r ( r )] (cid:27) = exp (cid:26) N r (cid:90) d r ln[ (cid:15) r ( r )] (cid:27) . (S37)For reasons to be enunciated below, consider the case in which (cid:15) r ( r ) is a linear combinationof polyampholyte and water relative permittivites, i.e., (cid:15) r ( r ) = (cid:15) p φ m ( r ) + (cid:15) w [1 − φ m ( r )] = (cid:15) w + (cid:15) (cid:48) φ m ( r ) , (S38)0where (cid:15) p and (cid:15) w are, respectively, the relative permittivities of polymer and water, and (cid:15) (cid:48) = (cid:15) p − (cid:15) w . Since the position-dependent monomer density φ m ( r ) = l n p (cid:88) α =1 N (cid:88) τ =1 δ ( r − R α,τ ) , (S39)ln[ (cid:15) r ( r )] = ln (cid:15) w + ln (cid:20) (cid:15) (cid:48) (cid:15) w φ m ( r ) (cid:21) = ln (cid:15) w + ln (cid:34) (cid:15) (cid:48) l (cid:15) w n p (cid:88) α =1 N (cid:88) τ =1 δ ( r − R α,τ ) (cid:35) . (S40)To be consistent with RPA which accounts only for lowest-order polymer density fluctua-tions, we approximate the above expression for ln[ (cid:15) r ( r )] by including terms only up to theone linear in φ m , viz., ln[ (cid:15) r ( r )] ≈ ln (cid:15) w + (cid:15) (cid:48) l (cid:15) w n p (cid:88) α =1 N (cid:88) τ =1 δ ( r − R α,τ ) . (S41)Hence the argument of the exponential function in Eq. (S37) is given by N r (cid:90) d r ln[ (cid:15) r ( r )] ≈ N r (cid:15) w + N r (cid:15) (cid:48) l n p N (cid:15) w Ω = N r (cid:15) w + N r (cid:15) (cid:48) (cid:15) w φ m ≈ N r (cid:15) w + N r (cid:18) (cid:15) (cid:48) (cid:15) w φ m (cid:19) = N r (cid:15) r ( φ m )] , (S42)where the position-independent φ m ≡ l (cid:82) d r (cid:80) n p α =1 (cid:80) Nτ =1 δ ( r − R α,τ ) = l n p N/ Ω is theoverall average monomer volume fraction, the second approximate relation is in line withthat in Eq. (S41), and the last equality follows from definition Eq. (S38). In formulationsinvolving a size-dependent mean-field lattice model with φ m defined in terms of unit volume a (cid:54) = l (Ref. 35), the actual average monomer volume fraction φ is given by φ = r m φ m where r m is the monomer size factor, in which case (cid:15) r ( φ m ) is understood to represent the (cid:15) r expression in which all φ m is replaced by φ = r m φ m ; i.e., (cid:15) r ( φ m ) → (cid:15) r ( φ m → φ = r m φ m ).With Eq. (S42), further application of Eqs. (S26) and (S37) yields(det ˆ U ) − / = (cid:112) det ˆ (cid:15) r (cid:113) det ˆ U − ≈ (cid:104)(cid:112) (cid:15) r ( φ m ) (cid:105) N r (cid:89) k (cid:54) = (cid:115) k πl B = (cid:89) k (cid:54) = (cid:115) k [ (cid:15) r ( φ m )] N r / ( N r − πl B ≈ (cid:89) k (cid:54) = (cid:115) k (cid:15) r ( φ m )4 πl B (S43)1for the (det ˆ U ) − / factor in Eq. (S34). To arrive at this expression, we made use of the factthat the total number of reciprocal space positions k is N r (same as the total number ofcoordinate space positions r when k = is included in the count), and that N r (cid:29)
1. Itfollows that Z el in Eq. (S34) may be written as Z el = (cid:89) k (cid:54) = (cid:115) k (cid:15) r ( φ m )4 πl B (cid:90) n p (cid:89) α =1 N (cid:89) τ =1 d R α,τ (cid:26)(cid:89) r (cid:90) dψ ( r ) √ π (cid:27) e − H [ ψ, R ] , (S44)where H [ ψ, R ] is given by Eq. (S35) with (cid:15) r ( r ) given by Eq. (S38): H [ ψ, R ] = (cid:15) w πl B (cid:90) d r [ ∇ ψ ( r )] + (cid:15) (cid:48) πl B (cid:90) d r φ m ( r ) [ ∇ ψ ( r )] + 32 l n p (cid:88) α =1 N − (cid:88) τ =1 ( R α,τ +1 − R α,τ ) + i n p (cid:88) α =1 N (cid:88) τ =1 σ τ ψ ( R α,τ )= (cid:15) w πl B (cid:90) d r [ ∇ ψ ( r )] + n p (cid:88) α =1 (cid:40) l N − (cid:88) τ =1 ( R α,τ +1 − R α,τ ) + N (cid:88) τ =1 (cid:20) iσ τ ψ ( R α,τ ) + (cid:15) (cid:48) l πl B [ ∇ ψ ( R α,τ )] (cid:21) (cid:41) , (S45)where Eq. (S39) for φ m ( r ) have been applied to yield the last equality. Utilizing the Fouriertransformation ψ k = (Ω / N r ) (cid:80) r ψ ( r ) exp( − i k · r ) of the conjugate field ψ ( r ) [which maythen be expressed as the inverse transformation of ψ k , i.e., ψ ( r ) = (1 / Ω) (cid:80) k ψ k exp( i k · r )]and the (cid:80) r ↔ ( N r / Ω) (cid:82) d r correspondence in Eq. (S36), the first term in the aboveEq. (S45) can be rewritten as (cid:15) w πl B (cid:90) d r [ ∇ ψ ( r )] → (cid:15) w πl B (cid:18) Ω N r (cid:19) (cid:88) r (cid:34)(cid:32) (cid:88) k ψ k ∇ e − i k · r (cid:33) · (cid:32) (cid:88) k (cid:48) ψ k (cid:48) ∇ e − i k (cid:48) · r (cid:33)(cid:35) = − (cid:15) w Ω8 πl B (cid:88) k (cid:88) k (cid:48) ψ k ( k · k (cid:48) ) ψ k (cid:48) δ k + k (cid:48) = 12Ω (cid:88) k (cid:15) w k πl B ψ k ψ − k = 12Ω (cid:88) k (cid:54) = (cid:15) w k πl B ψ k ψ − k , (S46)where the last equality follows because the k = term vanishes by virtue of the k fac-tor. The remaining terms of H [ ψ, R ] in Eq. (S45) can be rewritten as the summation ofcontributions from n p independent polymers, as follows. Consider the partition function Q p [ ψ ] = (cid:90) D [ R ] e −H p [ ψ, R ] (S47)2for a single polymer, where D [ R ] = (cid:81) Nτ =1 d R τ , and H p [ ψ, R ] ≡ l N − (cid:88) τ =1 ( R τ +1 − R τ ) + N (cid:88) τ =1 (cid:34) i Ω (cid:88) k σ τ ψ k e − i k · R τ − (cid:15) (cid:48) l πl B (cid:88) k (cid:88) k (cid:48) ( k · k (cid:48) ) ψ k ψ k (cid:48) e − i ( k + k (cid:48) ) · R τ (cid:35) = 32 l N − (cid:88) τ =1 ( R τ +1 − R τ ) + N (cid:88) τ =1 (cid:34) i Ω (cid:88) k (cid:54) = σ τ ψ k e − i k · R τ − (cid:15) (cid:48) l πl B (cid:88) k , k (cid:48) (cid:54) = ( k · k (cid:48) ) ψ k ψ k (cid:48) e − i ( k + k (cid:48) ) · R τ (cid:35) . (S48)Note that the label α in R α,τ is dropped in (cid:81) Nτ =1 d R τ and Eq. (S48) because the pertinentintegration is only over the monomer coordinates of a single polymer chain. The k , k (cid:48) = terms can be excluded in the summations of the last line of Eq. (S48) because in the firstsummation (cid:80) Nτ =1 σ τ = 0 for the overall neutral polyampholytes considered here and the( k · k (cid:48) ) factor in the second summation means that the k , k (cid:48) = terms are identically zero.Utilizing the definition of ψ ( r ) to ψ k Fourier transformation stated after Eq. (S45), it canreadily be verified that H p [ ψ, R ] is precisely the k -space version of the quantity enclosed incurly brackets on the right hand side of Eq. (S45). Upon changing the functional integrationvariables ψ ( r ) in Eq. (S44) to ψ k and including the k -independent functional Jacobian | δψ ( r ) /δψ k | (which have no effect on the configurational distribution of the system), (cid:26)(cid:89) r (cid:90) dψ ( r ) √ π (cid:27) → (cid:40)(cid:89) k (cid:90) (cid:114) N r π Ω dψ k (cid:41) (S49)formally, and thus Eq. (S44) can now be recast in the equivalent form Z el = (cid:89) k (cid:54) = (cid:115) k (cid:15) r ( φ m )4 πl B (cid:26)(cid:89) k (cid:90) (cid:114) N r π Ω dψ k (cid:27) exp (cid:34) − (cid:88) k (cid:54) = (cid:15) w k πl B ψ k ψ − k (cid:35) × (cid:90) n p (cid:89) α =1 (cid:40) N (cid:89) τ =1 d R τ exp ( −H p [ ψ, R ]) (cid:41) , (S50)where we have made use of the fact that in the above expression, the first exponentialfactor [from Eq. (S46)] is independent of R α,τ , and the quantity enclosed in the last set ofcurly brakets [from Eq. (S48)] is identical for all n p values of α , thus the entire last line ofEq. (S50) is equal to n p ln Q p [ ψ ] in accordance with Eq. (S47). Because, as argued above,there is no k = contribution to H p [ ψ, R ], the (cid:81) k ( N r / π Ω ) / (cid:82) dψ k functional integral inEq. (S50) may be restricted to (cid:81) k (cid:54) = ( N r / π Ω ) / (cid:82) dψ k with no impact on configurational3distribution. Therefore, Z el takes the simplified form: Z el = (cid:89) k (cid:54) = (cid:90) (cid:114) N r π Ω dψ k (cid:115) (cid:15) r ( φ m ) k πl B e − H [ ψ k ] , (S51)where H [ ψ k ] = 12Ω (cid:88) k (cid:54) = (cid:15) w k πl B ψ k ψ − k − n p ln Q p [ ψ ] . (S52)We are now in a position to apply RPA by expanding ln Q p around ψ k = 0 up to secondorder in ψ k , namelyln Q p [ ψ ] ≈ ln Q p [ ψ = 0] + (cid:88) k (cid:18) δ ln Q p δψ k (cid:19) ψ =0 ψ k + 12 (cid:88) k , k (cid:48) (cid:18) δ ln Q p δψ k δψ k (cid:48) (cid:19) ψ =0 ψ k ψ k (cid:48) , (S53)wherein the zeroth order term (first term on the right hand side) is a constant that playsno role in determining configurational distribution. The first order term (cid:18) δ ln Q p δψ k (cid:19) ψ =0 = 1 Q p [ ψ = 0] δ Q p δψ k (cid:12)(cid:12)(cid:12)(cid:12) ψ =0 = N (cid:88) τ =1 (cid:42) − i Ω σ τ e − i k · R τ + 2 × (cid:15) (cid:48) l πl B k · (cid:88) k (cid:48) (cid:54) = k (cid:48) ψ k (cid:48) e − i ( k + k (cid:48) ) · R τ (cid:43) ψ =0 = − i Ω N (cid:88) τ =1 σ τ (cid:10) e − i k · R τ (cid:11) ψ =0 =0 (S54)as well. Here, the average (cid:104) ... (cid:105) ψ =0 is over monomer coordinates [ R ] and evaluated at ψ k = 0,the third equality follows because the second term in the second line of the above equationcontains a factor of ψ that is set to zero, and the last equality is a consequence of the overallneutrality of the polyampholytes in the system we considered ( (cid:80) Nτ =1 σ τ = 0). The secondorder term in the above Eq. (S53) is given by (cid:18) δ ln Q p δψ k δψ k (cid:48) (cid:19) ψ =0 = 1 Q p [ ψ = 0] δ Q p δψ k δψ (cid:48) k (cid:12)(cid:12)(cid:12)(cid:12) ψ =0 − Q p [ ψ = 0] δ Q p δψ k (cid:12)(cid:12)(cid:12)(cid:12) ψ =0 × Q p [ ψ = 0] δ Q p δψ (cid:48) k (cid:12)(cid:12)(cid:12)(cid:12) ψ =0 = 1 Q p [ ψ = 0] δ Q p δψ k δψ (cid:48) k (cid:12)(cid:12)(cid:12)(cid:12) ψ =0 = 1Ω (cid:15) (cid:48) l πl B k · k (cid:48) N (cid:88) τ =1 (cid:68) e − i ( k + k (cid:48) ) · R τ (cid:69) ψ =0 − N (cid:88) τ,µ =1 σ τ σ µ (cid:68) e − i ( k · R τ + k (cid:48) R µ ) (cid:69) ψ =0 , (S55)4where the second equality follows from Eq. (S54). The two R -averages over Gaussian chainconfigurations in the above Eq. (S55) may be evaluated as follows. For (cid:10) e − i ( k + k (cid:48) ) · R τ (cid:11) ψ =0 ,only a single monomer coordinate variable R τ is involved and thus it is uncontrained and R -averaging entails only a single integration of R τ over the entire system volume Ω. Thecorrespondence (cid:82) d R τ ↔ (Ω / N r ) (cid:80) R τ yields (cid:10) e − i ( k + k (cid:48) ) · R τ (cid:11) ψ =0 = δ k , − k (cid:48) . Next, to compute (cid:10) e − i ( k · R τ + k (cid:48) R µ ) (cid:11) ψ =0 , we rewrite it as (cid:10) e − i k · ( R τ − R µ ) e − i ( k + k (cid:48) ) · R µ ) (cid:11) ψ =0 , which indicates that the R -averaging involves integrating over two monomer coordinates, one is unconstrained andthe other is constrained by the Gaussian chain statistics for two points separated by acontour length l | τ − µ | . Without loss of generality, we select R µ to be the unconstrainedcoordinates. As for the first average, summing over R µ using the (cid:82) d R µ ↔ (Ω / N r ) (cid:80) R µ correspondence yields the Kronecker δ k , − k (cid:48) . In accordance with the Gaussian statisticsgoverned by the 3 / l (cid:80) N − τ =1 ( R τ +1 − R τ ) term of H p [ ψ, R ] in Eq. (S48), R τ − R µ is weightedby exp( − | R τ − R µ | / l | τ − µ | ), and therefore the R -averaging of e − i k · ( R τ − R µ ) yieldsexp( − k l | τ − µ | / (cid:18) δ ln Q p δψ k δψ k (cid:48) (cid:19) ψ =0 = − δ k , − k (cid:48) Ω (cid:20) (cid:15) (cid:48) N l k πl B + (cid:104) σ | ˆ G M ( kl ) | σ (cid:105) (cid:21) (S56)for Eq. (S55), where [ ˆ G M ( kl )] τµ = exp[ − ( kl ) | τ − µ | /
6] as defined above. Therefore, accord-ing to Eqs. (S53) and (S54), the n p ln Q p [ ψ ] term in Eq. (S52) is given by n p ln Q p [ ψ ] ≈ n p (cid:88) k , k (cid:48) (cid:18) δ ln Q p δψ k δψ k (cid:48) (cid:19) ψ =0 ψ k ψ k (cid:48) = − n p (cid:88) k , k (cid:48) δ k , − k (cid:48) (cid:20) (cid:15) (cid:48) N l k πl B + (cid:104) σ | ˆ G M ( kl ) | σ (cid:105) (cid:21) ψ k ψ k (cid:48) = − (cid:88) k (cid:54) =0 (cid:20) (cid:15) (cid:48) φ m k πl B + φ m N l (cid:104) σ | ˆ G M ( kl ) | σ (cid:105) (cid:21) ψ k ψ − k , (S57)where we have used the definition of polymer volume fraction φ m = l n p N/ Ω, and the factthat the k = terms vanishes: the first term because of the k factor and the second termbecause of the overall neutrality of the polyampholytes, i.e., (cid:80) τ σ τ = 0, and [ ˆ G M (0)] τµ = 1.Combining this result with Eq. (S52), we arrive at H [ ψ k ] ≈ (cid:88) k (cid:54) = (cid:20) ( (cid:15) w + (cid:15) (cid:48) φ m ) k πl B + φ m N l (cid:104) σ | ˆ G M ( kl ) | σ (cid:105) (cid:21) ψ k ψ − k = 12Ω (cid:88) k (cid:54) = (cid:20) (cid:15) r ( φ m ) k πl B + φ m N l (cid:104) σ | ˆ G M ( kl ) | σ (cid:105) (cid:21) ψ k ψ − k , (S58)where we have made use of the above definition of (cid:15) r ( φ m ) which is linear in φ m . We may nowevaluate Z el by performing the functional integral (cid:81) k (cid:54) = (cid:82) dψ k in Eq. (S51). Because the5 ψ k ’s are Fourier transformations of the real-valued field ψ ( r ), ψ ∗ k = ψ − k and (cid:81) k (cid:54) = (cid:82) dψ k = (cid:81) k > (cid:82) dψ k (cid:82) dψ ∗ k , where the k > notation means that the product or summation excludesthe origin and is over k = ( k , k , k ) but not − k = ( − k , − k , − k ). This can be effectuatedby first excluding ( k , k , k ) = (0 , ,
0) and then restricting the product or sum to k ≥ k ≥ k ≥ ψ k in terms of its real part ψ R k and imaginary part ψ I k , i.e., ψ k = ψ R k + iψ I k and ψ ∗ k = ψ R k − iψ I k where ψ R k and ψ I k are real numbers, one obtains (cid:81) k > (cid:82) dψ k (cid:82) dψ ∗ k = (cid:81) k > (cid:82) ∞−∞ dψ R k (cid:82) ∞−∞ dψ I k . Since ψ k ψ − k = ( ψ R k ) + ( ψ I k ) , Z el = (cid:40)(cid:89) k > (cid:18) N r π Ω (cid:19) (cid:20) (cid:15) r ( φ m ) k πl B (cid:21) (cid:90) ∞−∞ dψ R k (cid:90) ∞−∞ dψ I k (cid:41) × exp (cid:40) (cid:88) k > (cid:20) (cid:15) r ( φ m ) k πl B + φ m N l (cid:104) σ | ˆ G M ( kl ) | σ (cid:105) (cid:21) (cid:20) ( ψ R k ) + ( ψ I k ) (cid:21)(cid:41) = (cid:89) k > (cid:18) N r π Ω (cid:19) (cid:20) (cid:15) r ( φ m ) k πl B (cid:21) × π Ω (cid:20) (cid:15) r ( φ m ) k πl B + φ m N l (cid:104) σ | ˆ G M ( kl ) | σ (cid:105) (cid:21) − = (cid:89) k (cid:54) = (cid:114) N r Ω (cid:20) πl B (cid:15) r ( φ m ) k φ m N l (cid:104) σ | ˆ G M ( kl ) | σ (cid:105) (cid:21) − / . (S59)Hence, up to an additive constant ∝ N r ln( N r / Ω) that does not affect configurationaldistribution, the electrostatic contribution to the free energy in Eq. (S27) is equal to f el ≡ − l Ω ln Z el = − l Ω (cid:88) k (cid:54) = ln (cid:20) πl B (cid:15) r ( φ m ) k φ m N l (cid:104) σ | ˆ G M ( kl ) | σ (cid:105) (cid:21) − / → l (cid:90) d k (2 π ) ln (cid:20) πl B (cid:15) r ( φ m ) k φ m N l (cid:104) σ | ˆ G M ( kl ) | σ (cid:105) (cid:21) , (S60)where we have applied the correspondence1Ω (cid:88) k → (cid:90) d k (2 π ) (S61)and noted that the k → d k ∝ k dk and thus (cid:80) k (cid:54) = may be approximated by Ω (cid:82) d k/ (2 π ) for this quantity. Thelast expression in Eq. (S60) is formally identical to the one we obtained previously byheuristically replacing the position- and φ m -independent (cid:15) r in simple RPA theory with (cid:15) r ( φ m ) [Eq. (S14)]. This can be readily verified by setting b = a = l , hence η = 1 inEqs. (S14) and (S15), and noting that (1 / d k/ (2 π ) = k dk/ π , in which case the lastline of Eq. (S60) is seen to be equal to Eq. (S14) with the G (˜ k ) term [Eq. (S15a)] present butthe G (˜ k ) term [Eq. (S15b)] omitted (no subtraction of self interaction) as well as ˜ k (1 + ˜ k ) → ˜ k (no short-range cutoff for Coulomb interaction).6In other words, the heuristic RPA formulas for (cid:15) r → (cid:15) r ( φ m ) in Eqs. (S14) and (S15)can be rigorously established in the context of RPA approximation provided that (cid:15) r is alinear function of φ m . Indeed, if (cid:15) r was a more complicated function of φ m , the last termin Eq. (S45) would have individual interaction terms, such as δ ( R α,τ − R β,µ ), etc., thatinvolve different polymer chains, and that would necessitate an additional summation (cid:80) α over polymer chains instead of a single (cid:80) τ over monomers on a single chain. In that case,the subsequent simplification in terms of the single-chain partition function Q p [Eq. (S50)]and thus the RPA expansion of ln Q p [Eq. (S53)] cannot proceed in the manner describedabove. Therefore, it remains unclear whether Eq. (S60) holds in general for (cid:15) r ( φ m ) that isnot linear in φ m .In our previous applications, we considered a Coulomb potential with a physical short-range cutoff by the modification U ( r , r (cid:48) ) = l B (cid:15) r | r − r (cid:48) | → U ( r , r (cid:48) ) = l B (cid:15) r | r − r (cid:48) | (cid:18) − e −| r − r (cid:48) | /l (cid:19) (S62)[cf. Eq. (6) of Ref. 34; Eq. (34) of Ref. 35], which for constant, position-independent (cid:15) r results in a f el with 1 /k replaced by 1 / [ k (1 + k )]. In the context of a general position-dependent (cid:15) r , this feature can in principle be accounted for by introducing an (cid:15) r ( | r − r (cid:48) | ), butthe necessary formalism has not been developed. In the present work, we incorporate thisfeature by simply replacing the 1 /k factor by 1 / [ k (1+ k )] in Eq. (S60) so as to capture thisphysical property as much as possible and place our present results on an essentially equalfooting with our earlier results for position-independent (cid:15) r . Mathematically, this proceduremay be viewed as a regularization for “ultraviolet” large- k (i.e., small- | r − r (cid:48) | ) divergence.As such, it does serve to impart a physical short-spatial-range cutoff, though it may notcorrespond exactly to any particularly modified form of f el in Eq. (S62) that is applicableto a general position-dependent (cid:15) r ( r ).Taking all of the above into consideration, we use the general formula in Eqs. (S14) and(S15) above (which allows for a (cid:54) = b = l and thus η = ( b/a ) (cid:54) = 1 and r m (cid:54) = 1) for comparingRPA theory against explicit-chain simulation, with the understanding that (cid:15) r must be alinear function of polymer volume fraction φ = r m φ m . Following previous practice, the electrostatic self-interaction term G (˜ k ) = 4 πl B φ m / [ k (1 + k b ) (cid:15) r ( φ m ) N b ] (cid:80) Nτ =1 | σ τ | issubtracted in Eq. (S14). In the context of a position-dependent (cid:15) r ( r ), however, we recognizethat this term can be physically significant for capturing the polyampholyte chains’ varyingpreference for different dielectric environments. Hence we consider also an electrostaticfree energy f [self]el ≡ (cid:90) d ˜ k ˜ k π η ln (cid:104) η G (˜ k ) (cid:105) = a (cid:90) dkk π ln (cid:20) b a G ( kb ) (cid:21) (S63)that includes (does not substract) electrostatic self-interaction, and use both Eq. (S14) and7Eq. (S63) in our comparison of analytical theory with chain simulation. U NIT C ONVERSION FOR C OMPARISON WITH E XPLICIT -C HAIN S IMULATIONS
The theory-predicted phase diagrams (coexistence curves) in in Fig. S6 of the SupportingInformation for position- and IDR concentration-independent (cid:15) r are computed numericallyusing the RPA+FH model described in Ref. 35. Specifically, translational and mixingentropy is given by Eqs. (13) and (14), the RPA formula for f el is provided by Eqs. (39)and (40), and the augmented FH term is the one in Eq. (61) of this reference. Values of theparameters in these formulas are adapted to the present application, as follows: • a : Unit length of the model. We set the unit volume, a , to be that of the volumeoccupied by a water molecule in pure water, i.e., φ purew = ρ purew × a = 1, where thenumber density of pure water ρ purew = 10 g m − N A / . g m − isdensity of water, N A = 6 . × is Avogadro’s constant and 18 . a = (1 /ρ purew ) / = 3 .
104 ˚A= 3 . × − m. • b : The C α –C α virtual bond length of polypeptides b = l = 3 . . × − m. • η [in Eq. (39) of Ref. 35]: From the above values for a and b , η = ( b/a ) =(3 . / . = 1 . • r m (monomer size factor in Eq. (14) of Ref. 35): The r m ratio between the size ofone amino acid residue in Ddx4 IDR and the unit volume a is obtained as follows.Because the density of pure protein = 1 ,
587 mg ml − , number of amino acid residues(monomers) in Ddx4 IDR is N = 241, and the molar mass of Ddx4 IDR is 25 , ρ pure m = (1 . × ) g m − × × N A / ,
833 g . (S64)Since the volume fraction φ of pure protein is unity by definition, i.e., φ = ρ m × r m × a ,it follows that r m = ( a ρ pure m ) − = ρ purew /ρ pure m = 2583318 . · . · . . (S65) • r s and r c [size factors for salt and counterions, respectively, in Eq. (14) of Ref. 35]:Both r s and r c are set to 1.The conversion between the φ m = n p N a / Ω in analytical theory to Ddx4 concentration,[Ddx4], in units of mg/ml (mg ml − ), is given by φ m = (cid:110) [Ddx4(mg/ml)] × / mg × / (Ddx4 molar mass in g) (cid:111) × N A × a , (S66)8where N = 236 is the chain length of the Ddx4 IDRs, (Ddx4 molar mass in g) of the fourDdx4 IDR sequences are 25412.48, 25412.48, 24346.80, and 24740.48, respectively, for WT,CS, FtoA, and RtoK. It should be noted that there is a slight mismatch in the lengthsof Ddx4 IDRs (236 vs 241) because a Ddx4 N1 sequence with six amino acids added toits C-terminus as a tag was used in experiments. Nonetheless, N = 236 is adopted inEq. (S66) because the N = 236 sequence published in Ref. 4 is used in our simulations.In the context of our approximate analytical theory and coarse-grained chain model, thenumerical difference between using N = 236 and N = 241 is not expected to be insignificant.The mean-field Flory-Huggins (FH) χ parameters of non-electrostatic interactions forthe four Ddx4 IDR sequences are obtained from averaging the KH potential energies (cid:15) ij ( r )(= E ij ( r ) [KH] in Fig. 1a of main text) for a given sequence (seq) over all i, j pairs ofsequence positions except those entailing a charge-charge interaction [i.e., RR (Arg-Arg),RK (Arg-Lys), RD (Arg-Asp), RE (Arg-Glu), KK (Lys-Lys), KD (Lys-Asp), KE (Lys-Glu),DD (Asp-Asp), DE (Asp-Glu), EE (Glu-Glu); see main-text], yielding (cid:104) E (cid:105) KH , seq = − . − . − . − . − , respectively, for seq = WT, CS, FtoA, andRtoK. These average sequence-dependent mean-field non-electroatic interaction energies (cid:104) E (cid:105) KH , seq ’s are converted to the FH χ = ε h /T ∗ in Eq. (61) of Ref. 35 as follows:1. Convert per-mole units to per-interaction units: (cid:104) E (cid:105) KH , seq [(J/amino acid pair)]= (cid:110) (cid:104) E (cid:105) KH , seq [(kcal/mole of amino acid pairs)] /N A (cid:111) × / kcal × .
18 J / cal . (S67)2. Convert to the reduced variables used in analytical theory:( z/ × (cid:104) E (cid:105) KH , seq [(J/amino acid pair)] / ( k B T ) = − ε h /T ∗ , (S68)where T ∗ is the reduced temperature given by eq. (38) in Ref. 35 (see below) and z is a FH geometric factor representing the maximal number of monomers (aminoacid residues) that are spatial nearest neighbors to a given monomer; e.g., z = 6 forthree-dimensional simple cubic lattices. We obtain z/ . T in K to the reduced temperature T ∗ :1 T ∗ = e π(cid:15) (cid:15) r k B b T , (S69)where the electronic charge e = 1 . × − C, (cid:15) = 8 . × − C V − m, b = 3 . × − m, and (cid:15) r = 80, 40, or 20 in accordance with the corresponding9simulations with position- and IDR concentration-independent relative permittivities.Note that T ∗ = (cid:15) r T ∗ where T ∗ is defined after Eq. (S15) above and in Eq. (67) ofRef. 35.4. Convert (cid:104) E (cid:105) KH , seq to FH ε h :Based on the above consideration, ε h = − T ∗ (cid:16) z (cid:17) (cid:104) E (cid:105) KH , seq [(J/amino acid pair)] k B T = − (cid:18) π(cid:15) (cid:15) r be (cid:19) (cid:18) z (cid:19) × (cid:110) (cid:104) E (cid:105) KH , seq [(kcal/mole of amino acid pairs)] /N A (cid:111) × / kcal × .
18 J / cal= − π × (8 . × − ) × (3 . × − )(1 . × − ) . × × . . × ) × (cid:15) r (cid:104) E (cid:105) KH , seq [(kcal/mole of amino acid pairs)]= − . × (cid:15) r × (cid:104) E (cid:105) KH , seq [(kcal/mole of amino acid pairs)] . (S70)Accordingly, the ε h values for WT, CS, FtoA, and RtoK Ddx4 IDRs are, respectively, ε h =0 . . . .
364 when (cid:15) r = 80; ε h = 0 . . . .
182 when (cid:15) r = 40; and ε h = 0 . . . .
091 when (cid:15) r = 20.Note that ε h decreases with decreasing (cid:15) r because the reduced temperature T ∗ inEq. (S69) is proportional to (cid:15) r . In this formulation using T ∗ , the result of decreasing (cid:15) r isa reduction in the strength of favorable FH interactions relative to that of the electrostaticinteractions, which is equivalent to the physical situation (with temperature measured in K)of enhanced electrostatic interactions under a reduced (cid:15) r while keeping the non-electrostaticinteractions unchanged.0 Supporting Figures
WT:
MGDEDWEAEINPHMSSYVPIFEKDRYSGENGDNFNRTPASSSEMDDGPSRRDHFMKSGFASGRNFGNRDAGECNKRDNTSTMGGFGVGKSFGNRGFSNSRFEDGDSSGFWRESSNDCEDNPTRNRGFSKRGGYRDGNNSEASGPYRRGGRGSFRGCRGGFGLGSPNNDLDPDECMQRTGGLFGSRRPVLSGTGNGDTSQSRSGSGSERGGYKGLNEEVITGSGKNSWKSEAEGGES
CS:
MGDRDWRAEINPHMSSYVPIFEKDRYSGENGRNFNDTPASSSEMRDGPSERDHFMKSGFASGDNFGNRDAGKCNERDNTSTMGGFGVGKSFGNEGFSNSRFERGDSSGFWRESSNDCRDNPTRNDGFSDRGGYEKGNNSEASGPYERGGRGSFDGCRGGFGLGSPNNRLDPRECMQRTGGLFGSDRPVLSGTGNGDTSQSRSGSGSERGGYKGLNEKVITGSGENSWKSEARGGES
FtoA:
MGDEDWEAEINPHMSSYVPIAEKDRYSGENGDNANRTPASSSEMDDGPSRRDHAMKSGAASGRNAGNRDAGECNKRDNTSTMGGAGVGKSAGNRGASNSRAEDGDSSGAWRESSNDCEDNPTRNRGASKRGGYRDGNNSEASGPYRRGGRGSARGCRGGAGLGSPNNDLDPDECMQRTGGLAGSRRPVLSGTGNGDTSQSRSGSGSERGGYKGLNEEVITGSGKNSWKSEAEGGES
RtoK:
MGDEDWEAEINPHMSSYVPIFEKDKYSGENGDNFNKTPASSSEMDDGPSKKDHFMKSGFASGKNFGNKDAGECNKKDNTSTMGGFGVGKSFGNKGFSNSKFEDGDSSGFWKESSNDCEDNPTKNKGFSKKGGYKDGNNSEASGPYKKGGKGSFKGCKGGFGLGSPNNDLDPDECMQKTGGLFGSKKPVLSGTGNGDTSQSKSGSGSEKGGYKGLNEEVITGSGKNSWKSEAEGGES
Fig. S1:
The amino acid sequences (residues given by one-letter code) of the 236-residueDdx4 IDR (wildtype, WT) and its charge scrambled (CS) variant (introduced by Nott etal. ), phenylalanine-to-alanine variant (FtoA) (corresponds to the 14FtoA in Brady et al. and Vernon et al. ) and arginine-to-lysine (RtoK) variant considered in the present study.1 (a) Tyr (b) Phe (c) Trp Fig. S2:
Statistics of cation- π -like contacts. Distributions of C α –C α distance between apositively charged residue [arginine (solid curve) or lysine (dashed curve)] and an aromaticresidue [tyrosine (a), phenylalanine (b), or tryptophan (c)] are obtained from the samedataset of 6,943 high-resolution X-ray structures (from a non-redundant set ) used inFig. 2 of the main text. The bin size for C α –C α distance and the color code for differentresidue pairs are also identical to those in Fig. 2. For a given residue pair [Arg-Tyr,Lys-Tyr (a); Arg-Phe, Lys-Phe (b); or Arg-Trp, Lys-Trp (c)], the relative frequency ofa given C α –C α distance bin is the total number of instances in the dataset in whichthe C α –C α distance between the given pair of residues falls within the bin, normalized(divided) by the product of the two total numbers of residues in the dataset for the tworesidues making up the pair. Cumulative relative frequency at a given distance is the sumof relative frequencies for distances lower or equal to the given distance. Here, cumulativerelative frequencies are reported up to C α –C α distance of 6.5 ˚A, which is illustrative ofcommon criteria for a residue-residue contact. The plotted distributions show clearlythat arginine-aromatic contacts are consistently and significantly more numerous thanlysine-aromatic contacts when compared on the same footing, suggesting strongly thatthe overall arginine-aromatic interactions are energetically more favorable than the overalllysine-aromatic interactions. WT, 350 K CS, 300 KFtoA, 250 K RtoK, 300 K Fig. S3:
Verification of liquid-like dynamics of simulated condensed phases. As in Dignonet al., a relevant time-dependent mean-square deviation MSD( t ) of molecular coordinateswas simulated to provide evidence for liquid-like behavior in our model systems, viz., MSD( t ) = 1 n (cid:28) n (cid:88) µ =1 (cid:12)(cid:12)(cid:12)(cid:2) r µ, CM ( t + t ) − r CM ( t + t ) (cid:3) − (cid:2) r µ, CM ( t ) − r CM ( t ) (cid:3)(cid:12)(cid:12)(cid:12) (cid:29) t , where µ = 1 , , . . . , n labels the model IDR chains, n is the total number of IDR chainsin the simulation system, r µ, CM = (cid:80) Ni =1 m i r µi / (cid:80) Ni =1 m i is the center-of-mass position ofthe µ th chain, with m i being the mass of the i th bead (residue) along an IDR chain, r CM = (cid:80) nµ =1 r µ, CM /n is the center-of-mass of the entire collection of n chains, and the average isover the initial time point t . By subtracting drifts in molecular coordinates arising solelyfrom the diffusion of the entire system’s center of mass (see Fig. S4), the above-definedMSD( t ) values, which are provided by the circles in the plots, are a useful measure of theliquidity of our simulated system. Diffusion coefficients, D = { lim t →∞ d [MSD( t )] /dt } / (cid:15) r = 40 at the indicated temperatures, each of which is lower than the respective system’scritical temperature. The magnitudes of our simulated D s are similar to those simulated byDignon et al. for their model FUS systems (Fig. S12 of Ref. 51). Note that our simulated D s for the model Ddx4 IDR systems are, not unexpectedly, approximately three ordersof magnitude higher than the corresponding experimental values because a unphysicallylow friction coefficient was necessitated in our Langevin dynamics simulations in order toaccelerate sampling and also because a coarse-grained representation of the IDRs was used.3 Time(ns) M S D ( Å ) WT, 350 K CS, 300 KFtoA, 250 K RtoK, 300 K
Fig. S4:
Center-of-mass diffusion of the simulated Ddx4 IDR systems. Data are from thesame systems as those in Fig. S3. The solid curves provide the mean-square deviation ofthe center-of-mass positions of the IDRs without subtracting the the center-of-mass positionof the entire system, in which caseMSD( t ) = 1 n (cid:28) n (cid:88) µ =1 (cid:12)(cid:12)(cid:12) r µ, CM ( t + t ) − r µ, CM ( t ) (cid:12)(cid:12)(cid:12) (cid:29) t , whereas the dashed curves represent the diffusion of the center of mass of the entire systemof n IDRs, given by MSD( t ) = (cid:104)| r CM ( t + t ) − r CM ( t ) (cid:105) t . Echoing the findings in Fig. S3,a comparison of the solid and dashed curves in the present figure indicates that there issignificant diffusion of individual IDRs relative to the center of mass of the entire collectionof IDR chains.4
100 200 300 400 500
Concentration(mg/ml) † r Fig. S5:
Simulated IDR-concentration-dependent relative permittivity. Shown results—part of which are also provided in Fig. 6(a) of the main text—are for the WT Ddx4 IDR.Simulations were conducted using the SPC/E water model with 100 mM NaCl (circles),the TIP3P water model without salt (squares), and the TIP3P model with 100 mM NaCl(diamonds). Red symbols represent (cid:15) r values simulated using the full force field, whereasblue symbols denote (cid:15) r values simulated while the electric charges on the sidechains ofarginine, lysine, glutamic acid, and aspartic acid are artifically turned off. The (cid:15) r valuesplotted here are tabulated in Table S1.5 M o d e l T e m p e r a t u r e (a) = 80 (b) = 40 (c) = 20 Fig. S6:
Comparing analytical theory with simulation for sequence-dependent liquid-liquidphase separation of model Ddx4 systems. Phase diagrams simulated using the explicit-chainKH model under different permittivities ( (cid:15) r ) for the four Ddx4 IDRs from Fig. 4 of the maintext are replotted here as dashed curves. Predicted phase diagrams by the RPA+FH theorythat afford the best overall fit, at z/ .
3, are shown as solid curves.6
Table S1:
IDR-concentration-dependent relative permittivity, (cid:15) r , simulated for WT Ddx4IDR using the SPC/E and TIP3P atomic models of water at T = 300 K. SPC/E + salt b TIP3P, no salt TIP3P + salt b [Ddx4] a (cid:15) r [Ddx4] a (cid:15) r [Ddx4] a (cid:15) r (52.04) (71.5) (51.1) (92.8) (52.7) (89.5) (103.6) (61.5) (103.0) (85.9) (104.9) (85.8) (207.3) (55.0) (206.5) (78.2) (209.9) (76.4) (306.4) (49.2) (315.0) (70.5) (311.0) (70.5) (408.7) (45.0) (424.6) (54.9) (413.3) (59.3) (536.9) (36.1) (543.8) (47.1) (543.7) (47.3) a Concentrations (in mg/ml) and simulated (cid:15) r values given in bold font are for systemsthat apply the full force field; those given in ordinary roman (non-bold) font and in paren-theses are for systems in which the electric charges on the sidechains of arginine, lysine,glutamic acid, and aspartic acid of WT Ddx4 IDR are artifically turned off. b [NaCl] = 100 mM.eferences 47 References Brangwynne, C. P.; Eckmann, C. R.; Courson, D. S.; Rybarska, A.; Hoege, C.;Gharakhani, J.; J¨ulicher, F.; Hyman, A. A. Germline P granules are liquid dropletsthat localize by controlled dissolution/condensation.
Science , , 1729–1732. Li, P.; Banjade, S.; Cheng, H. C.; Kim, S.; Chen, B.; Guo, L.; Llaguno, M.;Hollingsworth, J. V.; King, D. S.; Banani, S. F.; Russ, P. S.; Jiang, Q.-X.; Nixon,B. T.; Rosen, M. K. Phase transitions in the assembly of multivalent signalling pro-teins.
Nature , , 336–340. Kato, M.; Han, T. W.; Xie, S.; Shi, K.; Du, X.; Wu, L. C.; Mirzaei, H.; Goldsmith, E. J.;Longgood, J.; Pei, J.; Grishin, N. V.; Frantz, D. E.; Schneider, J. W.; Chen, S.; Li, L.;Sawaya, M. R.; Eisenberg, D.; Tycko, R.; McKnight, S. l. Cell-free formation of RNAgranules: low complexity sequence domains form dynamic fibers within hydrogels.
Cell , , 753–767. Nott, T. J.; Petsalaki, E.; Farber, P.; Jervis, D.; Fussner, E.; Plochowietz, A.;Craggs, T. D.; Bazett-Jones, D. P.; Pawson, T.; Forman-Kay, J. D. Phase transi-tion of a disordered nuage protein generates environmentally responsive membranelessorganelles.
Mol. Cell , , 936–947. Molliex, A.; Temirov, J.; Lee, J.; Coughlin, M.; Kanagaraj, A. P.; Kim, H. J.; Mit-tag, T.; Taylor, J. P. Phase separation by low complexity domains promotes stressgranule assembly and drives pathological fibrillization.
Cell , , 123–133. Lin, Y.; Protter, D. S. W.; Rosen, M. K.; Parker, R. Formation and maturation ofphase-separated liquid droplets by RNA-binding proteins.
Mol. Cell , , 208–219. Zeng, M.; Chen, X.; Guan, D.; Xu, J.; Wu, H.; Tong, P.; Zhang, M. J. Reconstitutedpostsynaptic density as a molecular platform for understanding synapse formation andplasticity.
Cell , , 1172–1187. Muiznieks, L. D.; Sharpe, S.; Pom`es, R.; Keeley, F. W. Role of liquidliquid phaseseparation in assembly of elastin and other extracellular matrix proteins.
J. Mol. Biol. , , 4741–4753. Nedelsky, N. B.; Taylor, J. P. Bridging biophysics and neurology: Aberrant phasetransitions in neurodegenerative disease.
Nat. Rev. Neurol. , , 272–286.eferences 48 Banani, S. F.; Lee, H. O.; Hyman, A. A.; Rosen, M. K. Biomolecular condensates:organizers of cellular biochemistry.
Nat. Rev. Mol. Cell Biol. , , 285–298. Gomes, E., and Shorter, J. The molecular language of membraneless organelles.
J. Biol.Chem. , , 7115–7127. Stewart, R. J.; Wang, C. S.; Song, I. T.; Jones, J. P. The role of coacervation and phasetransitions in the sandcastle worm adhesive system
Adv. Colloid Interf. Sci. , ,88–96. Chen, X.; Wu, X.; Wu, H.; Zhang, M. Phase separation at the synapse.
Nat. Neurosci. , , 301–310. Gibson, B. A.; Dootlittle, L. K.; Schneider, M. W. G.; Jensen, L. E.; Gamarra, N.;Henry, L.; Gerlich, D. W.; Redding, S.; Rosen, M. K. Organization of chromatin byintrinsic and regulated phase separation.
Cell , , 470–484. Tsang, B.; Arsenault, J.; Vernon, R. M.; Lin, H.; Sonenberg, N.; Wang, L. Y.; Bah,A.; Forman-Kay, J. D. Phosphoregulated FMRP phase separation models activity-dependent translation through bidirectional control of mRNA granule formation.
Proc.Natl. Acad. Sci. U.S.A. , , 4218–4227. Kim, T. H.; Tsang, B.; Vernon, R. M.; Sonenberg, N.; Kay, L. E.; Forman-Kay, J. D.Phospho-dependent phase separation of FMRP and CAPRIN1 recapitulates regulationof translation and deadenylation.
Science , , 825–829. Wong, L. E.; Bhatt, A.; Erdmann, P. S.; Hou, Z.; Maier, J.; Pirkuliyeva, S.; Engelke,M.; Becker, S.; Plitzko, J.; Wienands, J.; Griesinger, C. Tripartite phase separation oftwo signal effectors with vesicles priming B cell responsiveness.
Nat. Comm. , ,848. Fujioka, Y.; Alam, J. M.; Noshiro, D.; Mouri, K.; Ando, T.; Okada, Y.; May, A. I.;Knorr, R. L.; Suzuki, K.; Ohsumi, Y.; Noda, N. N. Phase separation organizes the siteof autophagosome formation.
Nature , , 301–305. Shin, Y.; Brangwynne, C.P. Liquid phase condensation in cell physiology and disease.
Science , , eaaf4382. Lin, Y.-H.; Forman-Kay, J. D.; Chan, H. S. Theories for sequence-dependent phasebehaviors of biomolecular condensates.
Biochemistry , , 2499–2508. Boeynaems, S.; Alberti, S.; Fawzi, N. L.; Mittag, T.; Polymenidou, M.; Rousseau, F.;Schymkowitz, J.; Shorter, J.; Wolozin, B.; Van Den Bosch, L.; Tompa, P.; Fuxreiter,M. Protein phase separation: a new phase in cell biology.
Trends Cell Biol. , ,420–435.eferences 49 Vernon, R. M.; Forman-Kay, J. D. First-generation predictors of biological proteinphase separation.
Curr. Opin. Struct. Biol. , , 88–96. Ruff, K. M.; Pappu, R. V.; Holehouse, A. S. Conformational preferences and phasebehavior of intrinsically disordered low complexity sequences: insights from multiscalesimulations.
Curr. Opin. Struct. Biol. , , 1–10. Perry, S. L. Phase separation: Bridging polymer physics and biology.
Curr. Opin.Colloid Interface Sci. , , 86–97. Dignon, G. L.; Zheng, W.; Mittal, J. Simulation methods for liquid-liquid phase sepa-ration of disordered proteins
Curr. Opin. Chem. Eng. , , 92–98. Alberti, S.; Gladfelter, A.; Mittag, T. Considerations and challenges in studying liquid-liquid phase separation and biomolecular condensates.
Cell , , 419–434. Cinar, H.; Fetahaj, Z.; Cinar, S.; Vernon, R. M.; Chan, H. S.; Winter, R. Temperature,hydrostatic pressure, and osmolyte effects on liquid-liquid phase separation in proteincondensates: Physical chemistry and biological implications.
Chem. Eur. J. , ,13049–13069. Choi, J.-M.; Holehouse, A. S.; Pappu, R. V. Physical principles underlying the com-plex biology of intracellular phase transitions.
Annu. Rev. Biophys. , , doi:10.1146/annurev-biophys-121219-081629. Chong, P. A.; Forman-Kay, J. D. Liquid-liquid phase separation in cellular signalingsystems.
Curr. Opin. Struct. Biol. , , 180–186. Li, X.-H.; Chavali, P. L.; Pancsa, R.; Chavali, S.; Babu, M. M. Function and regulationof phase-separated biological condensates.
Biochemistry , , 2452–2461. Lee, C. F.; Wurtz, J. D. Novel physics arising from phase transitions in biology.
J.Phys. D: Appl. Phys. , , 023001. Hyman, A. A.; Weber, C. A.; J¨ulicher, F. Liquid-liquid phase separation in biology.
Annu. Rev. Cell Dev. Biol. , , 39–58. Brangwynne, C. P.; Tompa, P.; Pappu, R. V. Polymer physics of intracellular phasetransitions.
Nat. Phys. , , 899–904. Lin, Y.-H.; Forman-Kay, J. D.; Chan, H. S. Sequence-specific polyampholyte phaseseparation in membraneless organelles.
Phys. Rev. Lett. , 2016, , 178101. Lin, Y.-H.; Song, J.; Forman-Kay, J. D.; Chan, H. S. Random-phase-approximationtheory for sequence-dependent, biologically functional liquid-liquid phase separation ofintrinsically disordered proteins.
J. Mol. Liq. , 2017, , 176–193.eferences 50 Lin, Y.-H.; Chan, H. S. Phase separation and single-chain compactness of chargeddisordered proteins are strongly correlated.
Biophys, J. , , 2043–2046. Lin, Y. -H.; Brady, J. P.; Forman-Kay J. D.; Chan, H. S. Charge pattern matching as afuzzymode of molecular recognition for the functional phase separations of intrinsicallydisordered proteins.
New J. Phys. , , 115003. Chang, L.-W.; Lytle, T. K.; Radhakrishna, M.; Madinya, J. J.; V´elez, J.; Sing, C. E.;Perry, S. L. Sequence and entropy-based control of complex coacervates.
Nat. Comm. , , 1273. Wang, J.; Choi, J. M.; Holehouse, A. S.; Lee, H. O.; Zhang, X.; Jahnel, M.;Maharana,S.; Lemaitre, R.; Pozniakovsky, A.; Drechsel, D.; Poser, I.; Pappu, R. V.; Alberti, S.;Hyman, A. A. A molecular grammar governing the driving forces for phase separationof prion-like RNA binding proteins.
Cell , , 688–699. Schmit, J. D.; Bouchard, J. J.; Martin, E. W.; Mittag, T. Protein network structureenables switching between liquid and gel states.
J. Am. Chem. Soc. , , 874–883. Lin, Y.-H.; Brady, J. P.; Chan, H. S.; Ghosh, K. A unified analytical theory ofheteropolymers for sequence-specific phase behaviors of polyelectrolytes and polyam-pholytes.
J. Chem. Phys. , , 045102. Kosik, K. S.; Fredrickson, G. H.; Shea, J.-E.; Han, S. Narrow equilibrium window forcomplex coacervation of tau and RNA under cellular conditions eLife , , e42571. McCarty, J.; Delaney, K. T.; Danielsen, S. P. O.; Fredrickson, G. H.; Shea, J.-E.Complete phase diagram for liquid-liquid phase separation of intrinsically disorderedproteins.
J. Phys. Chem. Lett. , , 1644–1652. Danielsen, S. P. O.; McCarty, J.; Shea, J.-E.; Delaney, K. T.; Fredrickson, G. H.Molecular design of self-coacervation phenomena in block polyampholytes.
Proc. Natl.Acad. Sci. U.S.A. , , 8224–8232. Danielsen, S. P. O.; McCarty, J.; Shea, J.-E.; Delaney, K. T.; Fredrickson, G. H. Smallion effects on self-coacervation phenomena in block polyampholytes.
J. Chem. Phys. , , 034904. Das, S.; Eisen, A.; Lin, Y.-H.; Chan, H. S. A lattice model of charge-pattern-dependentpolyampholyte phase separation.
J. Phys. Chem. B , , 5418–5431. Robichaud,N. A. S.; Saika-Voivod, I.; Wallin, S. Phase behavior of blocky charge latticepolymers: Crystals, liquids, sheets, filaments, and clusters.
Phys. Rev. E. , ,052404.eferences 51 Choi, J.-M.; Dar, F.; Pappu, R. V. LASSI: A lattice model for simulating phase tran-sitions of multivalent proteins.
PLoS Comput. Biol. , , e1007028. Dignon, G. L.; Zheng, W.; Kim, Y. C.; Mittal, J. Temperature-controlled liquid-liquidphase separation of disordered proteins.
ACS Cent. Sci. , , 821–830. Nilsson, D.; Irb¨ack, A. Finite-size scaling analysis of protein droplet formation.
Phys.Rev. E , , 022413. Dignon, G. L.; Zheng, W.; Kim, Y. C.; Best, R. B.; Mittal, J. Sequence determinantsof protein phase behavior from a coarse-grained model.
PLoS Comput. Biol. , ,e1005941. Das, S.; Amin, A. N.; Lin, Y.-H.; Chan, H. S. Coarse-grained residue-based modelsof disordered protein condensates: Utility and limitations of simple charge patternparameters.
Phys. Chem. Chem. Phys. , , 28558–28574. Statt, A.; Casademunt, H.; Brangwynne, C. P.; Panagiotopoulos, A. Z. Model fordisordered proteins with strongly sequence-dependent liquid phase behavior.
J. Chem.Phys. , , 075101. Dignon, G. L.; Zheng, W.; Best, R. B.; Kim, Y. C.; Mittal, J. Relation between single-molecule properties and phase behavior of intrinsically disordered proteins.
Proc. Natl.Acad. Sci. U.S.A. , , 9929–9934. Ruff, K. M.; Harmon, T. S.; Pappu, R. V. CAMELOT: A machine learning approachfor coarse-grained simulations of aggregation of block-copolymeric protein sequences.
J. Chem. Phys. , , 243123. Harmon, Y. S.; Holehouse, A. S.; Pappu, R. V. Differential solvation of intrinsically dis-ordered linkers drives the formation of spatially organized droplets in ternary systemsof linear multivalent proteins.
New J. Phys. , , 045002. Nguemaha, V.; Zhou, H.-X. Liquid-liquid phase separation of patchy particles illumi-nates diverse effects of regulatory components on protein droplet formation.
Sci. Rep. , , 6728. Ghosh, A.; Mazarakos, K.; Zhou, H.-X. Three archetypical classes of macromolecularregulators of protein liquid-liquid phase separation.
Proc. Natl. Acad. Sci. U.S.A. , , 19474-19483. Martin, E. W.; Holehouse, A. S.; Peran, I.; Farag, M.; Incicco, J. J.; Bremer, A.; Grace,C. R.; Soranno, A.; Pappu, R. V.; Mittag, T. Valence and patterning of aromaticresidues determine the phase behavior of prion-like domains.
Science , , 694–699.eferences 52 Cinar, H.; Cinar, S.; Chan, H. S.; Winter, R. Pressure-induced dissolution and reentrantformation of condensed, liquid-liquid phase-separated elastomeric α -elastin. Chem.Eur. J. , 8286–8291. Cinar, S.; Cinar, H.; Chan, H. S.; Winter, R. Pressure-sensitive and osmolyte-modulated liquid-liquid phase separation of eye-lens γ -crystallins. J. Am. Chem. Soc. , , 7347–7354. Cinar, H.; Oliva, R.; Lin, Y.-H.; Chen, X.; Zhang, M.; Chan, H. S.; Winter, R. Pres-sure sensitivity of SynGAP/PSD95 condensates as a model for postsynaptic densi-ties and its biophysical and neurological ramifications.
Chem. Eur. J. doi:10.1002/chem.201905269 Brady, J. P.; Farber, P. J.; Sekhar, A.; Lin, Y.-H.; Huang, R.; Bah, A.; Nott, T. J.;Chan, H. S.; Baldwin, A. J.; Forman-Kay, J. D.; Kay, L. E. Structural and hydrody-namic properties of an intrinsically disordered region of a germ cell-specific protein onphase separation.
Proc. Natl. Acad. Sci. U.S.A. , , E8194–E8203. Roberts, S.; Miao, V.; Costa, S.; Simon, J.; Kelly, G.; Shah, T.; Zauscher, S;, Chilkoti,A. Complex microparticle architectures from stimuli-responsive intrinsically disorderedproteins.
Nat. Comm . , , 1342. Salonen, L. M.; Ellermann, M.; Diederich, F. Aromatic rings in chemical and biologicalrecognition: energetics and structures.
Angew. Chem. Int. Ed. , , 4808–4842. Vernon, R. M.; Chong, P. A.; Tsang, B.; Kim, T. H.; Bah, A.; Farber, P.; Lin, H.;Forman-Kay, J. D. Pi-Pi contacts are an overlooked protein feature relevant to phaseseparation. eLife , , e31486. Gallivan, J. P.; Dougherty, D. A. Cation- π interactions in structural biology. Proc.Natl. Acad. Sci. U.S.A. , , 9459–9464. Gallivan, J. P.; Dougherty, D. A. A computational study of cation- π interaction vs saltbridges in aqueous media: Implications for protein engineering. J. Am. Chem. Soc . , , 870–874. Crowley, P. B.; Golovin, A. Cation- π interactions in proteinprotein interfaces. Proteins , , 231–239. Song, J.; Ng, S. C.; Tompa, P.; Lee, K. A. W.; Chan, H. S. Polycation- π interactionsare a driving force for molecular recognition by an intrinsically disordered oncoproteinfamily PLoS Comput. Biol . , , e1003239.eferences 53 Chen, T.; Song, J.; Chan, H. S. Theoretical perspectives on nonnative interactions andintrinsic disorder in protein folding and binding.
Curr. Opin. Struct. Biol. , ,32–42. Miyazawa, S.; Jernigan, R. L. Estimation of effective interresidue contact energiesfrom protein crystal structures: quasi-chemical approximation.
Macromolecules , , 534–552. Miyazawa, S.; Jernigan, R. L. Residue-residue potentials with a favourable contact pairterm and an unfavourable high packing density term, for simulation and threading.
J.Mol. Biol . , , 623644. Kapcha, L. H,; Rossky, P. J. A simple atomic-level hydrophobicity scale reveals proteininterfacial structure.
J. Mol. Biol. , , 484–498. Silmore, K. S.; Howard, M. P.; Panagiotopoulos, A. Z. Vapor-liquid equilibrium andsurface tension of fully flexible Lennard-Jones chains.
Mol. Phys. , , 320–327. Kim, Y. C.: Hummer, G. Coarse-grained models for simulation of multiprotein com-plexes: Application to ubiquitin binding.
J. Mol. Biol. , , 1416–1433. Murthy, A. C.; Dignon, G. L.; Kan, Y.; Zerze, G. H.; Parekh, S. H.; Mittal, J.; Fawzi,N. L. Molecular interactions underlying liquid-liquid phase separation of the FUS low-complexity domain.
Nat. Struct. Mol. Biol. , , 637–648. Conicella, A. E.; Dignon, G. L.; Zerze, G. H.; Schmidt, H. B.; D’Ordine, A. M.; Kim,Y. C.; Rohatgi, R.; Ayala, Y. M.; Mittal, J.; Fawzi, N. L. TDP-43 -helical structuretunes liquidliquid phase separation and function.
Proc. Natl. Acad. Sci. U.S.A. , , 5883–5894. Schuster, B. S.; Dignon, G. L.; Tang, W. S.; Kelley, F. M.; Ranganath, A. K.; Jahnke,C. N.; Simplins, A. G.; Regy, R. M.; Hammer, D. A.; Good, M. C.; Mittal, J. Identifyingsequence perturbations to an intrinsically disordered protein that determine its phaseseparation behavior. bioRxiv doi: https://doi.org/10.1101/2020.01.06.894576 (2020). Quiroz, F. G.; Chilkoti, A. Sequence heuristics to encode phase behaviour in intrinsi-cally disordered protein polymers.
Nat. Mater. , , 1164–1171. Kumar, K.; Woo, S. M.; Siu, T.; Cortopassi, W. A.;Duarte, F.; Paton, R. S. Cation- π interaction in protein-ligand binding: theory and data-mining reveal different roles forlysine and arginine. Chem. Sci. , , 2655–2665. DeVido, D. R.; Dorsey, J. G.; Chan, H. S.; Dill, K. A. Oil/water partitioning has adifferent thermodynamic signature when the oil solvent chains are aligned than whenthey are amorphous.
J. Phys. Chem. B , , 7272–7279.eferences 54 Karplus, P. A. Hydrophobicity regained.
Protein Sci. , , 1302–1307. Chan, H. S.; Dill, K. A. Solvation: How to obtain microscopic energies from partitioningand solvation experiments.
Annu. Rev. Biophys. Biomol. Struct. , , 425–459. Chan, H. S. Amino acid sidechain hydrophobicity. In: eLS, Encyclopedia of LifeSciences Fauch`ere, J. L.; Pliˇska, V. Hydrophobic parameters of amino-acid side chains from thepartitioning of N-acetyl-amino-acid amides.
Eur. J. Med. Chem . , , 369–375. Chan, H. S. Folding alphabets.
Nat. Struct. Biol. , , 994–997. Dwyer, J. J.; Gittis, A. G.; Karp, D. A.; Lattman, E. E.; Spencer, D. S.; Stites, W.E.; Garc´ıa-Moreno E, B. High apparent dielectric constants in the interior of a proteinreflect water penetration.
Biophys. J. , , 1610–1620. Pitera, J. W.; Falta, M.; van Gunsteren, W. F. Dielectric properties of proteins fromsimulation: The effects of solvent, ligands, pH, and temperature.
Biophys. J . , ,2546-2555. Schutz, C. N.; Warshel, A. What are the dielectric “constants” of proteins and how tovalidate electrostatic models?
Proteins , , 400–417. Rudas, T.; Schr¨oder, C.; Boresch, S.; Steinhauser, O. Simulation studies of the protein-water interface. II. Properties at the mesoscopic resolution.
J. Chem. Phys . , ,234908. Nymeyer, H.; Zhou, H.-X. A method to determine dielectric constants in nonhomoge-neous systems: Application to biological membranes.
Biophys. J. , , 1185–1193. Tros, M.; Zheng, L.; Hunger, J.; Bonn, M.; Bonn, D.; Smits, G. J.; Woutersen, S.Picosecond orientational dynamics of water in living cells.
Nat. Commun . , ,904. Wyman, Jr., J. Studies on the dielectric constant of protein solutions. I. Zein.
J. Biol.Chem. , , 443–476. Martin, N. Dynamic synethic cells based on liquid-liquid phase separation.
Chem-BioChem , , 2553–2568. Abraham, M. J.; van der Spoel, D.; Lindahl, E.; Hess, B. and the GROMACS develop-ment team.
GROMACS User Manual version 2016.5 Lindorff-Larsen, K.; Piana, S.; Palmo, K.; Maragakis, P.; Klepeis, J. L.; Dror, R. O.;Shaw, D. E. Improved side-chain torsion potentials for the Amber ff99SB protein forcefield.
Proteins , , , 1950–1958. Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L.Comparison of simple potential functions for simulating liquid water.
J. Chem. Phys . , , 926–935. Berendsen, H. J. C.; Grigera, J. R.; Straatsma, T.P. The missing term in effective pairpotentials.
J. Phys. Chem. , , 6269–6271. Humphrey, W.; Dalke, A.; Schulten, K. VMD - Visual Molecular Dynamics.
J. Molec.Graphics , , , 33–38. Hsin, J.; Arkhipov, A.; Yin, Y.; Stone, J. E.; Schulten, K. Using VMD: an introductorytutorial.
Curr. Protoc. Bioinform. , , 5.7.1-5.7.48. Bragg, W. L.; Pippard, A. B. The form birefringence of macromolecules.
Acta Cryst . , , 865-867. Hasted, J. B.; Ritson, D. M.; Collie, C. H. Dielectric properties of aqueous ionic solu-tions. Parts I and II.
J. Chem. Phys. , , 1–21. Levy, A.; Andelman, D.; Orland, H. Dielectric constant of ionic solutions: A field-theory approach.
Phys. Rev. Lett. , , 227801. Wang, Z.-G. Fluctuation in electrolyte solutions: The self energy.
Phys. Rev. E , , 021501. Orkoulas, G.; Kumar, S. K.; Panagiotopoulos, A. Z. Monte Carlo study of Coulombiccriticality in polyelectrolytes.
Phys. Rev. Lett. , , 048303. Schmit, J. D.; Dill, K. A. The stabilities of protein crystals.
J. Phys. Chem. B , , 4020–4027. Martyna, G. J.; Tobias, D. J.; Klein, M. L.; Constant pressure molecular dynamicsalgorithms.
J. Chem. Phys. , , 4177–4189. Tuckerman, M. E.; Alejandre, J.; L´opez-Rend´on, R.; Jochim , A. L.; Martyna G.J. A Liouville-operator derived measure-preserving integrator for molecular dynamicssimulations in the isothermalisobaric ensemble.
J. Phys. A: Math. Gen. , ,5629–5651. Anderson, J. A.; Lorenz, C. D.; Travesset, A. General purpose molecular dynamicssimulations fully implemented on graphics processing units.
J. Comput. Phys. , , 5342–5359.eferences 56 Glaser, J.; Nguyen, T. D.; Anderson, J. A.; Liu, P.; Spiga, F.; Millan, J. A.; Morse, D.C.; Glotzer, S. C. Strong scaling of general-purpose molecular dynamics simulations onGPUs.
Comput. Phys. Commun. , , 97–107. DeLano W. L.
The PyMOL Molecular Graphics System
Mart´ınez, L.; Andrade, R.; Birgin, E. G.; Mart´ınez. J. M. PACKMOL: A package forbuilding initial configurations for molecular dynamics simulations.
J. Comput. Chem . , , 2157–2164. Hess, B.; Bekker, H.; Berendsen, H. J. C.; Fraaije, J. G. E. M. LINCS: A linearconstraint solver for molecular simulations.
J. Comp. Chem . , , 1463–1472. Hockney, R. W.; Goel, S. P.; Eastwood, J. Quiet high resolution computer models of aplasma.
J. Comp. Phys . , , 148–154. Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. Asmooth particle mesh Ewald potential.
J. Chem. Phys . , , 8577–8592. Bussi, G.; Donadio, D.; Parrinello, M. Canonical sampling through velocity rescaling.
J. Chem. Phys . , , 014101. Berendsen, H. J. C.; Postma, J. P. M.; DiNola, A.; Haak, J. R. Molecular dynamicswith coupling to an external bath.
J. Chem. Phys . , , 3684-3690. Parrinello, M.; Rahman, A. Polymorphic transitions in single crystals: A new moleculardynamics method.
J. Appl. Phys . , , 7182–7190. Nos´e, S.; Klein, M. L. Constant pressure molecular dynamics for molecular systems.
Mol. Phys . , , 1055–1076. Floros, S.; Liakopoulou-Kyriakides, M.; Karatasos, K.; Papadopoulos, G. E. Detailedstudy of the dielectric function of a lysozyme solution studied with molecular dynamicssimulations.
Eur. Biophys. J . , , 599–611. Jackson, J. D.
Classical Electrodynamics
Markel, V. A. Introduction to the Maxwell Garnett approximation: Tutorial
J. Opt.Soc. Am. A , , 1244–1256. Wang, Q.; Taniguchi, T.; Fredrickson, G. H. Self-consistent field theory of polyelec-trolyte systems.
J. Phys. Chem. B , , 6733–6744. Allen, M. P.; Tildesley, D. J.