A Lattice Model of Charge-Pattern-Dependent Polyampholyte Phase Separation
11January 12, 2018
A Lattice Model of Charge-Pattern-DependentPolyampholyte Phase Separation
Suman D AS , Adam E
ISEN , , , Yi-Hsuan L IN , , and Hue Sun C
HAN , , ∗ Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada; Department of Molecular Genetics, University of Toronto,Toronto, Ontario M5S 1A8, Canada; Department of Mathematics & Statistics, Queen’s UniversityKingston, Ontario K7L 3N6, Canada; and Molecular Medicine, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada ∗ Corresponding authorE-mail: [email protected]; Tel: (416)978-2697; Fax: (416)978-8548Mailing address:Department of Biochemistry, University of Toronto, Medical Sciences Building – 5th Fl.,1 King’s College Circle, Toronto, Ontario M5S 1A8, Canada. a r X i v : . [ q - b i o . B M ] M a r Abstract
In view of recent intense experimental and theoretical interests in the biophysics of liquid-liquid phase separation (LLPS) of intrinsically disordered proteins (IDPs), heteropolymermodels with chain molecules configured as self-avoiding walks on the simple cubic latticeare constructed to study how phase behaviors depend on the sequence of monomers alongthe chains. To address pertinent general principles, we focus primarily on two fully charged50-monomer sequences with significantly different charge patterns. Each monomer in ourmodels occupies a single lattice site and all monomers interact via a screened pairwiseCoulomb potential. Phase diagrams are obtained by extensive Monte Carlo samplingperformed at multiple temperatures on ensembles of 300 chains in boxes of sizes rangingfrom 52 × ×
52 to 246 × ×
246 to simulate a large number of different systemswith the overall polymer volume fraction φ in each system varying from 0 .
001 to 0 . T cr , of phase separationfor the two sequences differ significantly, whereby the sequence with a more “blocky”charge pattern exhibits a substantially higher propensity to phase separate. The trendis consistent with our sequence-specific random-phase-approximation (RPA) polymertheory; but the variation of the simulated T cr with a previously proposed “sequence chargedecoration” pattern parameter is milder than that predicted by RPA. Ramifications of ourfindings for the development of analytical theory and simulation protocols of IDP LLPSare discussed. Introduction
A central principle of modern biology is that of information . Much of the study of molec-ular biology aims to ascertain how information embodied in specific sequences of nucleicacids and proteins govern their structures and interactions to serve various physiologicalfunctions. Molecular biology of proteins used to focus predominantly on the sequence-structure relationships of globular proteins with highly ordered folded structures. In re-cent years, however, it has become abundantly clear that intrinsically disordered proteins(IDPs) serve many critical functions, especially those pertinent to cellular signaling andregulation . Unlike globular proteins that fold to an essentially unique structure underphysiological conditions, IDPs do not fold by themselves. In the absence of stabilizinginteractions with other biomolecules, an IDP can adopt many different structures, i.e., itpopulates a conformational ensemble. Nonetheless, physics dictates that the conformationaldistribution in an IDP ensemble is sequence-dependent, not random. Accordingly, to deci-pher IDPs function biophysically, it is necessary to extend our interest in sequence-structurerelationships to a more generalized pursuit of sequence-ensemble relationships.Some IDPs function not merely via binding interactions that lead to formation of dis-crete molecular complexes . An increasing number of IDPs have now been known tofunction also at a mesoscopic level by forming droplet-like condensates via liquid-liquidphase separation so as to regulate/stimulate specific set of biochemical reactions. Electro-static interactions often figure prominently in these phase separation processes (sometimesreferred to as coacervation); but other types of interactions, especially cation- π and π - π interactions , can also play significant roles in enabling such functional IDP phase behav-iors that, when dysfunctional because, e.g., of mutations of the IDP sequences, can lead toa broad spectrum of diseases . Examples of membraneless organelles—intracellular com-partments and subcompartments not bound by lipid membranes—that are underpinned byIDP phase separations include nuage or germ granules , the nucleolus which is the site ofribosome assembly in the nucleus , and stress granules triggered by heat stress . SomeIDP condensates exhibit liquid-like hydrodynamic properties , others are observed to begel-like , or “mature” over time to a state with slower dynamic exchange , including thedevelopment of a differentially stabilized core substructure in stress granules in a processthat shares certain resemblance to earlier observations of maturation of coacervated elastindroplets into fibrillar structures . In the case of stress granules, formation of liquid-likedroplets can also be a precursor to pathological fibrillization .IDP phase behaviors are sequence dependent. Gaining physical insights into this se-quence dependence is important for progress not only in molecular biology but also inmaterials science . For an intrinsically disordered region of the DEAD-box RNA heli-case Ddx4—the phase separation of which underlies nuage or germ granules, it has beenshown experimentally that the wildtype sequence phase separates in vitro and in cells,whereas a charge-scrambled variant of the sequence with the same composition of aminoacid residues but a different sequential charge pattern does not . In contrast, althoughthe phase behaviors of the Nephrin intracellular domain (NICD) is sequence dependent,the effects of the overall amino acid composition is sufficiently overwhelming that the “pre-cise sequence of NICD appears to matter little” . To address sequence-dependent IDPphase properties, our group has put forth an analytical formulation based on the random-phase-approximation (RPA) polymer theory for electrostatic interactions , affording thefirst quantitative physical rationalization of the different phase behaviors of Ddx4 and itscharge-scrambled variant . The theory predicts in general that the propensity of apolyampholytic IDP sequence with zero net charge to phase separate is correlated withthe “blockiness” and perhaps other yet-unspecified attributes of its charge pattern thatare captured by the κ (ref. 31) and “sequence charge decoration” (SCD) parameters .The same theory further stipulates that whether the solute populations of two differentcharged IDP sequences (with zero net charge) present in the same aqueous solution demixupon phase separation is largely governed by the difference in their SCD parameters .Independently, a recent “hybrid” formulation that combines Monte Carlo chain simula-tions with a Flory-Huggins-like theory tackles how coacervation involving multiple copiesof a homopolyanion and a polycation with only half of the monomers charged depends onthe sequence of the polycation. Consistent with experiment, the formulation predicts thatthe tendency to coacervate increases with the blockiness of the polycation . To furthersubstantiate these theoretical/conceptual advances, it is now imperative to assess the ap-proximations that have been invoked to make analytical theories tractable, preferably bydirect simulations of explicit-chain models. However, despite notable advances in modelingIDP folding upon binding, using explicit-chain simulation to study IDPs properties in gen-eral is still in its infancy , especially for phase behaviors which entail sampling a largenumber of chain molecules. The immense computational cost required dictates that onlycoarse-grained chain models are currently feasible to be used for phase separation simula-tions. Insights have been gained, for example, by treating groups of amino acid residues ofIDPs as interaction modules . As a step toward better quantitative understanding, herewe develop a simple lattice model to address sequence dependence of IDP phase behaviorat the monomer/residue level.Simple lattice models have made critical contributions to polymer science, beginningat least 70 years ago with the work of Orr in 1947 . Early exact enmuerations of chainconformations have been instrumental in fundamental developments in polymer theory .Lattice models also helped advance studies of micelles (by considering diblock sequencesas models for amphiphiles) , knot theory , RNA conformational statistics , and DNAtopology . As far as proteins are concerned, lattice modeling was pioneered by G¯o andcoworkers. The structure-based approach they introduced in 1975 (ref. 49) and pursuedtill 1988 (ref. 50) has led to fundamental conceptual advances, including recognizing theimportance of local interactions in speeding up folding , role of nonlocal interactions infolding cooperativity , and the celebrated “consistency principle” which is intimatelyrelated to the subsequent “principles of minimal frustration” of protein folding .Structure-based G¯o-like models do not consider the physico-chemical basis of sequencedependence . Physics-based sequence dependence was first introduced into lattice studiesof proteins in 1989 by Lau and Dill’s 2-letter hydrophobic-polar (HP) model , which is anexplicit-chain version of an earlier mean-field HP model of Dill . This construct providesa simple tractable model of the protein sequence-structure mapping that could readily beexplored algorithmically to study protein folding and evolution . At the same time,lattice models were used to address local preference and global compactness of chainconformations as well as the ramifications of their relationships for protein structures . Thisendeavor led to the first exact enumeration of all compact conformations of a 27mer config-ured within a 3 × × , a noteworthy model that was subsequently utilized in seminalinvestigations of the kinetic bottlenecks , the funnel picture , the landscape perspective ,and cooperativity of protein folding as well as the designability/encodability of proteinstructures . During this period, sequence dependence of protein behaviors was also in-vestigated using 20-letter models on the simple cubic lattice as well as on face-centeredcubic , diamond or tetrahedral , and “210” lattices (see, e.g., refs. 77–79 for reviews).Although the importance of lattice models on the study of folding of small proteins has sincediminished—rightfully—as continuum coarse-grained and atomic models that offer higherstructural and energetic resolutions become increasingly tractable computationally, simplelattice modeling remains a powerful conceptual tool in the study of protein evolution because of these models’ ability to address large-scale sequence-structure relations in abiophysics-informed manner .Compared to these and many other efforts of using lattice models to study globularproteins, lattice modeling of IDP-like polyampholytes had not been extensive. Insofaras phase separations of charged polymers are concerned, grand canonical Monte Carlosimulations have been applied to model phase separations of relatively short chargedpolymers configured on simple cubic lattices . The systems considered include fullycharged polyelectrolytes of chain lengths N = 3, 4, 6, 8, 12, 16, and 24 ( N = number ofmonomers per chain) with neutralizing counterions in simulation boxes of sizes rangingfrom 16 × ×
16 to 24 × ×
24 (ref. 85) as well as fully charged polyampholytes withzero net charge of chain lengths N = 2, N = 4 (a diblock sequence), N = 8 (four differentsequences), and N = 16 (three different sequences) configured in boxes of sizes rangingfrom 14 × ×
14 to 24 × ×
24 (ref. 86). Because these chain lengths are much shorterthan those of IDPs involved in functional phase separations, in order to better connectlattice modeling to experimental IDP phase behavior, it would be desirable to conductsimilar studies for polyampholytes of longer chain lengths. However, unlike homopolymerlattice systems with only nearest-neighbor interactions , applying the grand canonicalMonte Carlo method to larger N ’s is technically problematic for polyampholytes, becauseof a sharply increasing rejection rate for attempted chain transfers with increasing N (ref. 85, 86). Here we take an alternate “brute-force” approach. As a case study, weapply direction simulations two fully charged N = 50 sequences with zero net charge butsignificantly different charge patterns to verify that the sequences with a more blockycharge pattern indeed phase separates much more readily than the strictly alternatingsequence with minimum blockiness. The differential effect we observe is significant; butit is also noteworthy that the phase separation tendency seen in our direct explicit-chainsimulations is less sensitive to charge pattern than that stipulated by RPA theory ,underscoring that quantitative predictions of the theory need to be treated with caution. Models and Methods
Model polymer chains are configured as self-avoiding walks on a simple cubic lattice(coordination number 6). Each polymeric bond connects two monomers that are nearestneighbors on the lattice. In other words, the length of the bond in our model is fixed asin most lattice protein models . Unlike the bond fluctuation model used in ref. 85,our model only allows bonds in the (0,0,1) direction and its five rotations and inversionsbut does not allow bonds in the (0,1,1) and (1,1,1) and their rotations and inversions. Thepresent simulations of the configurations of multiple polymer chains are conducted in cubicboxes with periodic boundary conditions.For any two different monomers labeled µ, i and ν, j ( µ, ν = 1 , , . . . , n label the polymerchains where n is the total number of chains in the simulation system, i, j = 1 , , . . . , N label the N monomers along each chain) with charges σ µi , σ νj , their electrostatic interactionis given by ( U el ) µi,νj = ( l B σ µi σ νj /r µi,νj ) exp( − r µi,νj /r S ), where r µi,νj is the spatial distancebetween the two monomers, l B = e / (4 π(cid:15) (cid:15) r k B T ) is Bjerrum length, e is elementary elec-tronic charge, (cid:15) is vacuum permittivity, (cid:15) r is relative permittivity (dielectric constant), k B is Boltzmann constant, and T is absolute temperature. The total potential energy is thusgiven by (cid:80) nµ =1 (cid:80) nν =1 (cid:80) Ni =1 (cid:80) Nj =1 (1 − δ µν δ ij )( U el ) µi,νj , where the Kronecker symbol δ ij = 1 for i = j , δ ij = 0 for i (cid:54) = j and the first factor in the summation serves to exclude self-interactionterms. Here we use (cid:15) r = 80, which corresponds to that of the water solvent, and a latticeconstant a = 3 . a is approximately the size of a water molecule; and r S isthe screening length in the model. For computational efficiency, we impose a 3 a cutoff forthe interaction, i.e., ( U el ) µi,νj = 0 for r µi,νj > a , and employ a temperature-independent r S = 10 ˚A. The model polymer bond length is equal to a in this construct. Results for E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K E K (a)(b) K K E K K E K K K E K K E K K E E E K E K E K K E K K K K E K E K K E E E E E E E E K E E K K E E E
FIG. 1: Examples of systems simulated in this work for sequences sv1 (a) and sv15 (b). Positively(K) and negatively (E) charged residues (monomers) are depicted, respectively, as blue and redbeads. Snapshots of simulation boxes are taken at (a) T = 200 K and (b) 800 K. The snapshotsare representative of chain configurations below (left) and above (right) the respective system’scritical concentration φ cr . Each of these four boxes contains 300 polymer chains. The simulationboxes are of sizes 246 a × a × a (all box sizes are provided in units of a hereafter) (left,overall polymer volume fraction = 0 . × ×
52 (right, overall polymer volume fraction= 0 . / − monomers of 50merKE sequences (red for − ; blue for +) is identical to that in refs. 30 and 35. The statements in thecaptions for Figure 1 of ref. 30 and Figures 2 and 3 of ref. 35 that red is for + and blue is for − monomers were typographical errors that have no bearing on the results in refs. 30 and 35. other polymer bond lengths (with a rescaled r S ) may be obtained from the present data byrescaling the temperature. All simulated results reported here are for ensembles of n = 300identical polymer chains configured in simulation boxes of various sizes.The bulk of the present simulation effort is focused on two fully charged polyampholyteswith zero net charge ( σ i , σ j = ± (cid:80) Ni =1 σ i = 0, wherein the µ, ν indices can be dropped fromthe σ ’s because our simulated systems are restricted to ensembles of identical sequences).The sequences correspond to those labeled as sv1 and sv15 among the thirty KE sequencesfirst considered in ref. 31 (Fig. 1). Here K and E stand for lysine and glutamic acid, respec-tively (note that K also denotes degree Kelvin in contexts that should entail no confusion).The charge patterns of these two sequences are significantly different, as reflected by their κ parameters ( κ = 0 . κ = 0 . as well as their sequence chargedecoration (SCD) parameters , whereSCD ≡ N N (cid:88) i =1 N (cid:88) j = i +1 σ i σ j (cid:112) j − i , (1)with SCD= − .
41 for sv1 and SCD= − .
35 for sv15 . In view of the intensive compu-tation required to simulate the phase behaviors of these sequences, investigation of other N = 50 sequences with different KE patterns is left to future efforts.Monte Carlo simulations are used to sample chain configurations . The initial configu-ration of each of our simulation systems is prepared by randomly placing all the polymers asfully extended chains along the three Cartesian axes of the simulation box in equal numbers.The system is first equilibrated for 10 simulation steps, to be followed by a production runwith duration ranging from 2 × to 10 simulation steps. Each simulation step is anattempted move performed on a randomly chosen polymer chain and at a randomly chosenlocation along the chain. Move acceptance is based on the Metropolis criterion. Excludedvolume is enforced by disallowing any two chain monomers to occupy the same lattice site,i.e., attempted moves that would result in such disallowed configurations are rejected. Theattempted move is chosen stochastically among four types of moves with the following per-centage statistical weights: diagonal (kink jump, 40%), crankshaft (40%), pivot (includingend rotation, 10%), and reptation (10%). The first two types of moves are local motionsthat only involve a small number of monomers, whereas the latter two types of moves en-tail global motions that can potentially relocate a large number of monomers (up to order N ). The acceptance rates of these moves at the lowest simulation temperature (200 K) areapproximately 12%, 2%, 4%, and 0.25%, respectively; the rates are higher at higher tem-peratures. The function gsl rng.h and the random number generator mt19937 are usedfor the simulations.We perform two sets of extensive simulations for both sequences sv1 and sv15. Thefirst set of simulations is geared toward addressing the low-concentration side of thecoexistence phase boundary by observing whether a percolating polymer cluster developsin the simulation box (see below). These simulations are performed at 9 temperatures (200K, 300 K, . . . , 1000 K) and 19 different overall polymer volume fractions (defined as thenumber of monomers divided by the total number of lattice sites in the simulation box),namely 0.001, 0.002, . . . , 0.009, and 0.01, 0.02, . . . , 0.09, 0.1 by varying the simulationbox size from 52 × ×
52 to 246 × × Number of simulation steps/107 P o t e n ti a l e n e r gy / -8-6-4-20 T = 400 K T = 500 K T = 600 K T = 800 K FIG. 2: Evolution and gradual stabilization of potential energy (in units of k B T ) as a function ofnumber of simulation steps. Shown results are for 300 copies of sequence sv15 in a simulation boxof size 52 × ×
52 at four different temperatures. overall distribution of polymer density (see below). These simulations are performed at afixed overall polymer volume fraction of 0.1 (by using only 52 × ×
52 simulation boxes)for 17 different temperatures (200 K, 250K, 300 K, 350 K, . . . , 1000 K). We monitor theevolution of total potential energy as each simulation proceeds to ensure, to the extentpossible within our computational resources, that the system reaches a quasi-steady stateduring the production stage of our simulation by observing a near-leveling of the totalpotential energy. Nonetheless, at relatively lower temperatures, it is evident that thesystem is evolving very slowly—with the potential energy stabilizing very gradually—evenafter a large number of simulation steps (see examples in Fig. 2). The ramification of thisbehavior will be addressed below.
Results and Discussion
Temperature- and Concentration-Dependent Distributions of PolymerDensity.
We begin by examining our first set of simulations. The temperatures and theoverall polymer volume fractions ( φ values, referred to interchangeably as concentrationsbelow) of a subset of the simulated systems are represented by the grid points marked bycircles or squares in Fig. 3. For each system, we determine whether there exists a percolat-ing cluster connected by intrachain connectivity and interchain nearest-neighbor contactsthat encompasses ≥
80% of the polymer chains. Snapshots of systems with and withoutsuch a cluster are provided by the examples on the right and left, respectively, of Fig. 1.We arrived at the choice of using ≥
80% as an intuitive, putative criterion for phase separa-tion after monitoring a variety of clusters under different simulation conditions, and expect0 T FIG. 3: Estimating the dilute side of the phase boundary by the presence or absence of a largepolymer cluster. Each circle or square represents a (
T, φ )-system we simulated for sequences sv1and sv15 (separately) to ascertain whether a cluster consisting of ≥
80% of the polymer chainsis present in the simulation box. Such a cluster exists for sequences sv1 and sv15 in all (
T, φ )-systems marked, respectively, by the filled gray circles and filled black squares as well as systemswith larger φ values plotted to their right. Fitted continuous curves (solid and dashed curves forfilled black squares and gray circles respectively) are guides for the eye. T is in units of K. that reasonable variations of this criterion would produce similar results. The investigativeprotocol here is logically akin to the experiments that rely on observation/no observationof droplet formation by microscopy, an experimental technique that is commonly utilizedfor ascertaining conditions for IDP phase separation (see, e.g., Figure 3B of ref. 23). Theresult of our extensive exploratory study is shown in Fig. 3. It shows a clear sequenceeffect. At every temperature we simulated, the sequence with a more blocky charge pattern(sv15, squares in Fig. 3) begets such a cluster at a lower concentration φ than the strictlyalternating (non-blocky) sequence sv1 (gray circles in Fig. 3). It is quite remarkable that atsufficiently low temperature (200 K), a very low φ < .
01 is sufficient to induce formationof a cluster for sequence sv15 that encompasses ≥
80% of the chains.In addition to monitoring the formation of a large polymer cluster, phase behavior in thesimulation systems is addressed by characterizing fluctuations in polymer density. For eachsystem in the second set of simulations, we determine, as a function of position ( x, y, z )within the (large) simulation box, the number of sites within small cubic volumes (smallboxes) that are occupied by the polymer chains (Fig. 4). The ratio of this number withthe small cubic volume is the local polymer density (or, equivalently, local polymer con-1centration or local polymer volume fraction), denoted as φ ( x, y, z ) hereafter. Because ofperiodic boundary conditions, the total number of small boxes we used to sample localdensity is equal to the number of lattice sites in the simulation box. Most of the results onlocal polymer density presented below are obtained by using small boxes of size 8 × × × × . . . , 7 × × × × × × × × × × = 27 to 8 = 512 (whereas the total number ofsmall boxes is fixed) means that the support of the distribution is stretched by a factor of ≈
20. By itself, this consideration would argue that, relative to the distributions for thecase with small box size 3 × × ≈
20 should be appliedto the distributions plotted in Fig. 5 for the case with small box size 8 × × × × ≈
10 times higher than those for the 8 × × ≈
20 should be multiplied to the distributions for the 8 × × × × × × × × ≈ φ ( x ) as (cid:80) y,z φ ( x, y, z ) /L , where L is the linear dimension of thecubic simulation box. Thus φ ( x ) is an average local polymer density, or equivalently theaverage polymer density over a slab on the y – z plane with a thickness equal to the lineardimension of the sampling small box; φ ( y ) and φ ( z ) are defined analogously (Fig. 6a). Anoverall average distribution (cid:104) φ (cid:105) u for monitoring the spatial variation of polymer density is2 x yz FIG. 4: Sampling local polymer density. As described in the text, small boxes are placed atall positions within the simulation (large) box to determine local volume fraction of the modelpolymer chains. The snapshot shows 300 copies of sequence sv1 in a 115 × ×
115 simulationbox at T = 1000 K. The small boxes shown are of size 8 × × (a) (b) Polymer occupancy A v e r a g e nu m b e r o f s m a ll box e s FIG. 5: Distributions of local polymer density. Examples are shown for 300 copies of sequence sv1simulated (a) above T cr (at 600 K) and (b) below T cr (at 200 K) in a 52 × ×
52 simulation box.The curves in both (a) and (b) are for small boxes of sizes (from left to right) 3 × ×
3, 4 × × × ×
5, 6 × ×
6, 7 × ×
7, and 8 × ×
8. Polymer occupancy (horizontal variable) is thenumber of lattice sites, within a given small box, that are occupied by monomers of the simulatedchains. Polymer volume fraction is equal to polymer occupancy divided by the size of the smallbox. Each average number of small boxes with a given polymer occupancy is computed from 199snapshots. (a) u ) , < > u xyz u
10 20 30 40 50 u
200 K 250 K300 K350 K400 K450 K (b)
FIG. 6: Variation in local polymer density. Results are for 300 copies of sequence sv1 in a52 × ×
52 simulation box. Except specified otherwise, local polymer density is determined usingsmall boxes of size 8 × ×
8. The horizontal variable u stands for x , y or z (as in Fig. 4). (a)Vertical variable φ ( u ) is the average volume fraction as a function of x , y or z at T = 250 K(plotted by the thin colored curves as marked); (cid:104) φ (cid:105) u ≡ [ φ ( x ) + φ ( y ) + φ ( z )] / (cid:104) φ (cid:105) u computed by using small boxes of size 3 × × (cid:104) φ (cid:105) u of the same systemcomputed using 8 × × defined as [ φ ( x ) + φ ( y ) + φ ( z )] /
3, where x = y = z = u (Fig. 6). The example in Fig. 6aillustrates that, for the systems considered in our second set of simulations, the highestand lowest (cid:104) φ (cid:105) u values are not much affected by the size of small boxes for samplinglocal polymer density (the thick solid and dashed curves in Fig. 6a have very similarshapes). The average distribution (cid:104) φ (cid:105) u is bell-shaped at low temperatures (Fig. 6a),which is indicative of the presence of a cluster with locally elevated polymer density. Notunexpectedly, (cid:104) φ (cid:105) u is essentially flat at high temperatures when the polymers are spatiallymore evenly distributed (Fig. 6b). The trend is essentially identical for an alternate orderparameter z c for local density (Fig. 7). Defined as the number of nearest-neighbor contactsper monomer (“contacts” between sequential neighbors µ, i and µ, i + 1 along a chain notcounted), z c is analogous to the coordination number collective variable defined in ref. 90. Effects of Charge Pattern and Net Charge on Phase Separation.
We now pro-ceed to construct phase diagrams from information gleaned from the local polymer densitysimulations. Because the overall volume fraction φ = 0 . u
10 20 30 40 50 z c u
200 K 250 K300 K350 K400 K450 K
FIG. 7: Spatial variation of polymer density monitored by the average number of monomer-monomer contacts. For chains configured on the simple cubic lattice, 0 ≤ z c ≤ ≤ z c ≤ (cid:104) z c (cid:105) u ≡ [ (cid:104) z c (cid:105) x + (cid:104) z c (cid:105) y + (cid:104) z c (cid:105) z ] / (cid:104) z c (cid:105) x is the average over the monomers on the y – z plane at x , etc. Shownhere are (cid:104) z c (cid:105) u computed at different temperatures for the system in Fig. 6. high, polymer clusters always span the entire length of each of the three dimensions ofthe simulation box (see, e.g., snapshots on the right in Fig. 1). This feature allows us toestimate the coexisting volume fractions for systems that are clearly phase separated (e.g.,those at or below 350 K in Fig. 6b and Fig. 7) by identifying the condensed-phase volumefraction as max[ (cid:104) φ (cid:105) u ] and the dilute-phase volume fraction as min[ (cid:104) φ (cid:105) u ]. This procedureleads to the phase diagrams for sequences sv1 and sv15 in Fig. 8, wherein data points areplotted for temperatures with max[ (cid:104) φ (cid:105) u ] − min[ (cid:104) φ (cid:105) u ] > .
01. Although the accuracy of eachindividual phase diagram is limited by the finite sizes of the simulation systems and numer-ical uncertainties caused by extreme slow equilibration at low temperatures (see below), theresults in Fig. 8 are adequate for comparing the significantly different phase behaviors ofthe two sequences. Qualitatively consistent with expectation and theory, sequence sv15 hasa significantly higher tendency to phase separate than sv1. The critical temperature, T cr ,of sv15 is estimated to be approximately 1.9 times that of sv1. Quantitative comparisonof our simulation results with predictions from analytical theories is provided below underthe next subheading. It is instructive to note the differences between the phase boundariesestimated by local polymer density (data points and thick curves in Fig. 8) and the putativephase boundaries suggested by observation of a percolating cluster (Fig. 3 and thin curvesin Fig. 8), with the latter extending to temperatures above the estimated T cr ’s of the former.This finding implies that the existence of a percolating cluster is not sufficient, in general,for a clearly bimodal distribution of local polymer density. In other words, loosely connectedpolymer clusters can exist above T ∗ cr . For T < T cr , the difference between the two types5 T FIG. 8: Simulated polyampholyte phase diagrams exhibit significant dependence on charge pat-tern. Data points depicted by symbols are obtained from local polymer density analysis for se-quences sv1 (gray triangles) and sv15 (black diamonds) using 300 polymer chains in a 52 × × T cr ) are estimated by the peak temperaturesof the fitted curves. The thin dashed and solid curves are putative phase boundaries obtained inFig. 3 for sv1 and sv15 by the existence of a ≥
80% percolating polymer cluster. T is in units ofK. of estimated phase boundaries may be partly attributed to the fact that phase boundariesare not infinitely sharp , and that the width of the boundary region is expected to bepronounced for finite-size explicit-chain model simulation systems.As mentioned above, equilibration in our simulation systems for sv15 is extremely slowfor T ≤
400 K (Fig. 2): whereas potential energy essentially levels off toward the end ofthe simulation (at 3 × steps) for T = 800 K, it is still decreasing for T = 400 K, albeitvery gradually with a slope ≈ − . × − . This situation could be a mere consequence of abasic limitation of lattice chain models. For instance, lattice protein chain models of the HPvariety with attractive hydrophobic-like interactions are prone to be trapped kinetically ,leading to glassy dynamics and making it difficult to access their lowest-energy states viacommon Monte Carlo chain moves . Even G¯o models with structurally highly specific in-teractions encounter transient kinetic traps . Nonetheless, inasmuch as Monte Carlo chainmoves mimick physical Brownian motions , the slow dynamics suggested by some ofour model systems can be reflective of physical behaviors of real polyampholytes. SomeIDP condensates require energy input via ATP-dependent processes (not considered in ourmodel) to maintain an “active” liquid-like state . Some IDP condensates are known toundergo functional maturation or pathological fibrillization to condensed states withslower dynamics. If our model is seen as capturing some of the latter slow-dynamics behav-iors in a rudimentary manner, the sequence-dependent phase diagrams in Fig. 8 would be6relevant to experimental phase behaviors determined at a time scale comparable to that forthe onset of maturation, notwithstanding the observation that some of our low-temperaturemodel systems have not fully equilibrated. An obvious feature of lattice polymer modelsis their imposition of a spatial order that may otherwise be absent. For lattice modelsof globular proteins, it has been argued that this spatial order can play a structural rolesimilar to the hydrogen bonding network in the hydrophobic cores of folded structures .Disorder-to-order transitions to solid-like phases have been reported in previous simulationsof lattice polyampholytes . Whether such features of the model can be used to gain insightsinto IDP maturation and fribillization deserves further study.The main goal of our effort here is to explore general principles of sequence-dependentphase behaviors of polyampholytes and other heteropolymers and to use our simulationresults to assess analytical theories (see below). Quantitative comparisons with experimentis not our aim. Nonetheless, we should comment upon our consideration of a temperaturerange far exceeding—even just nominally—that of liquid aqueous solutions under atmo-spheric pressure (see, e.g., the T = 200 K to 1000 K range in Fig. 8). The high simulated T cr values (in K) are a consequence of the strong electrostatic interactions entailed by thetwo fully charged polyampholytes. Nonetheless, the same trend of sequence dependence isexpected to hold for a pair of sequences with similar charge patterns but lower charge densi-ties when, e.g., the charged monomers are interspersed among neutral monomers along thechain. Our results are relevant to those situations. For instance, because the interactionstrength scales as much as the square of charge density (or a somewhat lower exponentdepending on the sequence ), a polyampholyte with a similar chain length and similarcharge pattern of sv15 but with, e.g., ≈ / √ T cr to ≈ / ≈
900 K to ≈
300 K). As discussed above, a scalingdown of temperature can also ensue if we apply the current models to polyampholytes withmonomer-monomer bond lengths > . N = 50 sequences, termed h φ and h φ − , that are constructed for mod-eling, respectively, an all-hydrophobic sequence and a predominantly hydrophobic sequencewith four embedded negatively charged monomers (Fig. 9, top drawings). Hydrophobic in-teraction with short spatial range is modeled by a nearest-neighbor attractive contact energybetween any pair of hydrophobic monomers on the same chain or on different chains but notsequentially adjacent along a chain. The magnitude of this contact energy is equal to 1/3of the magnitude of pairwise electrostatic energy in our polyampholyte model at nearest-neighbor spatial separation. Unlike charged monomers, hydrophobic monomers have nointeraction (energy = 0) beyond nearest neighbor in our model. Hydrophobic interactionsare effective, solvent-mediated and thus they are temperature-dependent . However,7 T FIG. 9: Simulated phase diagrams for the h φ and h φ − sequences. The model sequences are shownat the top, wherein hydrophobic and negatively charged monomers are depicted, respectively, ingolden and red. h φ is all hydrophobic; h φ − contains four negative charges. Data points depictedby symbols in the plot are obtained from local polymer density analysis for h φ (black diamonds)and h φ − (gray triangles) using 300 polymer chains in a 52 × ×
52 simulation box. The solidand dashed curves are empirical fits, respectively, for the plotted h φ and h φ − data points. T is inunits of K. because our main interest here is the effect of sparsely distributed charges on phase behav-iors by comparing two sequences with the same hydrophobic background, not the effect ofhydrophobicity itself, we use a temperature-independent model for hydrophobic interactionsfor simplicity. The interaction among the charged monomers in sequence h φ − follows thatof the above polyampholyte model. We conduct simulation at 250 K, 300 K, . . . , 500 K forsequence h φ and 200 K, 250 K, 300 K, and 350 K for sequence h φ − and apply the localpolymer density method described above using small boxes of size 8 × ×
8. The phasediagrams we estimated from these simulations are provided in Fig. 9.Not surprisingly, the h φ − sequence with embedded negative charges (gray trianglesin Fig. 9) has a significantly lower propensity to phase separate (i.e., it has a lower T cr )than the all-hydrophobic h φ sequence (black diamonds in Fig. 9) because of the repulsiveelectrostatic interactions among the embedded negative charges in h φ − . This exampleis extremely simple, yet it illustrates how phosphorylations can be used to regulateIDP phase separation in the living cell. Phosphorylations add negative charges to anIDP and thus can modulate its conformational dimensions and its “polyelectrostatic”interactions with other biomolecules . Experiments on the 163-residue N-terminallow-complexity domain of the RNA-binding protein Fused in Sarcoma (FUS LC) showthat phosphorylations of this IDP disrupt its phase separation. A phosphomimetic variant8 FIG. 10: Comparing simulation results with RPA theories. Data points for the simulated phaseboundaries of sequences sv1 (gray triangles) and sv15 (black diamonds) from Fig. 8 are comparedwith coexistence phase boundaries predicted by three related RPA theories: salt-free RPA (solidcurves), RPA with monovalent salt ions, where salt volume fraction φ s = 0 . ≈ T cr than) that for sv1. All RPA-predicted phase boundaries shownhere are determined by using a monomer length scale a = 3 . (cid:15) r = 80(ref. 29). T , in units of K, is provided by a logarithmic scale to facilitate comparison of ratios of T cr ’s for the two sequences. of FUS LC, with twelve glutamic acid (E) substitutions for serine or threonine at positions7, 11, 19, 26, 30, 42, 61, 68, 84, 87, 117, and 131 (the variant is termed FUS LC 12E) alsodisrupts phase separation . With this in mind, our h φ and h φ − may be viewed as toymodels, respectively, of FUS LC and FUS LC 12E (the spacings of the four negative chargesat positions 2, 8, 24, and 40 of our h φ − are chosen to mimick the distribution of E residuesalong FUS LC 12E) for demonstrating the basic principles of how phosphorylations canfunctionally modulate IDP phase behaviors. Comparisons with Analytical Theories and Correlations with Charge PatternParameters.
We now compare our explicit-chain simulated phase diagrams for sequencessv1 and sv15 against RPA predictions for the two polyampholyte sequences (Fig. 10). Weconsider three different sequence-dependent RPA formulations: (i) Salt-free RPA, whichcorresponds to the ρ s = ρ c = 0 case in ref. 29, where ρ s and ρ c are the average numberdensities of salt and counterions, respectively. (ii) RPA with monovalent salt ions, with9volume fraction of positive or negative salt ion, φ s = ρ s a = 0 . ρ c = 0, ρ s (cid:54) = 0 case in ref. 29. Here the choice of φ s is equivalent to an NaClconcentration of approximately 200 mM such that the resulting electrostatic screening at300 K is approximately equal to that entailed by the temperature-independent screeninglength r S = 10 ˚A introduced in Models and Methods . (iii) RPA for a Coulomb potentialwith a short-range cutoff and temperature-independent screening, viz., U (S) ij ( r ) = l B σ i σ j r e − r/r S (1 − e − r/a ) , (2)where r is the spatial distance between charges σ i and σ j . The Fourier transform ( k -spaceexpression) of U (S) ij is given by (cid:16) ˆ U (S) k (cid:17) ij = l B σ i σ j (cid:18) πk + r − − πk + (1 /r S + 1 /a ) (cid:19) = 4 πl B σ i σ j (1 + 2 a/r S )( k + r − )[( ka ) + ( a/r S + 1) ] . (3)The RPA-predicted free energy and phase behaviors for this model are readily obtainedby replacing the i, j elements of the matrix ˆ U k in equation 35 of ref. 29 by the aboveexpression for ( ˆ U (S) k ) ij in Eq. 3. The coexistence phase boundaries predicted by these threeRPA formulations for sequences sv1 and sv15 are shown in Fig. 10. The predominant trendpredicted by all three formulations that T cr is higher for sv15 than for sv1 are qualitativelyconsistent with our simulations but there are significant quantitative mismatches betweentheory and simulation. Our simulated results in Fig. 8 indicate that the ratio of the twopolyampholytes’ critical temperatures T cr (sv15) /T cr (sv1) ≈ .
9, a value that would notbe affected by any potential temperature rescaling we entertained above. In contrast, thecorresponding T cr (sv15) /T cr (sv1) ratios predicted by the analytical theories are much larger:(i) 14 . . . φ cr ( φ cr ’s are the φ valuesat T cr ’s) are significantly lower than those estimated by our explicit-chain simulations.As a control, we check whether adding an overall FH term in the RPA formulation byintroducing a temperature-independent χ parameter in equation 10 of ref. 26 would resultin a smaller mismatch with lattice simulation. One possible rationale for adding a FH termis to account for excluded volume effect that may not have been fully taken care of by RPA.We find that adding a repulsive FH term (which might be identified with a stronger excludedvolume effect) leads to an even larger mismatch with the simulated T cr (sv15) /T cr (sv1) ofapproximately 1 .
9, whereas adding an attractive FH term (which might be caused by anoverall background Lennard-Jones-like attraction among the monomers) leads to a smallermismatch. But the shifts in the T cr (sv15) /T cr (sv1) are small unless | χ | is very large. For0 FIG. 11: RPA+FH theories. Phase boundaries for sequence sv1 (lower curves) and sequence sv15(upper curves) in salt-free RPA (solid black curves), RPA+FH with χ = − . χ = 0 . instance, whereas T cr (sv15) /T cr (sv1) for the original salt-free RPA is 14 .
3, the correspondingratio is 14 . χ = − .
5, and 13 . χ = 0 . particularly useful for the present analysis. These authors constructed bond fluctuationmodels on simple cubic lattices for fully charged N = 2, 4, 8, and 16 polyampholytes (allwith zero net charge) to simulate their phase behaviors under the full Coulomb potential(no screening). Here we compare their simulation results for the four N = 8 and three N = 16 sequences they considered against the salt-free RPA results we obtain for the samesequences (Table 1). We contrast the simulation-theory mismatches for their sequences withthose for our two N = 50 sequences sv1 and sv15 by using the reduced temperature T ∗ ≡ al B = (cid:18) π(cid:15) (cid:15) r ae (cid:19) T (4)to compare the critical temperatures of these models on the same footing. We also examinethe degree to which the phase behaviors of these polyampholytes are correlated with thecharge pattern parameters κ (ref. 31) and SCD (ref. 32).1 Table 1.
RPA-predicted critical parameters for the N = 8 and 16 sequences in ref. 86Sequence (ref. 86) a Charge pattern b T ∗ cr φ cr SCD κ P N KKKKEEEE 0.339 0.0354 − .
020 1.0000P N P N KKEEKKEE 0.125 0.0502 − .
816 0.1331PN PNP KEEEKEKK 0.135 0.0481 − .
849 0.5666P N PNP KKEEEKEK 0.117 0.0551 − .
723 0.5666P N KKKKKKKKEEEEEEEE 1.215 0.0242 − .
240 1.0000P N P N KKKKEEEEKKKKEEEE 0.409 0.0326 − .
802 0.0586PN P NPN P NPN KEEKKKEKEEEKKEKE 0.143 0.0480 − .
679 0.0233 a “P” and “N” denote, respectively, a positively and a negatively charged monomer. b Same sequence in the notation of refs. 30–32.The results of our analysis are summarized in Fig. 12. The explicit-chain simulated T ∗ cr and φ cr values for the four N = 8 and three N = 16 sequences that are included in Fig. 12a–c for comparison are taken from Table 1 of ref. 86. Fig. 12a shows that the RPA-predictedreduced critical temperatures ( T ∗ cr ’s) of polyampholytes of a given chain length N correlatevery well with the polyampholyte sequences’ SCD values, as we have first reported for aset of thirty N = 50 sequences , the fitted linear T ∗ cr –SCD relation of which is reproducedhere. Fig. 12a shows further that an approximate proportionality relationship T ∗ cr ∝ − SCDholds for RPA-predicted T ∗ cr values for a given N , with a proportionality constant thatincreases with N : The slope of the fitted T ∗ cr vs. − SCD line for N = 8 is 0 .
165 (blue dashedline), that for N = 16 is 0 .
231 (green dashed line), that for N = 50 is 0 .
314 (black dottedline), that for N = 320 is 0 .
384 (black dashed-dotted line). We also examine the twentytwo RPA-predicted T ∗ cr values for the N = 120, 240, and 320 sequences that we consideredpreviously as a whole (Fig. 12d) and find that their slopes are very similar (0 .
379 for N = 120, 0 .
394 for N = 240), suggesting that the slope may limit to ≈ .
40 as N → ∞ .Fig. 12a indicates that the explicit-chain simulated T ∗ cr ’s for the N = 8 and N = 16polyampholytes also correlate quite well with SCD, but the increase of their T ∗ cr with in-creasing value of − SCD is much more gradual than that predicted by RPA. For the caseof the three N = 16 sequences simulated by Cheong and Panagiotopoulos , it is quitelikely that the general approximate relationship is not linear but instead possesses a smallbut appreciable downward concavity. The degree of this simulation-theory mismatch in theslope d log( T ∗ cr ) /d ( − SCD) appears to increase with chain length: According to Table 1, theratio between the highest and lowest RPA-predicted T ∗ cr ’s is 0 . / .
117 = 2 .
90 for N = 8and 1 . / .
143 = 8 .
50 for N = 16. The corresponding ratios from the explicit-chain sim-ulation data of ref. 86 are, respectively, 0 . / .
240 = 1 .
50 and 0 . / .
278 = 3 .
15. Thismeans that RPA overestimates the T ∗ cr ratio by a factor of 2 . / . .
93 for N = 8 and2 -SCD c r * κ (a) (b) l o g ( c r ) * log( cr ) (c) c r * N = N = N = N = (d) -SCD FIG. 12: Correlation of simulated and theory-predicted polyampholyte phase behaviors withcharge pattern parameters. (a) Relationships between reduced critical temperature T ∗ cr and theSCD parameter of Sawle and Ghosh . Filled and open symbols represent, respectively, explicit-chain simulation data and predictions by RPA theory. Red squares: Simulation results for se-quences sv1 and sv15 from the present study and prediction by the RPA theory for Coulombpotential with temperature-independent screening (Eq.3; dashed-dotted curves in Fig. 10). Greendiamonds: Simulated results and predictions by salt-free RPA theory for three different N = 16sequences in ref. 86. Blue circles: Simulated results and predictions by salt-free RPA theory forfour different N = 8 sequences in ref. 86. The simulation results for the N = 8 and N = 16sequences are from Cheong and Panagiotopoulos . These symbols carry the same meanings in(b) and (c). The simulated T ∗ cr values for sequences sv1 and sv15 (filled red squares) are esti-mated from the empirically fitted phase boundaries in Fig. 8. Solid line segments drawn throughthe simulated data points and the dashed line through the two open red squares are guides forthe eye. Other dashed lines are least-squares fits to the RPA data points depicted in the samecolor. Included for comparison are the fitted T ∗ cr (for salt-free RPA) vs. − SCD relationship forthe thirty N = 50 sequences in ref. 30 (dotted line) and the eight N = 320 sequences in ref. 29(dashed-dotted line). (b) Dependence of T ∗ cr on the κ parameter of Das and Pappu . The threegolden stars represent theoretical predictions by Jiang et al. for the three N = 16 sequences inref. 85, the simulation and RPA-predicted results of which are depicted by green diamonds hereand in (a). Solid and dashed lines segments drawn through the simulated and RPA data points,respectively, are guides for the eye. (c) Relationship between critical temperature T ∗ cr and criticalpolymer volume fraction φ cr (log = log ). The simulated φ cr values for sequences sv1 and sv15(filled red squares) are estimated from the empirically fitted phase boundaries in Fig. 8. (d) T ∗ cr predicted by salt-free RPA vs. − SCD for the two N = 120 (red stars), twelve N = 240 (turquoiseplus signs), and eight N = 320 (orange crosses) sequences in ref. 29. Lines are least-squares fitsto the data points depicted in the same color. . / .
15 = 2 .
70 for N = 16. Viewed in this context, the corresponding overestimationof T cr (sv15) /T cr (sv1) by RPA with temperature-independent screening (the one physicallymost similar to our lattice model) by a factor of 9 . / . . N = 50 polyampholytes (see above) fits quite well into the trend gleanedfrom our observations for the N = 8 and N = 16 sequences in Fig. 12a. Nonetheless, futurestudies that compare the present results against those from other explicit-chain models willbe needed to ascertain the degree to which the trend seen here is sensitive to the pecularitiesof cubic lattice models.Compared to the good correlations between explicit-chain simulated T ∗ cr and − SCD inFig. 12a, the correlations between explicit-chain simulated T ∗ cr and κ in Fig. 12b are lessdefinitive. Three of the N = 8 sequences have similar T ∗ cr ’s but one sequence has a κ valuemuch lower than that of the other two (the latter have identical κ values, see filled blue cir-cles in Fig. 12b). In contrast, SCD is better at capturing the small variations of T ∗ cr amongthese three sequences (filled blue circles in Fig. 12a). For the three longer N = 16 sequences, κ captures the general trend of how T ∗ cr depends on sequence in that κ increases monoton-ically with the explicit-chain simulated T ∗ cr . However, the T ∗ cr – κ relationship (solid greenlines in Fig. 12b) deviates much more from linearity than the corresponding T ∗ cr –( − SCD)relationship (solid green lines in Fig. 12a), so much so that two of the N = 16 sequences withappreciably different T ∗ cr ’s are seen to have very similarly low κ values. The κ parameter isan extremely useful intuitive measure of the blockiness of charge distribution along chainsequences (see, e.g., ref. 107 for a recent application). Because κ relies on averagingcharges over windows of 5–6 residues, it is not entirely surprising that the effectiveness ofthis parameter would diminish for short sequences with lengths not much longer than thewindow for charge averaging. A clear strength of SCD (Eq. 1) is its ability to account forpatterning features that encompass charges that are far apart along the chain sequence.SCD emerges from a field-theoretic variational approach to account for sequence-dependentsingle-polyampholyte conformational dimensions by renormalized Kuhn lengths . In thistheory, the average end-to-end distance is a function of SCD; but the corresponding expres-sion for average radius of gyration involves additional charge-pattern terms (cf. equations 6and 13 of ref. 32). Whereas the extremely good correlations between SCD and RPA-predicted T ∗ cr might be partly attributed to the underlying Gaussian chain model sharedby both the variational and RPA theories (though the detailed mathematical relationshipremains to be explored), it is quite remarkable that the simple SCD parameters are alsowell correlated with explicit-chain simulated single-polyampholyte radii of gyration aswell as T ∗ cr ’s for multiple-chain phase separations (Fig. 12a). It would be very instructiveto elucidate what nuanced effects beyond an intuitive characterization of charge blockinessare captured by the SCD parameter.For completeness, we also include the T ∗ cr ’s for the three N = 16 sequences predicted by4Jiang et al.’s charged hard-sphere chain model theory which makes use of an analyticalhard-sphere equation of state (golden stars in Fig. 12b). These theoretically predicted T ∗ cr ’s correlate quite well with their explicit-chain simulated counterparts (filled green dia-monds in Fig. 12b), suggesting that this theory can be a useful general approach to studysequence-dependent phase behaviors.Fig. 12c compares the explicit-chain simulated and RPA-predicted critical volumefractions ( φ cr ). RPA significantly underestimates φ cr for all nine polyampholytes sequencesconsidered. The φ cr values estimated from explicit-chain simulations of the two N = 50sequences studied here (filled red squares) are comparable to the explicit-chain simulated φ cr values for the N = 8 and N = 16 sequences (filled green and blue symbols). As hasbeen commented upon , field-theoretic approaches to polyelectrolyte and polyampholytephase separations tend to significantly underestimate φ cr because the analytical treatmentsof density fluctuation is not sufficiently accurate. A case in point is that RPA predictsextremely low φ cr ’s for polyelectrolytes with neutralizing counterions . Even with animproved analytical treatment , the predicted φ cr ’s are still lower than that obtainedby explicit-chain simulations . We deem it likely that the underlying Gaussin chainmodel of RPA leads to underestimations of interchain attraction and thus a generalunderestimation of φ cr . By the same token, the underlying Gaussian chain model doesnot fully account for the possibilities that strong interchain attractions are achievable byspatially pairing up low- | SCD | sequences in certain specific configurations. This limitationprobably contributes to an underestimation of the propensities of low- | SCD | sequences tophase separate and thus a much sharper dependence of T ∗ cr on − SCD than that revealed byexplicit-chain simulations (Figs. 10 and 12a). This idea remains to be tested. It deservesfurther investigations that consider a set of sequences more extensive than that studiedhere. All in all, analytical theories are extremely useful conceptual tools and are valuablefor predicting and rationalizing trends in sequence-dependent phase separations thatare consistent with explicit-chain simulations (Fig. 12) and experiments . Nonetheless,addressing the aforementioned limitations would be necessary to improve their quantitativeaccuracy. Conclusions
To recapitulate, we have presented explicit-chain simulation data for n = 300 copies offully charge polyampholytes with N = 50 monomers configured on simple cubic lattices foran extensive set of temperatures and overall polyampholyte concentrations. A comparisonof the results for two specific polyampholyte sequences with significantly different chargepatterns indicates that the sequence with a more blocky charge pattern has a significantlyhigher tendency to phase separate. While this trend is anticipated and is consistent with5predictions by RPA theory, the variation of phase-separating tendency with charge patternis milder in the present lattice model than that predicted by RPA. Similar qualitativeagreements and quantitative mismatches between explicit-chain simulation and RPA areidentified for several shorter polyampholyte sequences that have been simulated previouslyfor their phase properties. Taken together, these findings lend credence to the utility ofRPA as an important tool for conceptual development and qualitative predictions. At thesame time, they underscore that caution should be exercised in quantitative interpretationof predictions from RPA and other analytical theories. Our analysis suggests that sequencecharge decoration (SCD) is an effective parameter for capturing the phase separatingtendency of polyampholytes with zero net charge, but the underlying physical reasons forits success remain to be better elucidated. We have also compared the phase behaviors ofan all-hydrophobic sequence and a largely hydrophobic sequence with sparsely embeddednegative charges as a toy model for exploring effects of phosphorylations on IDP phaseseparations and to provide a simple rationalization for pertinent experimental observations.Promising future extensions of the present effort include incorporation of structurally andenergetically more detailed representations of the interactions as well as construction ofcontinuum (off-lattice) models for sequence-dependent phase behaviors. Adaptation ofa recently developed simulation technique for Lennard-Jones homopolymers should beparticularly useful in this regard. Acknowledgments
We thank Lewis Kay and Heinrich Krobath for helpful discussions, and members andtrainees of the National Science Foundation (NSF)-funded Protein Folding and Dynam-ics Research Coordination Network for insightful inputs during the Network’s June-2017Annual Meeting at UC Berkeley where an earlier version of this work was presented by S.D.(NSF grant MCB 1516959). This work was supported by Canadian Cancer Society ResearchInstitute grant no. 703477, Canadian Institutes of Health Research grant MOP-84281, andcomputational resources provided by SciNet of Compute/Calcul Canada.6
References Maynard Smith, J. The concept of information in biology.
Phil. Sci. , , 177–194. Uversky, V. N.; Oldfield, C. J.; Dunker, A. K. Intrinsically disordered proteins in humandiseases: Introducing the D concept. Annu. Rev. Biophys. , , 215–246. Tompa, P. Intrinsically unstructured proteins: a 10-year recap.
Trends Biochem. Sci. , , 509–516. Wright, P. E.; Dyson, H. J. Intrinsically disordered proteins in cellular signalling and regula-tion.
Nat. Rev. Mol. Cell Biol. , , 18–29. Chen, J. Towards the physical basis of how intrinsically disorder mediates protein function.
Arch. Biochem. Biophys. , , 123–131. Chen, T.; Song, J.; Chan, H. S. Theoretical perspectives on nonnative interactions and intrinsicdisorder in protein folding and binding.
Curr. Opin. Struct. Biol. , , 32–42. Salonen, L. M.; Ellermann, M.; Diederich, F. Aromatic rings in chemical and biological recog-nition: Energetics and structures.
Angew. Chem. Int. Ed. , , 4808–4842. Brangwynne, C. P.; Mitchison, T. J.; Hyman, A. A. Active liquid-like behavior of nucleolidetermines their size and shape in Xenopus laevis oocytes.
Proc. Natl. Acad. Sci. U.S.A. , , 4334–4339. Hyman, A. A.; Weber, C. A.; J¨ulicher, F. Liquid-liquid phase separation in biology.
Annu.Rev. Cell Dev. Biol. , 39–58. Toretsky, J. A.; Wright, P. E. Assemblages: Functional units formed by cellular phase sepa-ration.
J. Cell Biol. , , 579–588. Wu, H.; Fuxreiter, M. The structure and dynamics of higher-order assemblies: Amyloids,signalosomes, and granules.
Cell , 1055–1066. Chong, P. A.; Forman-Kay, J. D. Liquid-liquid phase separation in cellular signaling systems.
Curr Opin Struct Biol , , 180–186. Mitrea, D. M.; Kriwacki, R. W. Phase separation in biology; functional organization of ahigher order.
Cell Commun. Signal. , , 1. Shin, Y.; Brangwynne, C. P. Liquid phase condensation in cell physiology and disease.
Science , , eaaf4382. Nott, T. J.; Petsalaki, E.; Farber, P.; Jervis, D.; Fussner, E.; Plochowietz, A.; Craggs, T. D.;Bazett-Jones, D. P.; Pawson, T.; Forman-Kay, J. D.; et al. Phase transition of a disorderednuage protein generates environmentally responsive membraneless organelles.
Mol. Cell , , 936–947. Feric, M.; Vaidya, N.; Harmon, T. S.; Mitrea, D. M.; Zhu, L.; Richardson, T. M.; Kriwacki,R. W.; Pappu, R. V.; Brangwynne, C. P. Coexisting liquid phases underlie nucleolar subcom-partments.
Cell , 1686–1697. Riback, J. A.; Katanski, C. D.; Kear-Scott, J. L.; Pilipenko, E. V.; Rojek, A. E.; SosnickT. R.; Drummond, D. A. Stress-triggered phase separation is an adaptive, evolutionarilytuned response.
Cell , , 1028–1040. Brady, J. P.; Farber, P. J.; Sekhar, A.; Lin, Y.-H.; Huang, R.; Bah, A.; Nott, T. J.; Chan, H.S.; Baldwin, A. J.; Forman-Kay, J, D.; et al. Structural and hydrodynamic properties of anintrinsically disordered region of a germ-cell specific protein on phase separation.
Proc. Natl.Acad. Sci. USA , , E8194–E8203. Kato, M.; Han, T. W.; Xie, S.; Shi, K.; Du, X.; Wu, L. C.; Mirzaei, H.; Goldsmith, E. J.;Longgood, J.; Pei, J.; et al. Cell-free formation of RNA granules: Low complexity sequencedomains form dynamic fibers within hydrogels.
Cell , , 753–767. Lin, Y.; Protter, D. S. W.; Rosen, M. K.; Parker, R. Formation and maturation of phase-separated droplets by RNA-binding proteins.
Mol. Cell , , 208–219. Jain, S.; Wheeler, J. R.; Walters, R. W.; Agrawal, A.; Barsic, A.; Parker, R. ATPase-modulated stress granules contain a diverse proteome and substructures.
Cell , ,487–498. Muiznieks, L. D.; Cirulis, J. T.; van der Horst, A.; Reinhardt, D. P.; Wuite, G. J. L.; Pom`es, R.;Keeley, F. W. Modulated growth, stability and interactions of liquid-like coacervate assembliesof elastin.
Matrix Biol. , , 39–50. Molliex, A.; Temirov, J.; Lee, J.; Coughlin, M.; Kanagaraj, A. P.; Kim, H. J.; Mittag, T.;Taylor, J. P. Phase separation by low complexity domains promotes stress granule assemblyand drives pathological fibrillization.
Cell , , 123–133. Quiroz, F. G.; Chilkoti, A. Sequence heuristics to encode phase behaviour in intrinsicallydisordered protein polymers.
Nat. Mater. , , 1164–1171. Pak, C. W.; Kosno, M.; Holehouse, A. S.; Padrick, S. B.; Mittal, A.; Ali, R.; Yunus, A. A.; Liu,D. R.; Pappu, R. V.; Rosen M. K. Sequence determinants of intracellular phase separation bycomplex coacervation of a disordered protein.
Mol. Cell , , 72–85. Lin, Y.-H.; Forman-Kay, J. D.; Chan, H. S. Sequence-specific polyampholyte phase separationin membraneless organelles.
Phys. Rev. Lett. , , 178101. Mahdi, K. A.; Olvera de la Cruz, M. Phase diagrams of salt-free polyelectrolyte semidilutesolutions.
Macromolecules , , 7649–7654. Ermoshkin, A. V.; Olvera de la Cruz, M. A modified random phase approximation of poly-electrolyte solutions.
Macromolecules , , 7824–7832. Lin, Y.-H.; Song, J.; Forman-Kay, J. D.; Chan, H. S. Random-phase-approximation theoryfor sequence-dependent, biologically functional liquid-liquid phase separation of intrinsically disordered proteins. J. Mol. Liq. , , 176–193. Lin, Y.-H.; Chan, H. S. Phase separation and single-chain compactness of charged disorderedproteins are strongly correlated.
Biophys, J. , , 2043–2046. Das, R. K.; Pappu, R. V. Conformations of intrinsically disordered proteins are influencedby linear sequence distribution of oppositely charged residues.
Proc. Natl. Acad. Sci. U.S.A. , , 13392–13397. Sawle, L.; Ghosh, K. A theoretical method to compute sequence dependent configurationalproperties in charged polymers and proteins.
J. Chem. Phys. , , 085101. Sawle, L.; Huihui, J.; Ghosh, K. All-atom simulations reveal protein charge decoration in thefolded and unfolded ensemble is key in thermophilic adaptation.
J. Chem. Theor. Comput. , , 5065–5075. Firman, T.; Ghosh, K. Sequence charge decoration dictates coil-globule transition in intrinsi-cally disordered proteins.
J. Chem. Phys. , , 123305. Lin, Y.-H.; Brady, J. P.; Forman-Kay, J. D.; Chan, H. S. Charge pattern matching as a ‘fuzzy’mode of molecular recognition for the functional phase separations of intrinsically disorderedproteins.
New J. Phys. , , 115003. Chang, L.-W.; Lytle, T. K.; Radhakrishna, M.; Madinya, J. J.; V´elez, J.; Sing, C. E.; Perry,S. L. Sequence and entropy-based control of complex coacervates.
Nat. Comm. , , 1273. Das, R. K.; Ruff, K. M.; Pappu, R. V. Relating sequence encoded information to form andfunction of intrinsically disordered proteins.
Curr. Opin. Struct. Biol. , , 102–112. Best, R. B. Computational and theoretical advances in studies of intrinsically disordered pro-teins.
Curr. Opin. Struct. Biol. , ,147–154. Levine, Z. A.; Shea, J.-E. Simulations of disordered proteins and systems with conformationalheterogeneity.
Curr. Opin. Struct. Biol. , , 95–103. Ruff, K. M.; Harmon, T. S.; Pappu, R. V. CAMELOT: A machine learning approach forcoarse-grained simulations of aggregation of block-copolymeric protein sequences.
J. Chem.Phys. , , 243123. Orr, W. J. C. Statistical treatment of polymer solutions at infinite dilution.
Trans. FaradaySoc. , , 12–27. Domb, C. Self avoiding walks on lattices.
Adv. Chem. Phys. , , 229–259. de Gennes, P.-G. Scaling Concepts in Polymer Physics ; Cornell University Press, Ithaca,U.S.A.; 1979; pp 39–43. Larson, R. G.; Scriven, L. E.; Davis, H. T. Monte Carlo simulation of model amphiphile-oil-water systems.
J. Chem. Phys. , , 2411–2420. Sumners, D. W.; Whittington, S. G. Knots in self-avoiding walks.
J. Phys. A.-Math. Gen. , , 1689–1694. Chen, S.-J.; Dill, K. A. Statistical thermodynamics of double-stranded polymer molecules. J. Chem. Phys. , , 5802–5813. Cao, S.; Chen, S.-J. Predicting RNA pseudoknot folding thermodynamics.
Nucl. Acids Res. , , 2634–2652. Liu, Z.; Mann, J. K.; Zechiedrich, E. L.; Chan, H. S. Topological information embodied inlocal juxtaposition geometry provides a statistical mechanical basis for unknotting by type-2DNA topoisomerases.
J. Mol. Biol. , , 268–285. Taketomi, H.; Ueda, Y.; G¯o, N. Studies on protein folding, unfolding and fluctuations bycomputer simulation. I. The effect of specific amino acid sequence represented by specificinter-unit interactions.
Int. J. Pept. Protein Res. , , 445–459. Taketomi, H.; Kanˆo, F.; G¯o, N. The effect of amino acid substitution on protein-folding and-unfolding transition studied by computer simulation.
Biopolymers , , 527–559. G¯o, N.; Taketomi, H. Respective roles of short- and long-range interactions in protein folding.
Proc. Natl. Acad. Sci. U.S.A. , , 559–563. Chan, H. S. Protein folding: Matching speed and locality.
Nature , 761–763. G¯o, N. Theoretical studies of protein folding.
Annu. Rev. Biophys. Bioeng. , , 183–210. Go, N. The Consistency Principle Revisited. In
Old and New Views of Protein Folding ; Kuwa-jima, K., Arai, M., Eds.; Elsevier, Amsterdam, The Netherlands, 1999; pp 97–105. Bryngelson, J. D.; Wolynes, P. G. Spin glasses and the statistical mechanics of protein folding.
Proc. Natl. Acad. Sci. U.S.A. , , 7524–7528. Chan, H. S.; Zhang, Z.; Wallin, S.; Liu, Z. Cooperativity, local-nonlocal coupling, and nonna-tive interactions: Principles of protein folding from coarse-grained models.
Annu. Rev. Phys.Chem. , , 301–326. Lau, K. F.; Dill, K. A. A lattice statistical mechanics model of the conformational and sequencespaces of proteins.
Macromolecules , , 3986–3997. Dill, K. A. Theory for the folding and stability of globular proteins.
Biochemistry , ,1501–1509. Lau, K. F.; Dill, K. A. Theory for protein mutability and biogenesis.
Proc. Natl. Acad. Sci.U.S.A. , , 638–642. O’Toole, E. M.; Panagiotopoulos, A. Z. Monte Carlo simulation of folding transitions of simplemodel proteins using a chain growth algorithm.
J. Chem. Phys. , , 8644–8652. Lipman, D. J.; Wilbur, W. J. Modelling neutral and selective evolution of protein folding.
Proc. R. Soc. Lond. B , , 7–11. Chan, H. S.; Bornberg-Bauer, E. Perspectives on protein evolution from simple exact models.
Appl. Bioinform. , , 121–144. Chan, H. S.; Dill, K. A. Intrachain loops in polymers: Effects of excluded volume.
J. Chem.Phys. , , 492–509. Chan, H. S.; Dill, K. A. The effects of internal constraints on the configurations of chain molecules. J. Chem. Phys. , , 3118–3135; Erratum: J. Chem. Phys. , , 10353. Chan, H. S.; Dill, K. A. Compact polymers.
Macromolecules , , 4559–4573. Camacho, C. J.; Thirumalai, D. Minimum energy compact structures of random sequences ofheteropolymers.
Phys. Rev. Lett. , , 2505–2508. Shakhnovich, E.; Farztdinov, G.; Gutin, A. M.; Karplus, M. Protein folding bottlenecks: Alattice Monte Carlo simulation.
Phys. Rev. Lett. , , 1665–1668. Leopold, P. E.; Montal, M.; Onuchic, J. N. Protein folding funnels: A kinetic approach to thesequence-structure relationship.
Proc. Natl. Acad. Sci. U.S.A. , , 8721–8725. Wolynes, P. G.; Onuchic, J. N.; Thirumalai, D. Navigating the folding routes.
Science , , 1619–1620. Chan, H. S.; Shimizu, S.; Kaya, H. Cooperativity principles in protein folding.
Methods Enzy-mol. , , 350–379. Li, H.; Helling, R.; Tang, C.; Wingreen, N. Emergence of preferred structures in a simplemodel of protein folding.
Science , , 666–669. Shakhnovich, E. I. Proteins with selected sequences fold into unique native conformation.
Phys. Rev. Lett. , , 3907–3910. Covell, D. G.; Jernigan, R. L. Conformations of folded proteins in restricted spaces.
Biochem-istry , , 3287–3294. Kolinski, A.; Skolnick, J.; Yaris, R. The collapse transition of semiflexible polymers. A MonteCarlo simulation of a model system.
J. Chem. Phys. , , 3585–3597. Hinds, D. A.; Levitt, M. A lattice model for protein structure prediction at low resolution.
Proc. Natl. Acad. Sci. U.S.A. , , 2536–2540. Skolnick, J.; Kolinski, A. Simulations of the folding of a globular protein.
Science , ,1121–1125. Dill, K. A.; Bromberg, S.; Yue, K.; Fiebig, K. M.; Yee, D. P.; Thomas, P. D.; Chan, H. S.Principles of protein folding — A perspective from simple exact models.
Protein Sci. , ,561–602. Bryngelson, J. D.; Onuchic, J. N.; Socci, N. D.; Wolynes, P. G. Funnels, pathways, and theenergy landscape of protein folding: A synthesis.
Proteins , , 167–195. Chan, H. S.; Kaya, H.; Shimizu, S. Computational Methods for Protein Folding: Scaling aHierarchy of Complexities. In
Current Topics in Computational Molecular Biology ; Jiang, T.,Xu Y., Zhang, M. Q., Eds.; The MIT Press: Cambridge, Massachusetts, U.S.A., 2002; Chapter16, pp 403–447. Guseva, E.; Zuckermann, R. N.; Dill, K. A. Foldamer hypothesis for the growth and sequencedifferentiation of prebiotic polymers.
Proc. Natl. Acad. Sci. U.S.A. , , E7460–E7468. Moreno-Hern´andez, S.; Levitt, M. Comparative modeling and protein-like features ofhydrophobic-polar models on a two-dimensional lattice.
Proteins , , 1683–1693. Sikosek, T.; Chan, H. S. Biophysics of protein evolution and evolutionary protein biophysics.
J. R. Soc. Interface , , 20140419, Higgs, P. G.; Joanny, J. F. Theory of polyampholyte solutions.
J. Chem. Phys. ,1543–1554. Wittmer, J.; Johner, A.; Joanny, J. F. Random and alternating polyampholytes.
EPL , , 263–268. Orkoulas, G.; Kumar, S. K.; Panagiotopoulos, A. Z. Monte Carlo study of Coulombic criticalityin polyelectrolytes.
Phys. Rev. Lett. , , 048303. Cheong, D. W.; Panagiotopoulos, A. Z. Phase behaviour of polyampholyte chains from grandcanonical Monte Carlo simulations.
Mol. Phys. , , 3031–3044. Panagiotopoulos, A. Z.; Wong, V.; Floriano, M. A. Phase equilibria of lattice polymers fromhistogram reweighting Monte Carlo simulations.
Macromolecules , , 912–918. Carmesin, I.; Kremer, K. The bond fluctuation method: a new effective algorithm for thedynamics of polymers in all spatial dimensions.
Macromolecules , , 2819–2823. Baschnagel, J.; Wittmer, J. P.; Meyer, H. Monte Carlo simulation of polymers: Coarse-grained models. In
Computational Soft Matter: From Synthetic Polymers to Proteins ; Attig,N.; Binder, K.; Grubm¨uller, H.; Kremer K.. Eds.; John von Neumann Institute for Computing,J¨ulich, Germany, 2004; pp 83–140. White, A. D.; Voth, G. A. Efficient and minimal method to bias molecular simulations withexperimental data.
J. Chem. Theor. Comput. , , 3023–3030. Wang, Z.-G. Concentration fluctuation in binary polymer blends: χ parameter, spinodal andGinzburg criterion. J. Chem. Phys. , , 481–500. Wang, R.; Wang, Z.-G. Theory of polymer chains in poor solvents: Single-chain structure,solution thermodynamics, and Θ point.
Macromolecules , , 4094–4102. Chan, H. S.; Dill, K. A. Transition states and folding dynamics of proteins and heteropolymers.
J. Chem. Phys. , , 9238–9257. Yue, K.; Fiebig, K. M.; Thomas, P. D.; Chan, H. S.; Shakhnovich, E. I.; Dill, K. A. A test oflattice protein folding algorithms.
Proc. Natl. Acad. Sci. U.S.A. , , 325–329. Kaya H.; Chan, H. S. Origins of chevron rollovers in non-two-state protein folding kinetics.
Phys. Rev. Lett. , , 258104. Rey, A.; Skolnick, J. Comparison of lattice Monte Carlo dynamics and Brownian dynamicsfolding pathways of α -helical hairpins. Chem. Phys. , , 199–219. Socci, N. D.; Bialek, W. S.; Onuchic, J. N. Properties and origins of protein secondary struc-ture.
Phys. Rev. E , , 3440–3443. Hunt, N. G.; Gregoret, L. M.; Cohen, F. E. The origins of protein secondary structure: Effectsof packing density and hydrogen bonding studied by a fast conformational search.
J. Mol.Biol. , , 312–326. Yee, D. P.; Chan, H. S.; Havel, T. F.; Dill, K. A. Does compactness induce secondary structurein proteins? A study of poly-alanine chains computed by distance geometry.
J. Mol. Biol. , , 557–573. Dill, K. A.; Alonso, D. O. V.; Hutchinson, K. Thermal stabilities of globular proteins.
Bio-chemistry , 5439–5449. Shimizu, S.; Chan, H. S. Configuration-dependent heat capacity of pairwise hydrophobic in-teractions.
J. Am. Chem. Soc. , , 2083–2084. Borg, M.; Mittag, T.; Pawson, T.; Tyers, M.; Forman-Kay, J. D.; Chan, H. S. Polyelectrostaticinteractions of disordered ligands suggest a physical basis for ultrasensitivity.
Proc. Natl. Acad.Sci. U.S.A. , , 9650–9655. Csizmok, V.; Orlicky, S.; Cheng, J.; Song, J.; Bah, A.; Delgoshaie, N.; Lin, H.; Mittag, T.;Sicheri, F.; Chan, H. S.; et al. An allosteric conduit facilitates dynamic multisite substraterecognition by the SCF
Cdc4 ubiquitin ligase.
Nat. Comm. , , 13943. Kwon, I.; Kato, M.; Xiang, S.; Wu, L.; Theodoropoulos, P.; Mirzaei, H.; Han, T.; Xie, S.;Corden, J. L.; McKnight, S. L. Phosphorylation-regulated binding of RNA polymerase II tofibrous polymers of low-complexity domains.
Cell , , 1049–1060. Monahan, Z.; Ryan, V. H.; Janke, A. M.; Burke, K. A.; Rhoads, S. N.; Zerze, G. H.; O’Meally,R.; Dignon, G. L.; Conicella, A. E.; Zheng, W.; et al. Phosphorylation of the FUS low-complexity domain disrupts phase separation, aggregation, and toxicity.
EMBO J. , ,2951–2967. Holehouse, A. S.; Das, R. K.; Ahad, J. N.; Richardson, M. O. G.; Pappu, R. V. CIDER:Resources to analyze sequence-ensemble relationships of intrinsically disordered proteins.
Bio-phys. J. , , 16–21. Sherry, K. P.; Das, R. K.; Pappu, R. V.; Barrick, D. Control of transcriptional activity bydesign of charge patterning in the intrinsically disordered RAM region of the Notch receptor.
Proc. Natl. Acad. Sci. U.S.A. , , E9243–E9252. Jiang, J.; Feng, J.; Liu, H.; Hu, Y. Phase behavior of polyampholytes from charged hard-spherechain model.
J. Chem. Phys. , , 144908. Boubl´ık, T. Hard-sphere equation of state.
J. Chem. Phys. , , 471–472. Muthukumar, M. Phase diagram of polyelectrolyte solutions: Weak polymer effect.
Macro-molecules , , 9142–9145. Silmore, K. S.; Howard, M. P.; Panagiotopoulos, A. Z. Vapour-liquid phase equilibrium andsurface tension of fully flexible Lennard-Jones chains.
Mol. Phys. , , 320–327. T2004006008001000