[PDF] The Spanning Tree Model for the Assembly Kinetics of RNA Viruses

Abstract

We present a simple kinetic model for the assembly of small single-stranded RNA viruses that can be used to carry out analytical packaging contests between different types of RNA molecules. The RNA selection mechanism is purely kinetic and based on small differences between the assembly energy profiles. RNA molecules that win these packaging contests are characterized by having a minimum "Maximum Ladder Distance" and a maximum "Wrapping Number".The former is a topological invariant that measures the "branchiness" of the genome molecule while the latter measures the ability of the genome molecule to maximally associate with the capsid proteins. The model can also be used study the applicability of the theory of nucleation and growth to viral assembly, which breaks down with increasing strength of the RNA-protein interaction.

Full PDF

TThe Spanning Tree Model for the Assembly Kinetics of RNA Viruses.

Inbal Mizrahi, Robijn Bruinsma,

1, 2 and Joseph Rudnick Department of Physics and Astronomy, University of California, Los Angeles, CA 90095 Department of Chemistry and Biochemistry, University of California, Los Angeles, CA 90095 (Dated: February 9, 2021)We present a simple kinetic model for the assembly of small single-stranded RNA viruses that canbe used to carry out analytical packaging contests between diﬀerent types of RNA molecules. TheRNA selection mechanism is purely kinetic and based on small diﬀerences between the assemblyenergy proﬁles. RNA molecules that win these packaging contests are characterized by having aminimum

Maximum Ladder Distance and a maximum

Wrapping Number . The former is a topo-logical invariant that measures the “branchiness” of the genome molecule while the latter measuresthe ability of the genome molecule to maximally associate with the capsid proteins. The model canalso be used study the applicability of the theory of nucleation and growth to viral assembly, whichbreaks down with increasing strength of the RNA-protein interaction.

I. INTRODUCTION

Most viruses have a rod-like or sphere-like shape. Crickand Watson [1] noted that the volume enclosed by theprotein shell (or “capsid”) of a sphere-like virus deter-mines the length of close-packed viral genome molecules.In a similar way, the length of a rod-like capsid scales withthe length of the enclosed genome. They exploited thegeometrical connection between capsid and genome to re-late the size of a virus to the number of encoded genes,and predicted that spherical viruses should have icosa-hedral symmetry. This approach works well for manysmaller single-stranded (ss) RNA viruses (such as the po-lio and common cold viruses) as well as for some largerdouble-stranded DNA viruses (such as the herpes andbacteriophage viruses) [2]. The genome molecules of thevery large retroviruses, like HIV-1, are not close-packedbut the dimensions of the capsid still scale with that ofthe enclosed genome because the genome molecules ad-here to the inner surface area of the capsid so the surfacearea is proportional to the total length of the genomemolecules [3].Are there other relations between the geometry ofthat capsid and that of the genome molecules? The co-assembly of small single-stranded RNA viruses involvesboth the capsid proteins and the viral RNA molecules.Such viruses often will self-assemble spontaneously in so-lutions that contain higher concentrations of viral capsidproteins and RNA molecules [4, 5]. The process is passiveand driven by free energy minimization. Early work byAaron Klug [6] indicated that the RNA genome moleculesact as templates that direct the viral assembly process.He proposed a physical model in which repulsive electro-static interactions between capsid proteins are just strongenough to prevent spontaneous capsid assembly driven byhydrophobic attractive interactions between capsid pro-teins but if viral RNA molecules are present as well thenthe negative charges of RNA molecules can neutralizesome of the positive capsid protein charges and therebytilt the free energy balance towards assembly [7].The genome molecules of ss RNA viruses genome molecules have a tree-like “secondary structure” pro-duced by Watson-Crick base-pairing between comple-mentary RNA nucleotides of the primary sequence ofRNA nucleotides [8]. The co-assembly picture suggeststhat there could be connections between the geomet-ric and topological features of the viral RNA secondarystructure directing assembly and the capsid geometry.The redundancy of the genetic code indeed allows for thepossibility of “silent” (or synonymous) mutations thatcan alter the secondary structure of an ss RNA moleculewithout altering the structure of the proteins encoded inthe nucleotide sequence [9]. A systematic comparison be-tween the secondary structures of viral RNA moleculesand randomized versions of the same molecules indeedrevealed that they are signiﬁcantly more branched andcompact than generic RNA molecules [10].The assembly of empty viral capsids is similar to thatof micelles and other amphipathic molecular systems thatself-assemble spontaneously [11]. Empty capsid assemblymay be viewed as a chemical reaction where a certainnumber of capsid proteins react to form the capsid. Ap-plication of the Law of Mass Action (LMA) for reactionsin chemical equilibrium produces a relation between theconcentrations of assembled capsids and of free capsidproteins with the total protein concentration. This rela-tionship has been conﬁrmed experimentally [12] (but seeref.[13]). An important feature of the LMA is the pres-ence of a critical aggregation concentration (or CAC).For the case of empty capsid assembly, this means thatcapsids only can form if the total capsid protein concen-tration exceeds this CAC. Another important feature isthat assembled capsids should spontaneously disassem-ble if the concentration of capsid proteins in the sur-rounding solution is reduced to zero. This does not oc-cur in actuality—at least not on laboratory time scales—presumably because of the presence of an activation en-ergy barrier inhibiting disassembly that is large comparedto the thermal energy k B T .Kinetic studies of empty capsid assembly [14–16] re-port that there is a delay or lag time before capsid as-sembly can take place in a solution of capsid proteinsthat has been primed for assembly. Assembly starts with a r X i v : . [ phy s i c s . b i o - ph ] F e b the formation of a “nucleation complex” composed of asmall number of capsid proteins. The nucleation complexthen extends or elongates by absorbing more proteins un-til the capsid closes up. The theory of nucleation andgrowth [17] has become an important tool for the inter-pretation of kinetic studies, with the nucleation complexcorresponding to the critical nucleus. The small size ofthe critical nucleus means that in-vitro assembly assaysare taking place under conditions of a signiﬁcant level of supersaturation and, based on kinetic considerations, thesame is probably true under in-vivo conditions. A studyof the assembly kinetics of empty capsids of the Hepati-tis B virus indicates that the assembly of such capsids isgoverned by distinct assembly pathways [18], much likedistinct assembly pathways govern the folding of proteins[19]. While all-atoms simulations of capsids are possible[20], coarse-grained models have been found to be usefulto interpret equilibrium and kinetic properties of emptycapsid assembly[13, 21–29].The co-assembly of capsid proteins with ss RNAmolecules involves additional thermodynamic parame-ters. One of these is the mixing ratio of the concentrationof RNA molecules to that of the capsid proteins. An in-vitro study [30] of the co-assembly of CCMV (CowpeaChlorotic Mottle Virus) with viral RNA molecules re-ported that when the RNA-to-protein mixing ratio is lowthen virus-like particles co-exist with excess proteins. Byanalogy with formation reactions of a binary compound,one would expect that in the case of large mixing ratiosthere is co-existence of virus-like particles with excessRNA molecules. The border-line mixing ratio separat-ing these two regimes should correspond to the RNA-to-protein ratio of an assembled virus-like particle, whichcan be thought of as a stoichiometric ratio . In actuality,when the mixing ratio exceeded a certain threshold thena distribution of disordered RNA-protein aggregates wasobserved [30]. Moreover, the border-line mixing ratiowas not the stoichiometric ratio. A second thermody-namic parameter found to be important for co-assemblyis the ratio of the RNA to protein aﬃnity with the pro-tein to protein aﬃnity. This aﬃnity ratio can be alteredexperimentally by changing the acidity and salinity of thesolution [31]. It was found that for increased values ofthis ratio, virus-like particles were replaced by disorderedRNA-protein aggregates.Information about co-assembly also has been gleanedfrom structural studies. Until recently, reconstruction ofpackaged genome molecules involved “icosahedral aver-aging”, which resulted in RNA structures with imposedicosahedral symmetry [32]. Such studies showed thatthe interior surface of the icosahedral capsids of certainviruses (e.g., the nodaviruses [33]) is decorated by pairedRNA strands lining the edges of the “capsomers” (i.e.,pentameric or hexameric groupings of capsid proteins)).Recent progress in cryo-electron tomography has made itpossible to image individual ss RNA genome moleculespackaged inside spherical capsids without icosahedral av-eraging (“asymmetric reconstruction”[34, 35]). One ex- ample is the MS2 virus, which is an ss RNA, quasi-icosahedral bacteriophage virus. It was found that asubsection of the RNA genome reproducibly associatedwith a compact cluster of capsid proteins [36]. This re-sult was interpreted as evidence for a well-deﬁned as-sembly pathway where the energetically ”uphill” part ofnucleation-and-growth scenario produces a compact nu-cleation complex held together by particular sections ofthe viral RNA molecule with enhanced aﬃnity for thecapsid proteins (such sections are known as packagingsignals [37]). In this picture, the “downhill” part ofthe assembly process involves more generic electrostaticRNA-protein attractions. According to this model, thenucleation complex plays the important role of selectingthe viral RNA molecules from quite similar host messen-ger RNA molecules.In this paper we propose a statistical mechanical modelthat can be used to explore the question of the inﬂu-ence of geometric and topological features of tree-shapedmolecules on the promotion of packaging by capsid pro-teins and how these features are connected to kineticallyfavored assembly pathways. It allows testing of the the-ory of nucleation and growth. Other questions that canbe addressed are why the “chemical reaction picture”works well for empty capsid assembly but not so well forco-assembly, whether packaging selectivity is consistentwith a high level of supersaturation.The proposed model, the “spanning tree model” model,extends a simple model for the assembly of empty dodec-ahedral capsids from pentamers in solution, due to Zlot-nick [21, 38], by allowing it to package branched genomemolecules that decorate the edges of the dodecahedralcapsid. An important advantage of the Zlotnick Modelis that its assembly kinetics has been studied [13, 25]while the packaging kinetics of linear genome moleculeshas been numerically simulated [39]. Two diﬀerent pack-aging scenarios were encountered. The ﬁrst scenario issimilar to that of the assembly of empty capsids with as-sembly intermediates in the form of compact pentamerclusters that grow in a pentamer-by-pentamer fashion. Inthe second “en masse” scenario, the ﬁrst assembly step isthe formation of a disordered pentamer/genome conden-sate formed which then undergoes an ordering transitionthat can be described by the Landau theory of symmetry-breaking on a spherical surface (see [40] and referencestherein).

II. THE SPANNING TREE MODEL.

We start with a brief review of the Zlotnick Model [21].

A. Zlotnick Model

In the Zlotnick Model, a capsid is modeled as a do-decahedral shell composed of twelve pentamers. As-sembly takes place in a reservoir of pentamers and isdriven by an attractive edge-edge interaction betweenpentamers. A minimum-energy assembly pathway isdeﬁned as a pentamer-by-pentamer addition sequencewhere each added pentamer is placed in a location ona partially assembled dodecahedral shell that minimizesthe total energy. An example of a minimum-energy as-sembly pathway is shown in Fig. 1.

FIG. 1. Zlotnick Model. The ﬁgure shows a minimum-energypathway for the assembly of a dodecahedral shell composed oftwelve pentamers with adhesive edges. The edge-edge bindingenergy is (cid:15) . The change in energy per added pentamer isindicated.

The energy E ( n ) of a cluster of n pentamers is de-ﬁned as the number of shared pentamer edges times thebinding energy (cid:15) per edge minus a constant µ times thenumber of pentamers. Here, µ is the chemical potentialof a pentamer in solution at a certain reference pentamerconcentration ¯ c [41] in units of k B T . If the concentrationof free pentamers c f diﬀers from ¯ c then µ must be re-placed by µ = µ + ln c f / ¯ c but in this section c f equals ¯ c .From here on, concentrations are expressed in terms of ¯ c and thus dimensionless. Since a dodecahedron has thirtyedges, the assembly energy of a complete capsid equals30 (cid:15) − µ . If the chemical potential equals (5 / (cid:15) thenthe assembly energy of a pentamer that is part of a cap-sid is the same as that of a free pentamer in solution (i.e.,zero). We will refer to µ ∗ = (5 / (cid:15) as the chemical poten-tial for assembly equilibrium. Below, we will use energyunits in which (cid:15) = − µ ∗ equal to -(5/2). Animportant feature of the Zlotnick model is that the en-ergy gain for the initiation of assembly is relatively low,namely one factor of (cid:15) , while the cost of removing a sin-gle pentamer from an assembled capsid is relatively high:it requires breaking ﬁve bonds with a total energy costof minus 5 (cid:15) . This allows for metastability of assembledcapsids in solutions with low pentamer concentrations.Figure 2 (top) shows the minimum-energy assemblypathway corresponding to Fig. 1 for three diﬀerent val-ues of µ close to µ ∗ and for (cid:15) = −

1. There is a largenumber of such minimum energy assembly pathways forthe Zlotnick model (of the order of 10 ) with the energy FIG. 2. Top: Assembly energy proﬁle E ( n ) for the mini-mum assembly energy pathway of the Zlotnick Model shownin Fig. 1. Blue dots: µ is slightly below the chemical po-tential − (5 /

2) for assembly equilibrium. Orange squares: µ is equal to − (5 / µ is slightly above − (5 / µ < µ ∗ , the absolute energy minimumis at n = 0 while for µ > µ ∗ the absolute minimum is at n = 12, the assembled capsid. The height of the maxima cor-responds to the activation energy barrier. The location n ∗ ofthe maximum is around n = 6 in the assembly equilibriumstate and shifts to smaller values as µ increases. proﬁle of Fig. 2 (top). Figure 2(bottom) shows how theseenergy proﬁles can be interpreted in terms of the theoryof nucleation and growth [17]. The height of the energymaximum plays the role of the energy activation barrier.The size of the critical nucleus is the value n ∗ for which E ( n ∗ ) has a maximum. If µ (cid:39) µ ∗ , then n ∗ is around 6while n ∗ decreases with increasing levels of pentamer su-persaturation. Note that the Zlotnick Model eﬀectivelyincorporates the surface or line tension that plays a keyrole in the theory of nucleation and growth.For the Zlotnick Model, as well as other simple modelsof capsid assembly, the critical nucleus under assemblyequilibrium conditions ( µ = µ ∗ ) is a half-formed cap-sid . Experimentally measured values for the interactionenergies between capsid proteins are in the range of afew k B T . If the critical nuclei really were half-formedcapsids then the activation energy barrier for the assem-bly of actual capsids would be in the range of hundredsof k B T under conditions of assembly equilibrium, whichwould mean prohibitively slow kinetics. Actual measuredvalues of the size of the nucleation complex are muchsmaller [15]. As already noted, this indicates that capsidassembly takes place under conditions of a high degree ofsupersaturation. B. Spanning Trees

The second part of the deﬁnition of the model is thespeciﬁcation of “toy” ss RNA genome molecules. Theseare represented by tree graphs , i.e., collections of nodesconnected by links such that there is one and only onepath of links connecting any pair of nodes [42] (see Fig.3 top right). The construction of a genomic tree graphsstarts with a spanning tree graph of the dodecahedron.A spanning tree graph of a polyhedron is deﬁned as atree graph whose nodes are located on the vertices of thepolyhedron with just enough links to connect the nodestogether in a tree structure without circuits [43]. Eachmember of the set of spanning trees of the dodecahedronhas the same number of vertices (twenty) and the samenumber of links (nineteen). Figure 3 shows three waysto represent the same spanning tree graph. The top leftﬁgure shows a three dimensional representation with thesolid black lines indicating the spanning tree. Note thatthe spanning tree covers only 19 of the 30 edges of thedodecahedron. The top right ﬁgure is a two dimensionalrepresentation of a tree graph with the dodecahedron re-moved. In the bottom picture, the dodecahedron is repre-sented in the form of a planar Schlegel diagram [44] thatis decorated by the spanning tree graph (solid red lines).The ﬁnal step of the construction of a genome molecule isthe extension of a spanning tree by adding side-branchesso the tree covers the remaining eleven edges. These arerepresented by black arrows in Fig. 3. The completed ge-nomic tree molecule covers all edges of the dodecahedron.The red spanning tree links will represent the packagingsignals that have a speciﬁc binding aﬃnity to pentameredges that is enhanced with respect to generic electrostat-ics while the black arrows represent side branches withonly non-speciﬁc generic aﬃnity for pentamer edges.

C. Classiﬁcation Indices for Spanning TreeMolecules.

The number of unique spanning trees [45] of the do-decahedron is on the order of 10 and we need to classifythem. A “global” characteristic that has been appliedto classify RNA secondary structures is the MaximumLadder Distance (or MLD) [10, 46]. This is the maxi-mum number of paired RNA nucleotides separating anytwo nucleotides. The MLD of a secondary structure isa global measure of its size [10, 46]. Speciﬁcally, in theabsence of interaction between nodes, the solution ra-dius of gyration of the molecule scales with the MLD asa power law. The MLD is a topological invariant that FIG. 3. Top left: Spanning tree connecting the vertices of adodecahedron (solid lines). The dashed lines indicate edgesof the dodecahedron that are not part of the spanning tree ofspeciﬁc links. Six pentamers can be placed on the dodecahe-dron with each one wrapped by the spanning tree with fourlinks per pentamer (numbered). Top right: Planar graph ofthe same tree. Bottom: The spanning tree projected on aplanar Schlegel graph of the dodecahedron (red). The blackarrows represent side branches that are added to the spanningtree so that all edges of the dodecahedron are covered by alink of the tree. does not change if the molecule is folded up in diﬀer-ent ways. A systematic comparison between the genomicRNA molecules of RNA viruses revealed that they havesigniﬁcantly lower MLDs than randomized versions of thesame molecules [10, 46]. The analog of the MLD for thetoy genome molecules is the maximum number of linksseparating any pair of nodes. In graph theory, the ladderdistance between two nodes of a tree graph is called “the”distance while the MLD is known as the “diameter” of thetree graph [42]. The MLD of the tree molecule shown inFig. 3 is nine. It can be demonstrated that the smallestpossible MLD for a spanning tree of the dodecahedron isnine (as shown in Appendix A), while the largest possibleMLD of a spanning tree is nineteen. The former resem-ble a Bethe lattice while the latter is a Hamiltonian Path.A walk that visits all vertices of a polyhedron is said totrace a

Hamiltonian Path [47, 48]. Figure 4 is a plotof the number N of spanning trees of the dodecahedronas a function of the MLD. The plot has a pronouncedmaximum around MLD equal to 12. By comparison, theconﬁgurational entropy of an annealed branched polymercomposed of 19 monomers that is not constrained to be aspanning tree depends on the MLD as 19 − M LD /

19 [49]

FIG. 4. The number of spanning trees on the dodecahedron asfunction of the maximum ladder distance (MLD). Two span-ning trees of the dodecahedron that are related by a symmetryoperation of the dodecahedron are treated as the same. and thus has a maximum at the smallest possible MLD.Demanding that a tree molecule with a certain number oflinks is also a spanning tree of a dodecahedron constrainssigniﬁcantly the branching statistics.A second characteristic, complementary to the MLD,is the wrapping number (or “ N P ”). The wrapping num-ber of a spanning tree of the dodecahedron is the max-imum number of pentamers that can be placed on thedodecahedron such that all pentamers have four edgescovered by a link of the spanning tree (four—not ﬁve—isthe maximum number of speciﬁc links that can be asso-ciated with a pentamer). For the spanning tree shown inFig. 3, N P equals six. The maximum N P for a spanningtree of the dodecahedron is eight while the minimum istwo. The distribution of wrapping numbers is shown inFig. 5: The wrapping number distribution has a maxi- FIG. 5. The number of spanning trees on the dodecahedronas a function of the wrapping number. Two spanning trees ofthe dodecahedron that are related by a symmetry operationof the dodecahedron are treated as the same. mum at N P = 5.Unlike the MLD, the wrapping number of a treemolecule is not an invariant of the tree topology. In-stead, it depends on the spatial conﬁguration of a treemolecule distributed over the edges of a dodecahedron. An example is a linear spanning tree as shown in Fig.6. The Hamiltonian path of Fig. 6 has a wrapping num- FIG. 6. A possible Hamiltonian path for a linear genomemolecule. Only two of the pentamers are maximally wrapped.They have no shared edge. ber of two. After enumerating all possible Hamiltonianpaths, one ﬁnds that their wrapping number can be two,three, and four. In general, a spanning tree moleculecan be placed in diﬀerent ways on the edges of a dodec-ahedron and these diﬀerent conﬁgurations may have dif-ferent wrapping numbers. Figure 7 shows two spanningtree conﬁgurations with diﬀerent wrapping numbers thatbelong to the same tree molecule. Tree structures can

FIG. 7. Top: An MLD=13 tree molecule with a two-foldsymmetry site (marked). Bottom: Two dodecahedral span-ning tree conﬁgurations of this molecule. Both conﬁgurationsretain two-fold symmetry (marked). The conﬁguration on theleft has a wrapping number N P = 2 while the one on the righthas a wrapping number N P = 6 have multiple wrapping numbers. The wrapping numberis a geometrical characteristic of the diﬀerent ways todistribute nineteen speciﬁc links over the edges of a do-decahedron. A network with circuits that would visit allnodes of a dodecahedron would not have a well-deﬁnedMLD but it would still have a wrapping number.The wrapping number and MLD are correlated. Forexample, a spanning tree with a large wrapping numberis expected to be have many branches in order to max-imize the number of pentamers that can be accommo-dated. Hence it is expected to have a small MLD (andvice versa). Figure 8 is a plot of the range of allowedwrapping numbers for given MLD: The largest possible

10 12 14 16 18

MLD N P FIG. 8. Plot of the range of wrapping numbers ( N P ) of thespanning tree molecules of the dodecahedron as a function ofthe maximum ladder distance (MLD). wrapping number is N P = 8. In that case, the MLDcan only be nine, ten or eleven. For the smallest possiblewrapping number N P = 2 the MLD ranges from elevento nineteen. D. Assembly Energy Proﬁles.

We now construct the minimum energy assembly en-ergy proﬁles for the spanning tree model. The start-ing state is a tree molecule, composed of a spanningtree (the speciﬁc links) plus the additional eleven sidebranches (the non-speciﬁc links), that decorates all edgesof a mathematical dodecahedron. Physically, the start-ing state can be viewed as representing a folded or pre-condensed form of the viral ss RNA genome molecule(s)prior to its encapsidation by pentamers [50] Next, place n pentamers on the dodecahedron. The energy E ( n ) ofthe assembly is deﬁned to be E ( n ) = n (cid:15) + n (cid:15) + n (cid:15) + n (cid:15) − k B T µ n . Here, n is the number of speciﬁc linksof the tree that lie along a pentamer edge that is notshared with another pentamer, n is the number of spe-ciﬁc links that lie along a pentamer edge that is sharedwith another pentamer, n is the number of edges sharedbetween two pentamers that are associated with a non-speciﬁc link and n = 11 is the number of non-speciﬁclinks that lie along a pentamer edge that is not sharedwith another pentamer. The associated binding energiesare given as (cid:15) i with i = 1 , , ,

4. We will use an energy scale in which the binding energy (cid:15) of a non-speciﬁc linkis equal to zero. We also will assume that the interactionsbetween edges and links are additive . This means that (cid:15) is given by (cid:15) = (cid:15) + 2 (cid:15) . Finally, µ is again the refer-ence pentamer chemical potential. The assembly energyof a completed particle is equal to 19 (cid:15) + 11 (cid:15) − µ . All spanning trees thus have the same assembly energy. If (cid:15) is zero so (cid:15) = (cid:15) then the assembly energy proﬁles arethe same as that of the Zlotnick Model with (cid:15) = (cid:15) = (cid:15) .The energy parameters enter in the physics of assem-bly discussed in the next sections always in the form of β(cid:15) i with β = 1 /k B T . We will use dimensionless energyparameters (cid:15) (cid:48) i = (cid:15) i / | (cid:15) | . In these units (cid:15) (cid:48) = − β (cid:48) = β/ | (cid:15) | . That leaves only (cid:15) (cid:48) , β (cid:48) and µ as three freedimensionless energy parameters. Physically, | (cid:15) (cid:48) | repre-sents the ratio between the aﬃnities of a speciﬁc linkwith a pentamer edge and that of pentamer edges witheach other 1 /β (cid:48) is the dimensionless attractive interactionstrength between two pentamers in units of the thermalenergy. Below we will drop the primes.Minimum energy assembly pathways can now be con-structed in the same way as before. The degeneracy ofthe energy proﬁle Zlotnick Model is greatly reduced. Asan example, Fig. 9 shows a minimum energy assem-bly proﬁles for the M LD = 9, N P = 6 tree of Fig. 3(from here on referred to as “molecule (1)”) and for the M LD = 19, N P = 2 tree of Fig. 6 (from here on re-ferred to as “molecule (2)”). These two energy proﬁles FIG. 9. Assembly energy proﬁles for minimum energy assem-bly pathways for the

MLD = 9, N P = 5 spanning tree of Fig.3 (left) and the MLD = 19, N P = 2 spanning tree of Fig. 6.Energy parameters are (cid:15) = − . (cid:15) = −

1, and µ = − . are consistent with what is expected from nucleation andgrowth theory for a state of elevated supersaturation.Note how similar they are: the total assembly energies(solid red lines) are of course the same by construction(about 7.5 in our units) but the heights of the assemblyenergy activation barriers also are the same (about 2.5in our units). The main diﬀerence is that the width ofthe energy barrier for molecule (2) is somewhat largerthan that of molecule (1). Mathematically, the assem-bly process can be viewed as a random walk over energylandscapes of the form shown in Fig. 9, which suggeststhat molecule (1) will have somewhat faster assembly ki-netics.Next, we increased minus | (cid:15) | from 0 . . − . − . (cid:15) = − . FIG. 10. Assembly energy proﬁles for molecule (1) (left) andmolecule (2) (right) for energy parameters (cid:15) = − . (cid:15) = − µ = − . total assembly energy is still the same (which is by con-struction) but the two proﬁles now have a quite diﬀerentappearance. The activation energy barriers also are quitediﬀerent The reason is that the minimum energy place-ment for the second pentamer for molecule (2) is, for largevalues of minus (cid:15) determined by the requirement thatit is maximally wrapped. The two maximally wrappedpentamers of molecule (2) do not have a shared edge (seeFig. 6). The n = 2 state of molecule (2) has a higherenergy than that of molecule (1) because it lost the at-tractive energy between the edge shared between the twoﬁrst pentamers of molecule (1). Examining the assem-bly energy proﬁles of molecules with diﬀerent MLD and N P we found that, in most cases, genome molecules withthe same MLD and N P have the same, or closely similar,minimum energy assembly proﬁles and pathways. On theother hand, genome molecules with the same MLD butdiﬀerent N P can have quite diﬀerent proﬁles and path-ways.Finally, one also can deﬁne disassembly energy path-ways using the same method except that now pentamersare removed successively, each time with the minimumenergy disassembly cost. By enumeration we found that,in nearly all cases, the disassembly path simply reversesthe assembly path. E. Mechanical Rigidity.

Figure 3 (top left) shows a six pentamer structureplaced on the dodecahedron. Each of these pentamershas the maximum of four contacts with a speciﬁc (red)link per pentamer. For any (negative) value of (cid:15) , theenergy E (6) of this state has the lowest value any six-pentamer structure could have when placed on the do-decahedron for any spanning tree. In other words, for an n ∗ = 6 nucleation complex (with µ = µ ∗ ) the activation energy barrier could not be any lower. One thus mightexpect the tree structure of Fig. 3 to be a particularlystrong contender in a packaging competition experiment.This structure has however another interesting feature. Ifone were to (i) relax the constraints that keep the linksof the genome molecule associated with the edges of amathematical dodecahedron, (ii) allow tree links to ro-tate freely around the nodes of the tree and (iii) allowpentamers to swivel around shared edges then the pen-tamers still could not move with respect to each otherwithout breaking pentamer-pentamer bonds. Only rigid-body translations and rotations would be possible. Wewill say that pentamer assemblies with this property aremechanically rigid . The pentamer structures of the min-imum energy assembly pathway of the Zlotnick Model(see Fig.1) are all rigid past n = 2.As a counterexample, the linear genome molecule with19 edges of Fig. 6 with wrapping number two can ac-commodate only two pentamers with a maximum of fourfavorable contacts. Placing a third and fourth pentameron the dodecahedron along a minimum energy assemblypath generates the linear four-pentamer structure shownin Fig. 11. If this structure were released from the do- FIG. 11. Intermediate n = 4 assembly state of molecule (2)for large values of | (cid:15) | . The structure is mechanically ﬂaccid. decahedron and allowed to freely ﬂuctuate then the fourpentamers could freely swivel along shared edges. Wewill call such an unstable assembly mechanically ﬂaccid .Assembly intermediates that are ﬂaccid are expected tobe characterized by strong conformational thermal ﬂuc-tuations. Whether or not an assembly intermediates areﬂaccid is controlled by the parameter (cid:15) . We ﬁnd thatin the limit that (cid:15) is small compared to the other en-ergy parameters, all assembly intermediates are rigid andbasically reproduce the scenario of the Zlotnick Model.The assembly intermediates of some—though not all—pathways become ﬂaccid when | (cid:15) | is increased past acritical value of the order of one. For molecule (2) thetransition from rigid to ﬂaccid intermediates takes placeexactly at (cid:15) = − (cid:15) is independent of µ and β ). Another exampleis the tree in Fig. 7 with N P = 6 which shows the ﬂaccidminimum energy state when minus (cid:15) exceeds a criticalvalue. Transitions from rigid to ﬂaccid as a function of | (cid:15) | become more common for increasing MLD. III. BOLTZMANN DISTRIBUTION.

The minimum energy assembly pathways can be usedto deﬁne “low-temperature”

Boltzmann Distributions forthe thirteen probabilities P n that a tree molecule is as-sociated with n pentamers. The low-temperature limitapplies if assembly and disassembly processes are en-tirely restricted to the minimum energy pathways dis-cussed earlier. Under those conditions, the Boltzmanndistribution equals P n ∝ exp ( − βE ( n ) + n ln( c f )) (3.1)The last term in the argument of the exponential is ob-tained by replacing the reference chemical potential µ with the actual chemical potential µ = µ + ln( c f ) where c f is the concentration of free pentamers in solution (inunits of the reference concentration). The validity con-dition of the low-temperature approximation is that thequantities β | (cid:15) i | are larger than one.It can be checked that it follows from the Boltzmanndistribution that c f r c = K (3.2)where K = exp βE (12), r the concentration of freeRNA molecules and where the c n are the concentrationsof RNA molecules associated with n pentamers. If r t is the total solution concentration of genome moleculesthen P n = c n /r t . Equation 3.2 has the form of the Lawof Mass Action (LMA) for a chemical reaction in whichtwelve free pentamers in solution “react” with a genomemolecule not associated with pentamers, forming a vi-ral particle with twelve pentamers. The low temperatureBoltzmann distribution Eq. (3.1) is thus consistent witha chemical reaction picture. In the context of the LMA,the quantity K is known as the dissociation constant ofthe reaction. This dissociation constant depends on theenergy parameters only through the total assembly en-ergy so it is independent of the MLD and N P numbers. A. Conservation Laws

The probability distribution P n is related to conser-vation laws for the number of pentamers and genomemolecules. Conservation of genome molecules requiresthat r + (cid:80) n =1 c n = r t . This is assured if (cid:80) n =0 P eqn = 1.The proportionality factor in the deﬁnition of P n is deter-mined by this condition. Next, conservation of pentamermolecules is assured if c f = c Γ ([ P n ]) whereΓ ([ P n ]) ≡ − ( D/ (cid:88) n =1 nP n (3.3) with D = 12 r t /c , c f the concentration of free pentamersand c the total pentamer concentration.The quantity D is the mixing ratio mentioned in Sec-tion I as an important thermodynamic control parameter.If D = 1 then there are exactly enough pentamers to en-capsidate all genome molecules so D = 1 correspondsto the stoichiometric ratio. Because c f = c Γ ([ P n ])depends on the set of all P n , the thirteen Boltzmannequations are in fact coupled and need to be solved self-consistently. B. Self-Assembly

A standard diagnostic plot for self-assembling systemsis that of the concentration of free building blocks andthat of the assembled particles as a function of the totalconcentration of building blocks [51]. Such a plot wascomputed from the self-consistent solution of the Boltz-mann distribution Eq.3.1 for example molecule (1) shownin 3. The energy parameters are the same as for Fig.9. Figure 12 shows a plot of the concentrations of freepentamers in solution and that of pentamers that arepart of an assembled particle as a function of the totalpentamer concentration c . For pentamer concentrations FIG. 12. Equilibrium self-assembly diagram for molecule (1)with the parameter values of Fig. 9. Horizontal axis: totalpentamer concentration c . Vertical axis: either the free pen-tamer concentration c f (blue) or the concentration c − c f ofpentamers that are associated with a tree molecule (ochre).Solid lines: solution of Eq.3.8 that are small compared to one (in our units), nearly allpentamers are free in solution so c f (cid:39) c . As the totalconcentration increases, the concentration of free pen-tamers saturates when particle assembly starts. In thisregime the concentration of pentamers that are part ofan assembled particle increases roughly proportional tothe total concentration. The resulting diagram is typicalof that of self-assembling systems [51].The occupation probability P for completed parti-cles and the probability P for pentamer-free genomemolecules are, for these parameter values, large com-pared to the intermediate occupation probabilities with1 ≤ n ≤

11. If one can neglect these assembly interme-diates then the conservation law for genome moleculesreduce to P (cid:39) (1 − P ) and that of pentamers to c f (cid:39) c (1 − DP ). Inserting these two relations into theLMA equation Eq.3.2 produces an (approximate) equa-tion for the concentration c f of free pentamers: (cid:18) c f c (cid:19) (cid:32) D − c f c − c f c (cid:33) (cid:39) (cid:18) Kc (cid:19) (3.4)For (cid:16) c K (cid:17) small compared to one, this equation has asolution with c f close to c : c f c (cid:39) − Dc K (3.5)For (cid:16) c K (cid:17) large compared to one and D larger than one,the equation has a solution with c f independent of c : c f (cid:39) (cid:18) KD − (cid:19) / (3.6)Finally, for (cid:16) c K (cid:17) large compared to one but D less thanone, the equation has a solution with c f independent of c : c f c (cid:39) − D + Kc D (1 − D ) / (3.7)There is thus a change in regimes when the depletionfactor is equal to one. For the special case that D = 1,the LMA equation reduces to (cid:18) c f c (cid:19) (cid:32) − c f c (cid:33) (cid:39) (cid:18) Kc (cid:19) (3.8)The solid lines in Fig. 12 for K equal to exp β ∆ E with∆ E = 13 . C. Mixing Ratio.

We can use the Boltzmann distribution to address thequestion how the mixing ratio D aﬀects assembly un-der equilibrium conditions. This is done by computingcontours in the c − D plane along which the packagingfraction P is ﬁxed. The dots in Fig. 13 were obtainedwere computed numerically for two diﬀerent values of P FIG. 13. Phase coexistence of assembled particles with excessfree monomers and excess free tree molecules for (cid:15) = − . (cid:15) = −

1, and µ = −

2. Horizontal axis: Depletion factor D .Vertical axis: Pentamer concentration c . Blue dots: pointswhere 95 percent of the genome molecules are packaged. Mostof the remaining pentamers are free. Ochre dots: points where35 percent of the tree molecules are packaged. The green sec-tor is below the critical aggregation concentration at c (cid:39) . for the same tree molecule and energy parameters as Fig.9: Contours of ﬁxed genome packing fraction can be com-puted analytically if one again neglects assembly inter-mediates. This approximation produces the family ofhyperbolae c ( D ) (cid:39) − D P ) (cid:18) KP − P (cid:19) / (3.9)Note that c ( D ) diverges at D = 1 /P . The agreementbetween the approximate equilibrium theory and the ac-tual values is reasonable for the P = 0 .

95 contour (solidblue line vs. blue dots). The blue curve in Fig. 13 canbe viewed as an equilibrium phase boundary that sepa-rates two forms of phase coexistence. To the left, nearlyall tree molecules are encapsidated with assembled par-ticles in coexistence with excess free monomers while tothe right most pentamers are part of assembled parti-cles in coexistence with excess free tree molecules. Thisis consistent with the intuitive chemical reaction picturediscussed in the introduction section. It seems surprisingthat the boundary line is shifted to values of D below thestoichiometric ratio D = 1 when c is reduced. The rea-son for this shift becomes evident if one recalls there canbe no assembly for c less than the CAC for empty cap-sid assembly. According to Fig. 13, this CAC is around00 . D below whichthe state of assembled particles dominates thus necessar-ily has to go to zero as c approaches the CAC.The agreement of the P = 0 .

35 contour (ochre line)with the actual values (ochre dots) is poor. This can betraced to the neglect of assembly intermediates. Notealso that the P = 0 .

35 contour diverges at larger valuesof D than the P = 0 .

95 contour. Figure 14 (Top) showsthat for larger values of D the relative contribution ofassembly intermediates indeed becomes comparable tothe concentration of assembled capsids, which invalidatesthe assumption used to obtain Eq.3.9. FIG. 14. Equilibrium occupation numbers as a function ofthe depletion factor D for c = 1 .

0. Top: same parame-ters as Fig. 10. The stoichiometric ratio D=1 is indicated.Bottom: same parameters as Fig.10 except that the ratio ofthe attractive interactions between the RNA-capsid and thecapsid-capsid interactions has increased from 0 . .

2. Thereference chemical potential µ was reduced to − When the ratio between the genome/pentamer andpentamer/pentamer interaction strengths is increasedfrom − (cid:15) = 0 . − (cid:15) = 1 . D larger than one, as shown inFig. 14 (bottom). This is due to the fact that when D increases more and more pentamer binding sites becomeavailable as there are more tree molecules. Breaking upassembled particles and distributing the pentamers overthe additional binding sites increases the entropy of thesystem while larger values of the binding energy of indi-vidual pentamers to speciﬁc edges of the tree moleculesmake up for the loss of pentamer-pentamer adhesion. Asa result a high-entropy state with a distribution of partialshells has a lower free energy than the state of coexistence of viral particles with excess genome molecules. Thistransition can be viewed as a form of an order-disordertransition . While such a transition would seem to conﬂictwith the earlier chemical reaction picture, in actuality theLMA itself remains valid for Fig. 14(Bottom) providedall assembly intermediates. IV. KINETICS.

In this section we complete the deﬁnition of the modelby specifying the kinetics. We ﬁrst consider the casewhere there is only one kind of spanning tree molecule insolution.

A. Master Equation.

The assembly kinetics of the model is deﬁned in termsof thirteen occupation probabilities P n ( t ) that now de-pend on time. The assembly and disassembly dynamicsfor a given tree molecule with a particular assembly path-way is assumed to obey Markov chain statistics [52] forwhich the occupation probabilities evolve in time accord-ing to the master equation [53]: dP n ( t ) dt = (cid:88) m = n ± ( W m,n P m ( t ) − W n,m P n ( t )) (4.1)Here, W m,n is a thirteen-by-thirteen matrix of transitionrates from state m to state n . We only include transi-tions with m = n ± W m,n are not speciﬁed at this point. The matrix of transitionrates W m,n will be deﬁned in terms of simple diﬀusion-limited chemical kinetics (see Eq. 8.35 of ref.[54]) wherethe addition of a pentamer to a genome molecule asso-ciated with a pentamer cluster of size n is treated as abimolecular reaction with a rate k n,n +1 r n c f where k n,n +1 is the on-rate deﬁned as k n,n +1 = λ (cid:40) e − β ( E ( n +1) − E ( n )) if E ( n + 1) > E ( n )1 if E ( n + 1) < E ( n )(4.2)Here, λ is a base rate that depends on quantities like thediﬀusion coeﬃcient but that is independent of concentra-tion. If adding a pentamer reduces the energy then therate is equal to the base rate. If there is an energy costto adding a pentamers, then the base rate is reduced byan Arrhenius factor, similar to the Metropolis algorithmof Monte-Carlo simulations.The on-rates are related to the rate matrix by W n,n +1 = k n,n +1 c f . The entries of the rate matrix thatcorrespond to adding a pentamer are then W n,n +1 = λc Γ ([ P i ]) (cid:40) e − β ∆ E n,n +1 if ∆ E n,n +1 >

01 if ∆ E n,n +1 < E n,n +1 ≡ ( E ( n + 1) − E ( n )). The oﬀ-rate en-tries W n +1 ,n are determined by the on-rates through thecondition of detailed balance : W n +1 ,n W n,n +1 = P n P n +1 (4.4)where on the right hand side the Boltzmann DistributionEq. (3.1) must be inserted. This results in W n,n +1 = c Γ ([ P i ]) (cid:40) e − β ∆ E n,n +1 ∆ E n,n +1 >

01 ∆ E n,n +1 < W n +1 ,n = (cid:40) E n,n +1 > e β ∆ E n,n +1 ∆ E n,n +1 < λ has been absorbed in a redeﬁnitionof time. Note that the master equation is nonlinear be-cause of the dependence of the factor Γ( | P i | ), as deﬁnedin Eq. (3.3), on the occupation probabilities.Some examples of numerical solutions of the masterequation are shown in Fig. 15 for molecule (1) under ref-erence conditions c = D = β = 1. The top ﬁgure shows FIG. 15. Numerical solution of the master equation formolecule (1). Top: parameter values are those of Fig. 9 with (cid:15) = − .

2. Bottom: same except that (cid:15) = − .

0. Shownare the occupation probabilities P n ( t ). The color code is thesame as that of Fig.14. Time is in dimensionless units of 1 /λ . the case of (cid:15) = − .

2. Other energy parameters are those of Fig. 12), which shows the energy proﬁle that corre-sponds to the ﬁgure. The red curve is the probability P ( t ) for a tree to be encapsidated, the black curve theprobability P ( t ) for a tree to be free of pentamers. Even-tually, about forty percent of the genome molecules areencapsidated by complete capsids with n = 12 pentamerswhile about ﬁve percent are encapsidated by partial cap-sids with n less than 12. We checked the assembly path-way and it is consistent with the Zlotnick Model shownin Fig. 1. The occupation probabilities exponentiallyapproach constant values at late times that agree withthe Boltzmann distribution. Closely similar results areobtained if one replaces molecule (1) by molecule (2). Atthe this point, the packaging kinetics does not seem tobe able to really distinguish between the two molecules.The bottom ﬁgure shows the eﬀect of increasing therelative strength of the interaction between pentamersand genome molecules to (cid:15) = − .

0, while the referencechemical potential is reduced to µ = − . (cid:15) = − .

0, theassembly pathway still follows that of the Zlotnick Modelfor both molecule (1) and (2), for (cid:15) = − . B. Time Scales

The occupation probabilities in Fig. 15 are governedby multiple scales. The ﬁrst important time scale is de-ﬁned in Fig. 16, which shows the early stages of the as-sembly of molecule (1) for (cid:15) = − . delay time t d is FIG. 16. Assembly shock-wave for molecule (1) for (cid:15) = − . P ( t ) curvewith the time axis (solid black line) deﬁnes the assembly lagtime t d . P ( t )with the time axis. This gives t d (cid:39)

11 for the case (ofmolecule (1) with (cid:15) = − .

2. As mentioned in Section I,assembly delay times are a familiar feature of the assem-bly kinetics of empty capsids [15] and of aggregation phe-nomena in general [55]. A second important time scalerefers to the late-time relaxation of the occupation proba-bilities towards the Boltzmann distribution. Suppose onecompletes the deﬁnition of the transition matrix by intro-ducing the diagonal entries W n,n = − (cid:80) m (cid:54) = n W ( m, n ).The resulting matrix W m,n now has column elementsadding to zero. Using this completed transition matrix,the master equation can be rewritten in the form of amatrix equation d P dt = WP . It immediately follows thateigenvalues of the completed transition matrix are thedecay rates of the various modes that correspond to theeigenvectors. The relaxation rate determining the ap-proach to ﬁnal equilibrium is the smallest eigenvalue of W m,n . The relaxation time t r is deﬁned as the inverse ofthe smallest eigenvalue so P n ( t ) − P ( ∞ ) ∝ exp − t/t r inthe late time limit for any n . Figure 15 shows that thistime-scale is much longer than the delay time t d , specif-ically t r (cid:39)

644 so about two orders of magnitude longerthan t d . The diﬀerent functions P n ( t ) in Fig. 16 displaya maximum as a function of n . The corresponding peaktimes increase with n , thus describing a type of assem-bly shock-wave propagating in conﬁguration space fromsmall to large n . Similar assembly shock waves have beenreported for dynamical versions of the Zlotnick Model[13, 38]. The dependence of the two time scales on theenergy scale is also quite diﬀerent. For β = 3, t d (cid:39) . t r (cid:39) . × producing a signiﬁcant separationin time-scales. Relaxation to equilibrium is evidently anactivated process, which is not surprising given the formof the assembly energy proﬁles. In addition, increasing β from one to three also has the eﬀect of suppressingthe partially assembled capsids in Fig. 15 (top). Thisremains the case if the mixing ratio D is increased fromone to two, which now produces simple phase coexistenceof fully assembled particles and bare genome molecules.Figure 17 shows the eﬀect on the kinetics of varying thetotal pentamer concentration c . The plot qualitativelyreproduces the time-dependent light scattering studies ofthe assembly of empty capsids [15]. Note that there isno assembly if the pentamer concentration drops belowabout 0 .

2, which is close to the CAC shown in Figs 12 and13. Fixing the concentration and varying the depletionfactor produces similar-looking plots.

C. Disassembly Kinetics.

An important question concerns the fate of assembledparticles when the solution concentration c of pentamersis reduced. This can be probed by using the outcome ofan assembly run with the total pentamer concentration c = 1 as the initial condition for a second run with aten-fold reduced pentamer concentration, so with c = 1 FIG. 17. Plots of the fraction P ( t ) of genome moleculesthat are fully encapsidated for increasing total concentrationfor diﬀerent total pentamer concentration c ranging from 1.0to 0.2. Particles do not form for c less than 0.2. reduced to c = 0 .

1. The results are shown in Fig. 18,still for molecule (1). For β = 1, the assembled particles FIG. 18. Top: The assembly run of Fig. 14 at c = 1 and β = 1 followed by a disassembly run at c = 0 .

1. The colorcoding is that of Fig.12. The assembled particles evaporate.Bottom: Same as the top ﬁgure except that the temperatureis reduced to β = 3. rapidly disintegrate when the pentamer concentration isreduced. The time-scale for the disintegration is of theorder of the relaxation time t r (cid:39) .

6, which is reducedcompared to the case of c = 1. The energy activationbarrier is, in this case, unable to “protect” the assembledparticle from disintegration by thermal activation. Recallwe found that when β was increased from 1 to 3, then thedelay time t d barely increased while the relaxation time3 t r increased by three orders of magnitude. Repeatingthe assembly-disassembly run for β = 3 (bottom ﬁgure),produced the result that the concentration of assembledparticles still decreased with time during the disassem-bly run but now on time-scales of the order of 10 . Incontrast, during the assembly run about ﬁfty percent ofthe genome molecule had been encapsidated by a time t (cid:39) . This assembly time scale is signiﬁcantly longerthan the delay time but still is two orders of magnitudeshorter than the characteristic time scale for disassem-bly. We attribute this to the fact the activation energybarrier for assembly is signiﬁcantly lower than for disas-sembly for a supersaturated system. Such a separationin assembly and disassembly time scales would seem tobe an essential condition for a functioning viral particle.Below we will focus on the case of β = 3.Next, we carried out assembly-disassembly runs for (cid:15) = − . µ = − .

6. The assembled particlesand the various intermediate aggregates rapidly “evap-orated” during the disassembly run even for β = 3. Thereason for the loss of stability can be seen by comparingthe minimum energy assembly proﬁles for (cid:15) = − . (cid:15) = − . µ ∗ . For (cid:15) = − .

2, the energy proﬁle is quite similar

FIG. 19. Assembly energy proﬁles for (cid:15) = − . (cid:15) = − . µ = µ ∗ . to that of the Zlotnick Model (see Fig. 2). A standardnucleation and growth scenario is expected for the assem-bly process. On the other hand, E ( n ) has a deep mini-mum at n = 6 for (cid:15) = − . exact opposite of what is predicted by the nucleation and growth scenario. Recall that in that case n = 6 is expected to be close to an energy maximum.Increasing the magnitude of (cid:15) indeed progressively de-stroys the activation energy barrier that “protects” theassembled state. V. PACKAGING COMPETITION

We are now in a position to carry out packaging com-petitions between two genome molecules that either havea diﬀerent tree topology or that have the same tree topol-ogy but diﬀerent dodecahedral wrapping conﬁgurations.We did this for a system containing a concentration c of pentamers as well as equal concentrations of molecules(1) and (2) that competed with each other for packagingfor the case (cid:15) = − .

2. Let P (1) n ( t ) and P (2) n ( t ) be thecorresponding sets of occupation probabilities. Conser-vation of both types of tree molecules requires that theoccupation probabilities P (1) n ( t ) and P (2) n ( t ) separatelysum to one. Conservation of pentamers is satisﬁed if( c f /c ) = Γ ([ P n ]) obeys the conditionΓ ([ P n ]) = 1 − ( D/ (cid:88) n =1 n ( P (1) n + P (2) n ) (5.1)The two occupation probabilities P (1) n ( t ) and P (2) n ( t )obey separate master equations of the form of Eq. (4.1).The only diﬀerence is that now the factor Γ ([ P n ]) in theexpression for the transition matrix Eq.5.1 couples thetwo master equations. Figure 20 shows an example ofthe solution of the two coupled master equations for twodiﬀerent parameter sets. The relaxation times are indi-cated by arrows. Based on the discussion in Section 2,it is expected that the highly branched molecule (1) willoutcompete linear molecule (2) since it can accommodatemore optimally wrapped pentamers during assembly. Re-call however that in actuality the assembly energy proﬁlesof the two molecules are not very diﬀerent (see Fig.9).The maximum selectivity in Fig. 20(A) of molecule (1)over molecule (2) for parameter values β = 3, c = D = 1and µ = − .

5. It is achieved very early in the assemblyprocess. In order to be quantitative, deﬁne the relativepackaging selectivity S (1 |

2) of molecule (1) with respectto molecule (2) to be S (1 | ≡ ( P (1)12 − P (2)12 ) / ( P (1)12 + P (2)12 ) (5.2)which ranges from one to minus one. This fraction de-pends on the time instant when probabilities are beingcomputed. The maximum value of the relative packag-ing selectivity will be denoted by S m (1 | S m (1 | (cid:39) .

43. The selectiv-ity S (1 | t ) then slowly decays to zero on a time scaleof the order of the relaxation time t (2) r . The fact thatthe selectivity goes to zero in the long time limit can beunderstood from the fact that the total assembly ener-gies of molecules (1) and (2) are the same, along with the4 FIG. 20. Time-dependence of the fraction of packaged treemolecules of example molecules (1) and (2). (A): β = 3, c = D = 1 and µ = − .

5. The relaxation times of the twomolecules are indicated by the two arrows. (B): Same as (A)except that c = 0 .

22 and D = 0 . fact that there are practically no intermediate assemblies.Selectivity is a purely kinetic eﬀect.To further increase the selectivity, we varied the to-tal pentamer concentration c . For given c , the mix-ing ratio D was given the maximum value for whichat least 90 percent of the type (1) genome moleculeswere still encapsidated. The maximum selectivity wasobtained when c was reduced to a value close to thelower bound of 0 . c (cid:39) .

22 with D = 0 .

3. The maximum selectivityis S m (1 | (cid:39) .

8. This is the highest selectivity that wewere able to achieve. This increase in selectivity near theCAC is not the result of increasing relaxation times (seeFig. 20(B)). If c drops below the CAC then the selectiv-ity of the particles that assemble remains about 0 . c is intriguing. In an Arrhenius descriptionthe nucleation rate for particle assembly is the product ofan attempt frequency and a Boltzmann factor exp − β ∆ E with ∆ E the assembly activation barrier. The assemblyenergy proﬁles do not depend on the pentamer concen-tration c so this increase of the selectivity when c isreduced must be due to a dependence of the attemptfrequency on c in a manner that diﬀerentiates betweenthe two molecules. According to Fig. 9, the ﬁrst twoassembly steps of molecules (1) and (2) that bring theprocess to the energy maximum at n = 2 are the same.Once past n = 2, the negative slope of the energy pro-ﬁle for molecule (1) is signiﬁcantly steeper than that of molecule (2). Only after three additional assembly stepsdoes the energy proﬁle of molecule (2) acquire a similarlylarge negative slope. This distinction is a consequence ofthe fact that the wrapping number of molecule (1) is sixwhile it is two for molecule (2). The energy of capsidassemblies with n greater than two thus decreases morerapidly for molecule (1). Moreover, the two maximallywrapped pentamers of molecule (2) are not adjacent. Ifthe pentamer concentration is low then the packaging ofmolecule (2) will be signiﬁcantly delayed as compared tomolecule (1). The system performs a weakly directedrandom walk along the ﬂattish top of the energy barrierwith a signiﬁcant probability for disassembly of the as-sembly nucleus. This happens if the random walk returnsto n=2. As a result, the eﬀective attempt frequency ofmolecule (2) is signiﬁcantly reduced. The reason whyin the limit of small c the mixing ratio D must be re-duced to well below stoichiometric ratio D = 1 in orderto achieve a 90 percent packaging probability is a con-sequence of the equilibrium thermodynamics discussedearlier. Note that the point c ( D = 0) on the contourin the equilibrium phase-coexistence diagram Fig. 13 forﬁxed 90 percent packaging probability is also the pointwith the lowest value for c .We constructed a table of packaging selectivities ofpairs of trees with diﬀerent MLD and wrapping num-bers (Fig. 21). This was done for β = 3 and c = 0 . .

2, which produced an increase of the mixingratio for 90 percent packaging from D = 0 . D = 0 .

5. The result is clear: when two trees

FIG. 21. Plot of wrapping numbers ( N P ) and maximum lad-der distances (MLD showing the outcomes of packaging con-tests indicated by arrows. Each arrow indicates the outcomeof a contest, with the arrow pointing from the loser to the thewinner. The width of the arrow is a measure of the maximumselectivity S m . A dashed arrow indicates a weak selectiv-ity. The energy parameters were β = 3, c = 0 . D = 0 . (cid:15) = − . (cid:15) = −

1, and µ = − . compete with the same wrapping number but diﬀerentMLD, then the tree with the smaller MLD outcompetesthe tree with the larger MLD. When two trees with the5 FIG. 22. Time-dependence of the packaging selectivity S (1 | t ) of molecule (1) with respect to molecule (2). Theenergy parameters are (cid:15) = − . (cid:15) = −

1, and µ = − . same MLD but diﬀerent wrapping number compete, thenthe tree with the larger wrapping number outcompetesthe tree with the smaller wrapping number.It would seem that one should be able to achievean even higher selectivity by increasing | (cid:15) | , the ratioof strength of the genome-pentamer attraction and thepentamer-pentamer attraction. The outcome of a pack-aging contest between molecules (1) and (2) is shown inFig. 22 for the case | (cid:15) | = 1. The reference chemical po-tential was adjusted so the energy activation barrier wasthe same as for our standard value | (cid:15) | = 0 .

2. Figure 22shows the time-dependence of the selectivity S (1 | t less than 10 — S (1 |

2) rises very quickly to (nearly) one, which agreeswith the expectation that increasing | (cid:15) | should enhancethe packaging selectivity. However, as time progresses,fully packaged type (1) molecules particles start to disas-semble while additional molecule (2) assemblies start toappear. The packaging selectivity drops, goes to zero andthen changes sign . This “inversion” is in fact a genericfeature of packaging competitions for large values of | (cid:15) | .Eventually, about 70 percent of the type (2) moleculesare packaged in complete capsids. A large fraction oftype (1) molecule are associated with a polydisperse dis-tribution of incomplete particles. Recall from Section IVthat increasing | (cid:15) | produces a large fraction of incom-plete particles . Recall also that for large | (cid:15) | assembledparticles easily disintegrate when the pentamer concen-tration is reduced.Since increasing | (cid:15) | leads to thermodynamically unsta-ble particles, how small can one make | (cid:15) | and still havea reasonable selectivity? Figure 23 shows how the maxi-mum packaging selectivity S (1 | m depends on | (cid:15) | in theregime of small | (cid:15) | . The maximum packaging selectivityrises very steeply from zero for increasing | (cid:15) | and reachesit maximum value around | (cid:15) | (cid:39) .

4. Polydispersity andselectivity inversion only appears for larger values of | (cid:15) | . | | S ( | ) m FIG. 23. Maximum packaging selectivity S (1 | m of molecule(1) with respect to molecule (2) as a function of | (cid:15) | . Thereference chemical potential µ was adjusted to maintain theactivation energy barrier at an approximately constant value. VI. CONCLUSION

We have presented a simple, quasi-analytical model forthe study of the packaging of tree-like genome moleculesinside dodecahedral capsids composed of twelve pen-tamers. The model is suﬃciently simple that the kineticscan be obtained from the numerical solution of a set ofcoupled master equations. The genome molecules arerepresented by a “core” of speciﬁc links with enhancedaﬃnity for the edges and that visits all vertices of the do-decahedron but covers only nineteen of the edges. Thespanning tree is complemented by eleven non-speciﬁclinks with reduced edge aﬃnity. The minimum energyassembly pathways are characterized by two numbers:the maximum ladder distance (MLD) and the wrappingnumber ( N P .) The MLD of a spanning tree is a topolog-ical invariant of the tree that is a global measure of the“branchiness” of a tree while N P is a geometrical mea-sure of the surface distribution of speciﬁc links over thedodecahedron in terms of the number of pentamers asso-ciated with a maximum of four speciﬁc links that can beaccommodated.The assembly kinetics is characterized by a delay time t d for the onset of particle production and a thermody-namic relaxation time t r . When the solution concen-tration c of pentamers is lowered, assembled particlesare stable against disassembly by thermal ﬂuctuationson time scales shorter than t r . If the energy scale ofthe binding energy is large compared to the thermal en-ergy and if the ratio | (cid:15) | between speciﬁc and non-speciﬁcaﬃnity of links of the genome molecules for the pentameredges is small compared to one then there is a pronouncedseparation in time scale between particle assembly athigh c and particle disassembly at low c . We carriedout packaging selection contests between molecules withdiﬀerent MLD and N P . The selectivity is a purely kineticeﬀect, based on minor but systematic diﬀerences betweenthe minimum energy assembly pathways of the diﬀerent6genome molecules. For small values of | (cid:15) | molecules withsmall MLD and large N P outcompete molecules withlarge MLD and small N P . For increasing | (cid:15) | , genomeselectivity increases as well but if | (cid:15) | becomes compara-ble to one then assembly produces a polydisperse solutionof partially assembled particles that readily disintegratewhen the pentamer solution is reduced. In addition, forlarge | (cid:15) | the assembly energy proﬁles become increas-ingly complex, leading to the failure of the nucleation-and-growth scenario. Another interesting consequenceof increasing | (cid:15) | is a change from mechanically rigid tomechanically ﬂaccid assembly intermediates.In the Introduction we mentioned a number of exper-imental observations that motivated the construction ofthe model. The ﬁrst was an in-vitro study of the co-assembly of CCMV with non-CCMV RNA molecules[30]. When the RNA-to-protein mixing ratio was low,virus-like particles (VLPs) formed with excess proteins,in agreement with simple arguments based on phase-coexistence. When the mixing ratio exceeeded a thresh-old, the virus-like particles were replaced by disorderedRNA-protein aggregates instead of the expected coexis-tence of VLPs with excess RNA. Moreover, the thresholdpoint separating the two regimes was well below the sto-ichiometric ratio. Can this be understood in the lightof our results? When the mixing ratio (i.e., the deple-tion factor D ) is increased at ﬁxed total protein con-centration c in Fig. 13 then for smaller D practicallyall tree molecules are encapsidated (i.e., to the left ofthe blue line in Fig. 13) while larger values of D anincreasing fraction of the tree molecules are not encapsi-dated. The transition point separation the two regimesdrops below the stoichiometric ratio when c approachesthe CAC. The authors of the experimental study [30]relate this displacement away from the stoichiometric ra-tio to electrostatic eﬀects, which are not included in ourmodel. The model shows that there is a separate mech-anism that could produce the same eﬀect. An experi-mental study in which the capsid protein concentrationis reduced towards the CAC, perhaps accompanied bychanges in the salinity, may be able to distinguish be-tween the two mechanisms. Next, Fig. 14 shows thatwith increasing D , partially assembled capsids start toappear as observed experimentally. This would seem tobe consistent with the proposed model but this polydis-persity only becomes a dominant eﬀect if | (cid:15) | approachesone (see Fig. 14 bottom). In that case assembled parti-cles should be thermodynamically unstable, as discussedin section IV-C. In other words, they could not be true VLPs. This certainly would be an interesting result ifit were conﬁrmed experimentally. A second interestingobservation was the fact that for increasing values of theratio between the RNA-protein and the protein-proteinaﬃnities, virus-like particles were replaced by disorderedaggregates [31]. In our model, this ratio is represented by | (cid:15) | . For increasing | (cid:15) | fully assembled shells indeed arereplaced by partially assembled shells. Depending on theMLD and N P numbers, the assemblies can be mechan-ically unstable. A third observation concerned the factthat asymmetric reconstruction of the MS2 virus showedthat a subsection of the RNA genome reproducibly asso-ciated with a compact cluster of capsid proteins [36]. Ourmodel reproduces this observation for genome moleculeswith small MLD and large N P . The initial assembly isa compact cluster composed of maximally wrapped pen-tamers (see Fig.3, top left).While the model qualitatively reproduces these obser-vations there is an important quantitative diﬀerence. Ascompared with the experiments of ref. [30] on CCMV,the model appears to underestimate the ability of in-creased RNA-to-protein mixing ratios to suppress the as-sembly of complete capsids. While this could be some-thing speciﬁc for CCMV, we believe that this due to thefact that the model underestimates the entropy of a par-tial assembly. This is particularly obvious for ﬂaccid in-termediates that are expected to undergo strong thermalconformational ﬂuctuations that are not represented inthe model. While it may be possible to include confor-mational ﬂuctuations in the model, a simpler route maybe to carry out Brownian Dynamics simulations such asthose of Ref. [39] but then for a system with pentamersthat interact with tree molecules. Another natural ex-tension of the model would be to larger viruses. TheZlotnick Model can be viewed as a representation of thesmallest capsids, known as T = 1 capsids, in which allcapsid proteins have the same local packing organization.The CCMV and MS2 viruses that are an important test-ing ground for physical theories of viral assemblies areT=3 viruses that have a more complex architecture. ACKNOWLEDGMENTS

We would like to thank Alexander Grosberg for draw-ing our attention to spanning trees in the context of virusstructure and assembly. We beneﬁtted from discussionswith Justin Little, Chen Lin, Zach Gvildys and WilliamVong. RB would like to thank the NSF-DMR for contin-ued support under CMMT Grant No.1836404. [1] F. H. C. Crick and J. D. Watson, Nature , 473 (1956).[2] For double-stranded DNA viruses the genome moleculeseither are inserted into the interior pre-fabricated spher-ical capsids by a molecular motor or the genome is pre-condensed prior to assembly. [3] B. K. Ganser-Pornillos, M. Yeager, and W. I. Sundquist,Curr. Opin. Struct. Biol. , 203 (2008).[4] H. Fraenkel-Conrat and R. C. Williams, Proc. Natl.Acad. Sci. U. S. A. , 690 (1955).[5] P. J. G. Butler and A. Klug, Sci. Am. , 62 (1978). [6] A. Klug, Philos. Trans. R. Soc. Lond. B. Biol. Sci. ,531 (1999).[7] For a quantitative treatment of this model, see ref. [ ? ].[8] D. H. Mathews, J. Sabina, M. Zuker, and D. H. Turner,Journal of molecular biology , 911 (1999).[9] L. Tubiana, A. L. Boˇziˇc, C. Micheletti, and R. Podgornik,Biophysical journal , 194 (2015).[10] A. M. Yoﬀe, P. Prinsen, A. Gopal, C. M. Knobler, W. M.Gelbart, and A. Ben-Shaul, Proceedings of the NationalAcademy of Sciences , 16153 (2008).[11] S. Safran, Statistical Thermodynamics of Surfaces, Inter-faces, and Membranes (Addison-Wesley Pub., 1994).[12] P. Ceres and A. Zlotnick, Biochemistry , 11525 (2002).[13] A. Y. Morozov, R. F. Bruinsma, and J. Rudnick, J.Chem. Phys. , 155101 (2009).[14] P. E. Prevelige, D. Thomas, and J. King, Biophys. J. ,824 (1993).[15] G. L. Casini, D. Graham, D. Heine, R. L. Garcea, andD. T. Wu, Virology , 320 (2004).[16] M. Medrano, M. ´A. Fuertes, A. Valbuena, P. J. Carrillo,A. Rodr´ıguez-Huete, and M. G. Mateu, Journal of theAmerican Chemical Society , 15385 (2016).[17] R. Zandi, P. van der Schoot, D. Reguera, W. Kegel, andH. Reiss, Biophys. J. , 1939 (2006).[18] R. Asor, L. Selzer, C. J. Schlicksup, Z. Zhao, A. Zlotnick,and U. Raviv, ACS nano , 7610 (2019).[19] J. D. Bryngelson, J. N. Onuchic, N. D. Socci, and P. G.Wolynes, Proteins: Structure, Function, and Bioinfor-matics , 167 (1995).[20] P. L. Freddolino, A. S. Arkhipov, S. B. Larson,A. McPherson, and K. Schulten, Structure , 437(2006).[21] A. Zlotnick, J. Mol. Biol. , 59 (1994).[22] R. F. Bruinsma, W. M. Gelbart, D. Reguera, J. Rudnick,and R. Zandi, Phys. Rev. Lett. , 248101 (2003).[23] J. Lidmar, L. Mirny, and D. R. Nelson, Phys. Rev. E (2003).[24] R. Zandi, D. Reguera, R. F. Bruinsma, W. M. Gelbart,and J. Rudnick, Proc. Natl. Acad. Sci. U. S. A. ,15556 (2004).[25] A. Zlotnick, J. Mol. Biol. , 14 (2007).[26] D. Rapaport, Phys. Rev. Lett. , 186101 (2008).[27] A. Arkhipov, P. L. Freddolino, and K. Schulten, Struc-ture , 1767 (2006).[28] R. V. Mannige and C. L. Brooks, Proc. Natl. Acad. Sci.U. S. A. , 8531 (2009).[29] R. Kaplan, J. Klobusicky, S. Pandey, D. H. Gracias, andG. Menon, Artif. Life , 409 (2014).[30] M. Comas-Garcia, R. D. Cadena-Nava, A. L. N. Rao,C. M. Knobler, and W. M. Gelbart, Journal of Virology , 12271 (2012).[31] R. F. Garmann, M. Comas-Garcia, A. Gopal, C. M. Kno-bler, and W. M. Gelbart, J. Mol. Biol. , ??? (2013).[32] T. S. Baker, N. H. Olson, and S. D. Fuller, Microbiol.Mol. Biol. Rev. , 862 (1999).[33] M. Tihova, K. A. Dryden, T. V. L. Le, S. C. Harvey, J. E.Johnson, M. Yeager, and A. Schneemann, J. Virol. ,2897 (2004).[34] R. I. Koning, J. Gomez-Blanco, I. Akopjana, J. Vargas,A. Kazaks, K. Tars, J. M. Carazo, and A. J. Koster,Nature communications , 1 (2016).[35] C. Beren, Y. Cui, A. Chakravarty, X. Yang, A. Rao,C. M. Knobler, Z. H. Zhou, and W. M. Gelbart, Pro- ceedings of the National Academy of Sciences , 10673(2020).[36] E. C. Dykeman, N. E. Grayson, K. Toropova, N. A. Ran-son, P. G. Stockley, and R. Twarock, J. Mol. Biol. ,399 (2011).[37] N. Patel, E. C. Dykeman, R. H. A. Coutts, G. P.Lomonossoﬀ, D. J. Rowlands, S. E. V. Phillips, N. Ran-son, R. Twarock, R. Tuma, and P. G. Stockley, Proc.Natl. Acad. Sci. U.S.A. 10.1073/pnas.1420812112 (2015).[38] D. Endres and A. Zlotnick, Biophys. J. , 1217 (2002).[39] J. D. Perlmutter, M. R. Perkett, and M. F. Hagan, J.Mol. Biol. 10.1016/j.jmb.2014.07.004 (2014).[40] J. Rudnick and R. Bruinsma, Physical Review E ,012145 (2019).[41] The assembly of the capsid is assumed here to take placeon a speciﬁc location. The quantity µ reﬂects the en-tropic free energy cost of removing a pentamer from thesolution to this location plus that of any conformationalchange that is required for the pentamer prior to joininga partial capsid.[42] B. Bollob´as, Modern graph theory , Vol. 184 (Springer Sci-ence & Business Media, 2013).[43] R. L. Graham and P. Hell, Annals of the History of Com-puting , 43 (1985).[44] A. L. Loeb, in Space Structures (Springer, 1991) pp. 45–50.[45] The set of unique spanning trees consists of all spanningtrees that, when depicted as in the top left of Fig. 3,cannot be mapped into each other by rotations and/orreﬂections that leave the dodecahedron invariant.[46] L. T. Fang, W. M. Gelbart, and A. Ben-Shaul, The Jour-nal of Chemical Physics , 10B616 (2011).[47] J. Rudnick and R. Bruinsma, Phys. Rev. Lett. , 038101(2005).[48] E. C. Dykeman, P. G. Stockley, and R. Twarock, J. Mol.Biol. , 3235 (2013).[49] A.M.Gutin, A.Y.Grosberg, and E.I.Shakhnovich, Macro-molecules , 1293 (1993).[50] In actuality, condensation of the RNA genome moleculestakes place during encapsidation. It is driven by pos-itively charged polypeptide chains associated with thecapsid proteins.[51] S. Safran, Statistical thermodynamics of surfaces, inter-faces, and membranes (CRC Press, 2018).[52] M. R. Perkett and M. F. Hagan, J. Chem. Phys. ,214101 (2014).[53] N. G. Van Kampen,

Stochastic processes in physics andchemistry , Vol. 1 (Elsevier, 1992).[54] K. Schulten,

Non-equilibrium Statistical Mechanics (UIUC, 1999).[55] D. T. Wu, J. Chem. Phys. , 2644 (1992). Appendix A: Demonstration hat the smallest MLDfor spanning trees on the dodecahedron is nine

We begin by noting that for every vertex on the do-decahedron there is a vertex on the opposite side of thepolyhedron that is a ladder distance ﬁve away. That is,getting from one of the two vertices to the other requirestraversing at least ﬁve edges. Figure 24 shows such apath. For each such pairs of vertices there are 12 mini-

FIG. 24. Two maximally separated vertices on the dodecahe-dron and one of the 12 shortest paths consisting of ﬁve edgesthat join them. mal paths.Now, assume that there is a spanning tree with MLD 8.In such a case, we can pick out a path of ladder distanceeight in that tree. All other elements of the tree willconsist of trees that branch out from that path. Figure 25is a ﬁgurative depiction of the path along with the longestallowed branch sprouting oﬀ each vertex on that path.The likelihood of branching oﬀ those “side branch” pathsis ignored; such branching does not alter the argumentbelow.