Coarse-Grained Nucleic Acid-Protein Model for Hybrid Nanotechnology

Abstract

The emerging field of hybrid DNA-protein nanotechnology brings with it the potential for many novel materials that combine the addressability of DNA nanotechnology with the versatility of protein interactions. However, the design and computational study of these hybrid structures is difficult due to the system sizes involved. To aid in the design and in silico analysis process, we introduce here a coarse-grained DNA/RNA-protein model that extends the oxDNA/oxRNA models of DNA/RNA with a coarse-grained model of proteins based on an anisotropic network model representation. Fully equipped with analysis scripts and visualization, our model aims to facilitate hybrid nanomaterial design towards eventual experimental realization, as well as to enable the study of biological complexes. We further demonstrate its usage by simulating a DNA-protein nanocage, DNA wrapped around histones, and a nascent RNA in polymerase.



Jonah Procyk, Erik Poppleton, and Petr Šulc∗
School of Molecular Sciences and Center for Molecular Design and Biomimetics, The Biodesign Institute, Arizona State University, 1001 South McAllister Avenue, Tempe, Arizona 85281, USA
(Dated: September 22, 2020)

INTRODUCTION

Molecular nanotechnology designs biomolecular interactions to assemble nanoscale devices and structures. DNA nanotechnology, in particular, has attracted considerable attention and experienced rapid growth over the past three decades. While originally envisioned as a method of developing a DNA lattice for crystallizing proteins for structure determination [1], DNA nanotechnology is seeing promising applications in, e.g., biomaterial assembly [2], biocatalysis [3], therapeutics [4], and diagnostics [5]. The programmability of DNA allows for the rapid design and experimental realization of complex shapes, yielding an unprecedented level of control and functionality at the nanoscale. As DNA nanotechnology has developed, so have parallel technologies with other familiar biomolecules such as RNA [6] and, to some extent, proteins [7, 8]. While DNA nanostructures and devices have been unequivocally successful in realizing more complex and larger constructs, they are inherently limited in function by their available chemistry; functionalized DNA nanostructures are one possible solution [9]. Of particular interest is hybrid DNA-protein nanotechnology, which can combine the already well-developed design strategies of DNA nanotechnology with functional proteins cross-linked to the DNA. The combination of the two molecules in nanotechnology will open new applications, such as diagnostics, therapeutics, molecular "factories", and new biomimetic materials [10]. Examples of successfully realized hybrid nanostructures include DNA-protein cages [11], a DNA nanorobot with a nucleolin aptamer for cancer therapy [12], and peptide-directed assembly of large nanostructures [13].

∗ Corresponding author: [email protected]

At the same time, computational tools for the study and design of DNA and RNA nanostructures have become increasingly relevant as the size and complexity of nanostructures grow.
Design tools such as Adenita [14], MagicDNA [15], CadNano [16], and Tiamat [17] are essential for the structural design of DNA origamis. New coarse-grained models have been introduced to study DNA nanostructures, as the system sizes (thousands of nucleotides or more) and the rare events (formation or breaking of large sections of base pairs) involved in the study of these systems make atomistic-resolution modeling difficult. Among the available tools, the oxDNA and oxRNA models [18–21] have been among the most popular over the past few years and have been used by dozens of research groups in over one hundred articles to study various aspects of DNA and RNA nanosystems as well as biophysical properties of DNA and RNA [22–27]. Each nucleotide is represented as a rigid body in the simulation, with interactions between different sites parametrized to reproduce mechanical, structural, and thermodynamic properties of single-stranded and double-stranded DNA and RNA, respectively.

However, the oxDNA/oxRNA models only allow for the representation of nucleic acids, limiting their scope of usability. While there have been examples of coarse-grained protein-DNA modeling specifically applied to nucleosomes [28, 29], there is currently no efficient tool at the oxDNA level of coarse-graining that would allow for the study of arbitrary protein-DNA complexes.

Here, we introduce such a coarse-grained model that uses an Anisotropic Network Model (ANM) to represent proteins alongside the oxDNA or oxRNA model. The ANM is a form of elastic network model used to probe the dynamics of biomolecules fluctuating around their native state. Originally formulated by Atilgan et al. [30], the ANM has become a fundamental tool in probing protein dynamics, often closely matching residue-residue fluctuations and normal modes of fully atomistic simulations [31–33]. Here we use the ANM to approximately capture native state protein dynamics.
The ANM representation of proteins interacts with the oxDNA/oxRNA representation through an excluded volume interaction only, but specific attractive or repulsive interactions can be added as well. We provide a parametrization of common linkers that are used to conjugate proteins to DNA in typical hybrid nanotechnology applications.

The ANM-oxDNA/oxRNA hybrid models are intended to help design and probe the function of large nucleic acid-protein hybrid nanostructures, but also to be used to study biological complexes and processes that can be captured within the approximations employed by the models. As an example of the model's use, we show simulations of a DNA-protein hybrid nanocage, DNA wrapped around a histone, and a nascent RNA strand inside a polymerase.

FIG. 1. A schematic overview of the oxDNA2 model and its interactions. Each nucleotide is represented as a single rigid body with backbone and base interaction sites (shown here schematically as a sphere and an ellipsoid), with their effective interactions designed to reproduce basic properties of DNA.

MODEL DESCRIPTION

Implemented in the oxDNA simulation package [34], our model allows for coarse-grained simulation of large hybrid nanostructures. It consists of two coarse-grained particle representations: the existing oxDNA2 or oxRNA model for the respective nucleic acids and an Anisotropic Network Model (ANM) for proteins [35]. A detailed description of the oxDNA2/oxRNA models is available in Refs. [19, 20]. A DNA duplex with a nicked strand is schematically illustrated in Fig. 1. The ANM allows us to represent a protein with a known structure as beads connected by springs. We chose the ANM to represent proteins for its efficiency and relative simplicity, while still providing reasonably accurate representations of proteins crosslinked to DNA nanostructures. Furthermore, it can be implemented using only pairwise interaction potentials, the same as the oxDNA/oxRNA models.

TABLE I. Excluded volume parameters used in Eq. (2) for (a) protein-protein, (b) protein-nucleic base, and (c) protein-nucleic backbone non-bonded interactions, in simulation units.

Parameter   (a)      (b)      (c)
σ           0.350    0.360    0.[...]
r_c         0.353    0.363    0.[...]
r*          0.349    0.359    0.[...]
b           [...]    [...]    [...]

Protein Model

In the ANM representation, each protein residue is represented solely by its α-carbon position. All residues within a specified cutoff distance $r_{\max}$ of one another are considered 'bonded'. Please see Ref. [30] for a more detailed introduction. Each bond between residues $i$ and $j$ in the ANM is represented as a harmonic potential around the equilibrium length $r_{ij}^{0}$:

$$V_{ij}(r_{ij}) = \frac{1}{2}\gamma\left(r_{ij} - r_{ij}^{0}\right)^{2} \tag{1}$$

The total bonded interaction potential $V_{\mathrm{bonded-anm}}$ is the sum of the terms in Eq. (1) over all pairs $i, j$ of amino acids at a distance smaller than $r_{\max}$ in the resolved protein structure, as schematically illustrated in Fig. 2. We set $r_{ij}^{0}$ to the distance between the α-carbons of residues $i$ and $j$ in the PDB file. The free parameter γ is set uniformly on each bond in the ANM and is chosen to best fit the Debye-Waller factors of the original PDB structure. Debye-Waller factors (or B-factors when applied specifically to proteins) describe the thermal motions of each resolved atom in a protein, as obtained from the respective X-ray scattering experiment. As previously done [30], we use the B-factor of the α-carbon to approximately capture the fluctuations of the protein backbone.

Since an ANM is typically an analytical technique, it has no excluded volume effects. Hence we extend the model here with the repulsive part of a Lennard-Jones potential between both bonded and non-bonded particles to model the excluded volume, at a per-particle excluded volume diameter of 2.[...] For two particles at distance $r$, we define the excluded volume interaction in Eq. (2):

$$V_{\mathrm{exc}}(r) = \begin{cases} \epsilon\left[-\left(\dfrac{\sigma}{r}\right)^{6} + \left(\dfrac{\sigma}{r}\right)^{12}\right] & r < r^{*} \\ b\,\epsilon\,(r - r_c)^{2} & r^{*} < r < r_c \\ 0 & r \geq r_c \end{cases} \tag{2}$$

where we choose $r_c$. Parameters $b$ and $r^{*}$ were calculated so that $V_{\mathrm{exc}}$ is a differentiable function. The constant ε sets the strength of the potential, and we use ε = 82 pN nm.

Parameterization
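As a concrete illustration of how the smoothing parameters in Eq. (2) can be obtained, the sketch below solves for $r^{*}$ and $b$ from a chosen σ and $r_c$ by requiring that the value and slope of the two branches match at $r^{*}$. It is a minimal Python sketch, not the oxDNA implementation: the function names, the scan-then-bisect root search, and the unit prefactor on the Lennard-Jones branch are assumptions made for illustration, and the inputs are dimensionless example values rather than the entries of Table I.

```python
def lj_repulsive(r, sigma, eps=1.0):
    """Inner branch of Eq. (2): eps * [(sigma/r)^12 - (sigma/r)^6] (assumed prefactor 1)."""
    x = (sigma / r) ** 6
    return eps * (x * x - x)


def lj_repulsive_deriv(r, sigma, eps=1.0):
    """Derivative of the inner branch with respect to r."""
    x = (sigma / r) ** 6
    return eps * (-12.0 * x * x + 6.0 * x) / r


def smoothing_parameters(sigma, r_c, eps=1.0, scan_step=1e-4):
    """Return (r_star, b) so that Eq. (2) and its slope are continuous at r_star."""
    if r_c <= sigma:
        raise ValueError("this sketch assumes r_c > sigma")

    def mismatch(r):
        # Matching value and slope of b*eps*(r - r_c)^2 to the inner branch
        # gives the condition r_c = r - 2 V(r) / V'(r).
        return r - 2.0 * lj_repulsive(r, sigma, eps) / lj_repulsive_deriv(r, sigma, eps) - r_c

    # Scan downward from sigma (where mismatch < 0) until the sign changes.
    hi = sigma
    for _ in range(int(0.5 / scan_step)):
        lo = hi - scan_step * sigma
        if mismatch(lo) > 0.0:
            break
        hi = lo
    else:
        raise ValueError("no matching point found for this sigma and r_c")

    # Bisect inside the bracket [lo, hi].
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mismatch(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    r_star = 0.5 * (lo + hi)
    b = lj_repulsive(r_star, sigma, eps) / (eps * (r_star - r_c) ** 2)
    return r_star, b


def v_exc(r, sigma, r_c, r_star, b, eps=1.0):
    """Piecewise excluded volume potential of Eq. (2)."""
    if r < r_star:
        return lj_repulsive(r, sigma, eps)
    if r < r_c:
        return b * eps * (r - r_c) ** 2
    return 0.0


if __name__ == "__main__":
    r_star, b = smoothing_parameters(sigma=1.0, r_c=1.005)
    print(f"r* = {r_star:.6f}, b = {b:.3f}")
```

Because ε multiplies both branches, the resulting $r^{*}$ and $b$ do not depend on it; a different numerical prefactor on the Lennard-Jones branch alone would rescale $b$ but leave $r^{*}$ unchanged.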

In parameterizing our model for simulation, the goal is to mimic the dynamics of the protein in the native state. Though not without their drawbacks [36, 37], we selected B-factors for their widespread availability in PDB structures and their history of being used to fit elastic network models of proteins [36]. Our model contains two free parameters, the cutoff distance $r_{\max}$ and the spring constant γ.

FIG. 2. Illustration of the ANM using GFP (PDB code: 1W7S): (a) starting PDB structure, (b) ANM representation at $r_{\max}$ of 8 Å, and (c) bonding criterion per residue: all particles within distance $r_{\max}$ (bounds depicted by the blue sphere) of the center particle (black circle) are considered 'bonded' (blue squares), while those further away (outside of the sphere) are considered 'nonbonded' (red squares).

The cutoff distance $r_{\max}$ can be varied, but authors typically use a value of 13 Å [30], with strong long-range interactions being a key feature of the classic ANM [37]. For each protein (consisting of $N$ amino acids) represented by the ANM, we linearly fit the analytically computed B-factors to their experimental counterparts with γ as a free parameter. To solve for the B-factors analytically, we first calculate the $3N \times 3N$ Hessian matrix of the spring potential $V_{\mathrm{spring}}$, a task made simple by the harmonic potential energy function [30]. After constructing the Hessian $H$ for the system at a specified cutoff $r_{\max}$, the mean squared deviation from the mean position of each residue $i$ can be calculated from the equipartition theorem:

$$\left\langle \Delta R_i^{2} \right\rangle = \frac{k_B T}{\gamma}\,\mathrm{tr}\left(H^{-1}_{i,i}\right) \tag{3}$$

The B-factor $B_i$ of residue $i$ can be directly computed from the previous result as [30]:

$$B_i = \frac{8\pi^{2}}{3}\left\langle \Delta R_i^{2} \right\rangle \tag{4}$$

The experimental B-factors are provided along with the resolved crystal structures of proteins, and we can hence use Eqs. (3) and (4) to obtain $N$ equations.
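Eqs. (3) and (4) can be evaluated numerically in a few lines. The NumPy sketch below builds the Hessian for unit spring constant (so that γ factors out as in Eq. (3)), removes the six rigid-body zero modes through a pseudo-inverse, and converts the result to B-factors. The function names, the eigenvalue tolerance, and the closed-form least-squares estimate of γ (obtained by treating 1/γ as the fit variable) are illustrative assumptions, not the authors' conversion scripts. Coordinates are assumed to be in Å and γ in pN/Å.

```python
import numpy as np

KB = 0.1380649  # Boltzmann constant in pN * Angstrom / K


def anm_hessian(coords, r_max):
    """3N x 3N Hessian of the harmonic network for unit spring constant."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = d @ d
            if dist2 > r_max ** 2:
                continue  # pair is not bonded
            block = -np.outer(d, d) / dist2  # off-diagonal 3x3 super-element
            hess[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hess[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            hess[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hess[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hess


def inverse_traces(hess, tol=1e-8):
    """tr(H^-1_ii) per residue, using a pseudo-inverse that drops zero modes."""
    w, v = np.linalg.eigh(hess)
    keep = w > tol * w.max()
    hinv = (v[:, keep] / w[keep]) @ v[:, keep].T
    n = hess.shape[0] // 3
    return np.array([np.trace(hinv[3 * i:3 * i + 3, 3 * i:3 * i + 3]) for i in range(n)])


def b_factors(traces, gamma, temperature=298.15):
    """B-factors in Angstrom^2 from Eqs. (3) and (4)."""
    msd = KB * temperature / gamma * traces      # Eq. (3)
    return 8.0 * np.pi ** 2 / 3.0 * msd          # Eq. (4)


def fit_gamma(traces, b_exp, temperature=298.15):
    """Least-squares gamma for B_exp ~ c / gamma, solved in the variable 1/gamma."""
    c = b_factors(traces, 1.0, temperature)      # predicted B-factors for gamma = 1
    return (c @ c) / (c @ np.asarray(b_exp, dtype=float))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xyz = rng.normal(scale=5.0, size=(20, 3))    # synthetic stand-in for C-alpha coordinates
    tr = inverse_traces(anm_hessian(xyz, r_max=13.0))
    print(b_factors(tr, gamma=40.0)[:5])
```

In practice the coordinates and experimental B-factors would come from the α-carbon records of the PDB file; the synthetic coordinates above only exercise the code path.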
We then fit the γ parameter to minimize

$$f(\gamma) = \sum_{i=1}^{N}\left(B_i^{\mathrm{exp}} - \frac{8\pi^{2}}{3}\left\langle \Delta R_i^{2} \right\rangle\right)^{2} \tag{5}$$

for a selected $r_{\max}$. We can further measure the mean square deviation of residue positions in a simulation of our model and compare it to the analytical calculation. We show the comparison in Fig. 3 for ribonuclease T1 and green fluorescent protein simulated with the ANM model and with our ANMT model, to be introduced later. While the simulation and the analytical prediction of the classic ANM agree well with each other, as expected, the model still does not fully reproduce the measured B-factors reported in the experimental structures. ANM models are known not to fully reproduce measured B-factors [30] and to have peaks in the mean square displacement profiles that are not observed in the measured B-factors [36]. The model nevertheless provides semi-quantitative agreement with the measured data, and hence is an accurate enough representation of a protein to model its mechanical properties under small perturbations, as required for DNA-hybrid nanotechnology systems.

EXPANSION OF THE ANM MODEL

In addition to the classic ANM, our model can also optionally use a unique $\gamma_{ij}$ for each bonded pair of residues, which allows for the implementation of other analytical models, such as the heterogeneous ANM (HANM) [38] and the multiscale ANM (mANM) [39], that can generate better fits to experimental B-factors using the $\gamma_{ij}$ values. The HANM iteratively fits a normal ANM network to given experimental B-factors with variable, realistic force parameters $\gamma_{ij}$. While the HANM is unquestionably useful, the inaccuracy of B-factor data, particularly in large or high-resolution structures, limits its application. In the mANM model, our conversion from the PDB structure to the ANM representation also allows the fitting of multiple networks with varying $\gamma_{ij}$ values tuned by scale parameters [39] (similar to $r_{\max}$). A linear combination of the networks is then solved to minimize the difference between the ANM network's predicted and experimental B-factors. The original formulation of the mANM [39] is limited in computational application as it has no cutoff value ($r_{\max}$); a protein of $N$ residues would have $N(N-1)/2$ bonds [...] $r_{\max}$ and the same γ for all spring interactions. A $C_\alpha$ coarse-grained HANM and an mANM with an additional cutoff value parameter are, however, implemented in our conversion scripts and can optionally be used to represent proteins in our model.

One major obstacle in using an ANM is known as the tip effect [40]. The result is an extremely large spike in the B-factors due to a residue being under-constrained. Often this can be solved by raising the cutoff value in the ANM construction; however, doing so raises the computational requirements of our simulations. Furthermore, we found the ANM to be unable to accurately represent short peptides, as the spring network does not provide enough constraints to reproduce their end-to-end distance as seen when simulated with more detailed models like AWSEM-MD [41].

To overcome this obstacle, we implemented harmonic pairwise bending and torsional modulation forces into the existing simulation model. These new constraints allow for reduced $r_{\max}$ values and can also more accurately represent shorter peptides, which are often used in DNA-hybrid nanostructures. We introduce these optional modulation forces below.

FIG. 3. Analytical, classic ANM simulation, ANMT simulation, and experimentally determined B-factors in Å² per residue for (a) ribonuclease T1 (PDB code 1BU4) at 25 °C ($r_{\max} = 15$, $k_s = 42.[...]$ pN/Å, $k_b = k_t = 171.[...]$ pN/Å) and (b) green fluorescent protein (PDB code 1W7S) at 25 °C ($r_{\max} = 13$, $k_s = 33.[...]$ pN/Å, $k_b = k_t = 171.[...]$ pN/Å).

Bending and Torsional Modulation

We introduce the torsional and bending potentials as optional interaction potentials in our protein representation, on top of the ANM with bonded and excluded volume potentials. Each protein residue corresponds to a spherical particle with an associated orientation given by its orthonormal axes $\hat{i}_1, \hat{i}_2, \hat{i}_3$ (Fig. 4a). Harmonic terms control the angle between the normalized interparticle distance vector $\hat{r}_{ij}$ and the normal vector of each particle, $\hat{i}_1$ and $\hat{j}_1$, to control bond bending. The angles between two sets of orientation vectors, $\hat{i}_1, \hat{j}_1$ and $\hat{i}_3, \hat{j}_3$, are controlled as well, allowing for modulation of the torsion based on the particles' relative orientations. The full pairwise potential is given by Eq. (6):

$$V^{B\&T}_{ij} = k_b\left[\left(\hat{r}_{ij}\cdot\hat{i}_1 - a_{ij}\right)^{2} + \left(-\hat{r}_{ij}\cdot\hat{j}_1 - b_{ij}\right)^{2}\right] + k_t\left[\left(\hat{i}_1\cdot\hat{j}_1 - c_{ij}\right)^{2} + \left(\hat{i}_3\cdot\hat{j}_3 - d_{ij}\right)^{2}\right] \tag{6}$$

The function $V^{B\&T}_{ij}$ is defined for all pairs of residues that are neighbors along the protein backbone. We set the energy minimum values $a_{ij}, b_{ij}, c_{ij}, d_{ij}$ to the cosines of the respective angles between residues in the PDB file for the protein structure. The terms $k_b$ and $k_t$ are two new global parameters that control the strength of the bending and torsion potentials, respectively. Currently, we set their values empirically, though pair-specific terms could lead to further agreement with experimental data. Fig. 3 shows the effect of the torsional and bending modulation on the same set of proteins used previously. As intended, a noticeable decrease in high-peak B-factors is observed using modest $k_b$ and $k_t$ values. Fig. 4 illustrates the potential in a two-particle system.

FIG. 4. Depiction of (a) bending and (b) torsional potential terms on a pair of particles $i$ and $j$. The angles depicted as dot products correspond to the cosine of that angle. Equilibrium values (in red) correspond to (the cosine of) initial angle displacements derived from coordinates in the PDB file.
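To make the structure of Eq. (6) concrete, the following Python sketch evaluates the bending and torsion terms for one pair of neighboring residues. The `Residue` container, the function names, the direction convention for the interparticle unit vector (pointing from i to j), and the use of axes 1 and 3 are assumptions made for illustration; the reference cosines are taken from a reference configuration in the same way the text takes them from the PDB structure.

```python
import math
from dataclasses import dataclass


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit(u):
    norm = math.sqrt(dot(u, u))
    return tuple(a / norm for a in u)


@dataclass
class Residue:
    pos: tuple  # particle position
    a1: tuple   # unit normal axis (hypothetical name for axis 1)
    a3: tuple   # unit third axis (hypothetical name for axis 3)


def bt_cosines(p, q):
    """The four cosines entering Eq. (6) for the ordered pair (p, q)."""
    r_hat = unit(tuple(b - a for a, b in zip(p.pos, q.pos)))  # assumed to point from p to q
    return (
        dot(r_hat, p.a1),    # bending term on p
        -dot(r_hat, q.a1),   # bending term on q
        dot(p.a1, q.a1),     # torsion term, axis 1
        dot(p.a3, q.a3),     # torsion term, axis 3
    )


def bt_energy(p, q, reference, k_b, k_t):
    """Bending and torsion energy relative to reference cosines (a, b, c, d)."""
    a, b, c, d = bt_cosines(p, q)
    a0, b0, c0, d0 = reference
    bend = (a - a0) ** 2 + (b - b0) ** 2
    twist = (c - c0) ** 2 + (d - d0) ** 2
    return k_b * bend + k_t * twist


if __name__ == "__main__":
    p = Residue(pos=(0.0, 0.0, 0.0), a1=(1.0, 0.0, 0.0), a3=(0.0, 0.0, 1.0))
    q = Residue(pos=(3.8, 0.0, 0.0), a1=(0.0, 1.0, 0.0), a3=(0.0, 0.0, 1.0))
    reference = bt_cosines(p, q)                 # stand-in for the PDB-derived values
    rotated = Residue(pos=q.pos, a1=(1.0, 0.0, 0.0), a3=q.a3)
    print(bt_energy(p, q, reference, k_b=2.0, k_t=3.0))
    print(bt_energy(p, rotated, reference, k_b=2.0, k_t=3.0))
```

Working with cosines rather than angles keeps every term a plain dot product, which is consistent with the text's remark that the potential can be built from pairwise quantities only.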
Hereafter, we will refer to the ANM with torsional and bending modulation as the ANMT model.

Protein-Nucleic Acid Interactions

In our current implementation of the model, protein residues and nucleotides have no interaction except for excluded volume and optional, explicitly specified spring potentials between user-designated protein residues and nucleotides:

$$V_{\mathrm{spring}}(r) = k\,(r - r_0)^{2} \tag{7}$$

where $r$ is the distance between the centers of mass of the respective particles, and $k$ and $r_0$ are external parameters.

FIG. 5. 2D molecular structures of the common bioconjugate linkers (a) LC-SPDP and (b) DBCO-triazole; both can be used to conjugate proteins to amine-modified nucleotides.

The excluded volume interaction potential between protein residues and DNA/RNA nucleotides has the same form as defined in Eq. (2), with the respective interaction parameters given in Table I. In the oxDNA/oxRNA models, each nucleotide has two distinct interaction sites (backbone and base), each of which interacts with the protein residue using separate excluded volume parameters. Future expansion of the model will include an approximate treatment of the electrostatic interaction between protein and nucleic acids based on Debye-Hückel theory, as implemented in oxDNA [19] as well as in the coarse-grained protein model AWSEM [41]. Many non-specific DNA-protein interactions make use of the electrostatic interactions between the DNA backbone and positively charged portions of the protein [42]. Sensitive to salt concentration, these electrostatic contributions have been previously modeled using Debye-Hückel theory [43] to investigate the role of protein frustration in regulating DNA binding kinetics. Similarly, an extension of our model with an appropriate Debye-Hückel potential can capture and enable the study of non-specific DNA-binding protein systems.

Since we are interested in exploring conjugated hybrid systems, it is necessary to have an approximation for the covalent linkers bridging the nucleic acid base and the protein residue. We model the two bioconjugate linkers, LC-SPDP and DBCO-triazole (Fig. 5), that are typically used in protein-DNA hybrid nanotechnology [44, 45] using a spring potential as defined in Eq. (7), with parameters $k$ and $r_0$ parametrized to mimic the average end-to-end distance and standard deviation of each linker at a temperature of 300 K. LC-SPDP links the thiol group of a modified cysteine residue to an amine-modified nucleotide. DBCO-triazole is the product of a copper-free click reaction that links a DBCO-modified residue to an azide-modified nucleotide. Each of the linkers (Fig. 5) was first drawn in MolView and then converted into OPLS/AA force field format via LigParGen [46–48]. In GROMACS [49], each linker was first equilibrated and then simulated with the OPLS/AA force field in SPC/E water at 300 K for 3 trials of 1 nanosecond each. The averaged end-to-end distance and standard deviation obtained from the trials are shown in Table II.

TABLE II. Average and standard deviation of the end-to-end distance of linkers in fully atomistic GROMACS simulation, and the fit spring constant k.

Linker          Average (Å)   Std. dev. (Å)   k (pN/Å)
LC-SPDP         1.71          0.32            3.99
DBCO-triazole   3.64          2.87            0.14

EXAMPLES

Our model is fully functional with the latest version of the visualization tool oxView [50], both for the design of hybrid nanomaterials and for the viewing of simulation trajectories. The one caveat is that protein topologies are non-editable. Instead, each protein starts from its PDB crystal structure and is converted into oxDNA format, while the ANM spring constant is set to best match the experimental B-factors via our provided scripts. The output files can then be loaded into oxView as well as used for simulation in our model.

The model is theoretically able to represent any protein or protein complex that the ANM can represent. Biologically relevant multi-chain proteins such as nucleosomes, RNA polymerases, and viral assemblies can also be simulated, allowing the nucleic acid behavior present in each of these systems to be modeled, studied, and compared to experimental data. While the detailed study of these systems is beyond the scope of this article, we show examples of both biological systems and designed nanosystems as represented by our ANM-oxDNA or ANM-oxRNA model.

Two prominent cases of nucleic acid-protein interactions, RNA polymerases and nucleosomes, were constructed and simulated using the ANMT model for future study. As many PDB files are missing residues, we first reconstruct each individual chain using the best scoring of ten models generated by the Modeller tool [51]. The reconstructed RNA polymerase was converted into oxDNA format from its PDB entry (6ASX) using an $r_{\max}$ of 15 Å. A fragment of the RNA was reinserted into the exit channel, and the subsequent MD simulation was allowed to sample the RNA's escape from the exit channel. The reconstructed nucleosome was converted into oxDNA simulation format from its PDB entry (3LEL) using an $r_{\max}$ of 12 Å.
Attractive spring potentials between randomly chosen DNA and protein residues that were in close proximity in the PDB structure were added at the histone/DNA interface. Example snapshots from the MD simulations of these biological systems are shown in Fig. 6a,b.

While no process was explicitly modeled, our new model can be used to explore the behavior of DNA and histones in large-scale simulations, since on the latest generation of GPU cards the oxDNA model has been shown to be able to equilibrate systems consisting of over 1 million nucleotides.

FIG. 6. OxView visualization of simulated biological assemblies: (a) RNA in the exit channel of a paused RNA polymerase (PDB code: 6ASX), (b) human nucleosome made up of a histone octamer and DNA (PDB code: 3LEL), and (c) mean structure from the MD simulation of KDPG aldolase (PDB code: 1WA3) conjugated to a DNA cage.

More pertinent to our goal of aiding in the design of hybrid nanostructures, our model supports conversion of designs from CadNano, Tiamat, and other popular DNA origami design tools into the oxDNA format [52], where they can easily be edited in oxView to include linked proteins of interest. Since an ANM is a highly simplified model of protein dynamics, the predictive power of our model lies not in the prediction of protein structure but rather in the collection of statistical data on the protein's effect on the nucleic acid component of the system. The suite of oxDNA analysis scripts [50] is also available and compatible with this model, allowing for a detailed exploration of system-specific effects.

Synthetic peptides are used in many chemistry applications. Since these peptides are often very small and lack long-distance contacts that enforce specific 3D conformations, we wanted to explore how our models perform on these small structures.
We compared the end-to-end distance of 3 hemagglutinin-binding peptides [53] simulated in our ANM model, the ANMT model, and another popular coarse-grained protein model, AWSEM-MD [54]. For the AWSEM-MD simulations, initial structure predictions were generated from sequence using I-TASSER [55]. A secondary structure weight (ssweight) file was generated using JPred [56], and the structure and weight files were converted to the appropriate formats for AWSEM-MD simulation in LAMMPS [57] using tools provided with AWSEM-MD. Simulations were run for 10^[...] steps, with the end-to-end distance printed every 10^[...] steps.

Using the classic ANM, each peptide was built using strong backbone connections and significantly weaker long-range connections to empirically match the AWSEM mean and standard deviation of the end-to-end distance. The resulting simulation of each peptide, however, showed a large number of stretched, nonphysical conformations in the trajectory. The subsequent inclusion of the bending and torsion modulation using the ANMT model allowed for the same level of accuracy using only strong short-range connections. The ANMT model showed much higher rigidity, with no stretched conformations, when compared to the ANM alone. Final end-to-end distances and standard deviations are shown in Table III.

TABLE III. Average and standard deviation of the end-to-end distance of hemagglutinin peptides in the coarse-grained models.

Peptide   Quantity        AWSEM   ANM     ANMT
125       Average (Å)     12.02   12.9    12.09
125       Std. dev. (Å)   4.9     4.51    4.34
149       Average (Å)     12.9    12.9    12.9
149       Std. dev. (Å)   6.6     4.6     4.6
227       Average (Å)     14.5    16.2    14.7
227       Std. dev. (Å)   7.4     5.4     5.1

Peptide 125: CSGHNIYAQYGYPYDHMYEG
Peptide 149: CSGKSQEIGDPDDIWNQMKW
Peptide 227: CSGSGNQEYFPYPMIDYLKK

Hybrid DNA-protein nanostructure constructs such as those developed by the Stephanopoulos Lab are of particular interest. The Stephanopoulos group has experimentally realized a size-tunable DNA cage attached to the homotrimeric protein KDPG aldolase, making use of an LC-SPDP linker (Fig. 5) to join the DNA and protein components [11]. The DNA cage was converted from Tiamat format into oxDNA format, and the protein was converted from its PDB structure. The linker between the components was modeled as a spring potential (Eq. (7)) using the parameters from Table II. We conducted a short MD simulation of the full system, corresponding to a time of about 30 ns. The mean structure from the simulation of the experimental cage was calculated using our analysis scripts [50] and is displayed in Fig. 6c.
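As a small worked example of the linker term, the sketch below evaluates Eq. (7) for the LC-SPDP parameters listed in Table II. The function name and the parameter dictionary are hypothetical, the prefactor convention follows Eq. (7) as written here (if the implementation carries a factor of 1/2, the values scale accordingly), and the force is simply the negative derivative of that energy.

```python
def linker_spring(r, k, r0):
    """Energy and radial force for V = k (r - r0)^2, with r in Angstrom and k in pN/Angstrom."""
    dr = r - r0
    energy = k * dr * dr      # pN * Angstrom
    force = -2.0 * k * dr     # pN; negative values pull the pair back together
    return energy, force


# LC-SPDP entry of Table II: mean end-to-end distance and fitted spring constant.
LC_SPDP = {"k": 3.99, "r0": 1.71}

if __name__ == "__main__":
    energy, force = linker_spring(2.71, **LC_SPDP)  # linker stretched by 1 Angstrom
    print(energy, force)
```

A stiffer linker (larger k) confines the protein more tightly to its attachment nucleotide, so this single parameter controls how much the conjugated protein can move relative to the DNA cage.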

CONCLUSIONS

We present a coarse-grained protein model, based on an elastic network representation of proteins, for use in conjunction with existing coarse-grained nucleic acid models capable of simulating large hybrid nanostructures. Implemented on GPU as well as CPU, our model allows for simulations of large systems based on nanotechnology designs as well as large biological complexes.

Looking forward, both the paused RNA polymerase and the histone are biological systems we plan to study using this model. In addition, experimental systems such as the hybrid cage in Fig. 6 can be simulated and directly compared to available experimental data. While widely available, B-factors are severely limited, particularly in terms of accuracy. However, our model can be parameterized to approximate any available fluctuation data, including but not limited to fully atomistic simulation and solution NMR data. In addition to the model, we also extended a nanotechnology design and simulation analysis tool, oxView, to include a protein representation to aid the computer design of DNA/RNA-protein hybrid nanostructures. The subsequent analysis of the designs can be used to optimize nanostructure parameters, such as the placement of the linkers and the lengths of duplex segments, in order to achieve a desired geometry.

The simulation code is freely available at github.com/sulcgroup/anm-oxdna and will also be incorporated in a future release of the oxDNA simulation package. The visualization of protein-hybrid systems has been incorporated into our previously developed oxView tool [50]. The aforementioned analysis scripts and visualizer are available in the git repositories github.com/sulcgroup/oxdna_analysis_tools and github.com/sulcgroup/oxdna-viewer, respectively.

ACKNOWLEDGEMENTS

We thank all members of the Šulc group for their support and helpful discussions, in particular H. Liu and M. Matthies. We thank Dr. Stephanopoulos for helpful comments and feedback about the simulation study of his DNA-protein hybrid system. We acknowledge support from the NSF grant no. 1931487.

[1] N. C. Seeman, Journal of Theoretical Biology, 237 (1982).
[2] W. Liu, M. Tagawa, H. L. Xin, T. Wang, H. Emamy, H. Li, K. G. Yager, F. W. Starr, A. V. Tkachenko, and O. Gang, Science, 582 (2016).
[3] C. Geng and P. J. Paukstelis, Journal of the American Chemical Society, 7817 (2014).
[4] S. Li, Q. Jiang, S. Liu, Y. Zhang, Y. Tian, C. Song, J. Wang, Y. Zou, G. J. Anderson, J.-Y. Han, et al., Nature Biotechnology, 258 (2018).
[5] F. Zhang, J. Nangreave, Y. Liu, and H. Yan, Journal of the American Chemical Society, 11198 (2014).
[6] P. Guo, Nature Nanotechnology, 833 (2010).
[7] R. V. Ulijn and R. Jerala, Chemical Society Reviews, 3391 (2018).
[8] N. P. King, J. B. Bale, W. Sheffler, D. E. McNamara, S. Gonen, T. Gonen, T. O. Yeates, and D. Baker, Nature, 103 (2014).
[9] M. Madsen and K. V. Gothelf, Chemical Reviews, 6384 (2019).
[10] N. Stephanopoulos, Chem, 364 (2020).
[11] Y. Xu, S. Jiang, C. R. Simmons, R. P. Narayanan, F. Zhang, A. M. Aziz, H. Yan, and N. Stephanopoulos, ACS Nano, 3545 (2019).
[12] S. Li, Q. Jiang, S. Liu, Y. Zhang, Y. Tian, C. Song, J. Wang, Y. Zou, G. J. Anderson, J. Y. Han, Y. Chang, Y. Liu, C. Zhang, L. Chen, G. Zhou, G. Nie, H. Yan, B. Ding, and Y. Zhao, Nature Biotechnology, 258 (2018).
[13] J. Jin, E. G. Baker, C. W. Wood, J. Bath, D. N. Woolfson, and A. J. Turberfield, ACS Nano, 9927 (2019).
[14] E. de Llano, H. Miao, Y. Ahmadi, A. J. Wilson, M. Beeby, I. Viola, and I. Barisic, Nucleic Acids Research (2020), 10.1093/nar/gkaa593.
[15] C. M. Huang, A. Kucinic, J. A. Johnson, H. Su, and C. E. Castro, bioRxiv (2020).
[16] S. M. Douglas, A. H. Marblestone, S. Teerapittayanon, A. Vazquez, G. M. Church, and W. M. Shih, Nucleic Acids Research, 5001 (2009).
[17] S. Williams, K. Lund, C. Lin, P. Wonka, S. Lindsay, and H. Yan, in International Workshop on DNA-Based Computers (Springer, 2008) pp. 90–101.
[18] T. E. Ouldridge, A. A. Louis, and J. P. Doye, The Journal of Chemical Physics, 02B627 (2011).
[19] B. E. Snodin, F. Randisi, M. Mosayebi, P. Šulc, J. S. Schreck, F. Romano, T. E. Ouldridge, R. Tsukanov, E. Nir, A. A. Louis, et al., The Journal of Chemical Physics, 06B613 1 (2015).
[20] P. Šulc, F. Romano, T. E. Ouldridge, J. P. Doye, and A. A. Louis, The Journal of Chemical Physics, 235102 (2014).
[21] P. Šulc, F. Romano, T. E. Ouldridge, L. Rovigatti, J. P. K. Doye, and A. A. Louis, Journal of Chemical Physics, 5101 (2012).
[22] R. Sharma, J. S. Schreck, F. Romano, A. A. Louis, and J. P. Doye, ACS Nano, 12426 (2017).
[23] M. C. Engel, D. M. Smith, M. A. Jobst, M. Sajfutdinow, T. Liedl, F. Romano, L. Rovigatti, A. A. Louis, and J. P. Doye, ACS Nano, 6734 (2018).
[24] A. Suma, E. Poppleton, M. Matthies, P. Šulc, F. Romano, A. A. Louis, J. P. Doye, C. Micheletti, and L. Rovigatti, Journal of Computational Chemistry, 2586 (2019).
[25] F. Hong, S. Jiang, X. Lan, R. P. Narayanan, P. Šulc, F. Zhang, Y. Liu, and H. Yan, Journal of the American Chemical Society, 14670 (2018).
[26] J. P. Doye, T. E. Ouldridge, A. A. Louis, F. Romano, P. Šulc, C. Matek, B. E. Snodin, L. Rovigatti, J. S. Schreck, R. M. Harrison, and W. P. Smith, Physical Chemistry Chemical Physics, 20395 (2013).
[27] M. Matthies, N. P. Agarwal, E. Poppleton, F. M. Joshi, P. Šulc, and T. L. Schmidt, ACS Nano, 1839 (2019).
[28] B. Zhang, W. Zheng, G. A. Papoian, and P. G. Wolynes, Journal of the American Chemical Society, 8126 (2016).
[29] R. V. Honorato, J. Roel-Touris, and A. M. Bonvin, Frontiers in Molecular Biosciences, 102 (2019).
[30] A. R. Atilgan, S. R. Durell, R. L. Jernigan, M. C. Demirel, O. Keskin, and I. Bahar, Biophysical Journal, 505 (2001).
[31] S. K. Mishra and R. L. Jernigan, PLoS ONE (2018), 10.1371/journal.pone.0199225.
[32] M. Gur, E. Zomot, and I. Bahar, Journal of Chemical Physics, 121912 (2013).
[33] L. Yang, G. Song, A. Carriquiry, and R. L. Jernigan, Structure, 321 (2008).
[34] L. Rovigatti, P. Šulc, I. Z. Reguly, and F. Romano, Journal of Computational Chemistry, 1 (2015).
[35] A. R. Atilgan, S. Durell, R. L. Jernigan, M. Demirel, O. Keskin, and I. Bahar, Biophysical Journal, 505 (2001).
[36] Z. Sun, Q. Liu, G. Qu, Y. Feng, and M. T. Reetz, Chemical Reviews (2019), 10.1021/acs.chemrev.8b00290.
[37] E. Fuglebakk, N. Reuter, and K. Hinsen, Journal of Chemical Theory and Computation, 5618 (2013).
[38] F. Xia, D. Tong, and L. Lu, Journal of Chemical Theory and Computation, 3704 (2013).
[39] K. Xia, Physical Chemistry Chemical Physics, 658 (2017).
[40] M. Lu, B. Poon, and J. Ma, Journal of Chemical Theory and Computation, 464 (2006).
[41] M. Y. Tsai, W. Zheng, D. Balamurugan, N. P. Schafer, B. L. Kim, M. S. Cheung, and P. G. Wolynes, Protein Science, 255 (2016).
[42] V. K. Misra, J. L. Hecht, A. S. Yang, and B. Honig, Biophysical Journal, Tech. Rep. 5 (1998).
[43] A. Marcovitz and Y. Levy, Journal of Physical Chemistry B, 13005 (2013).
[44] A. Buchberger, C. R. Simmons, N. E. Fahmi, R. Freeman, and N. Stephanopoulos, Journal of the American Chemical Society, 1406 (2019).
[45] Y. Xu, S. Jiang, C. R. Simmons, R. P. Narayanan, F. Zhang, A.-M. Aziz, H. Yan, and N. Stephanopoulos, ACS Nano (2019).
[46] L. S. Dodda, I. C. De Vaca, J. Tirado-Rives, and W. L. Jorgensen, Nucleic Acids Research, W331 (2017).
[47] L. S. Dodda, J. Z. Vilseck, J. Tirado-Rives, and W. L. Jorgensen, Journal of Physical Chemistry B, 3864 (2017).
[48] W. L. Jorgensen and J. Tirado-Rives, Proceedings of the National Academy of Sciences of the United States of America, Tech. Rep. 19 (2005).
[49] H. J. Berendsen, D. van der Spoel, and R. van Drunen, Computer Physics Communications, Tech. Rep. 1-3 (1995).
[50] E. Poppleton, J. Bohlin, M. Matthies, S. Sharma, F. Zhang, and P. Šulc, Nucleic Acids Research, e72 (2020).
[51] A. Šali and T. L. Blundell, Journal of Molecular Biology (1993), 10.1006/jmbi.1993.1626.
[52] A. Suma, E. Poppleton, M. Matthies, P. Šulc, F. Romano, A. A. Louis, J. P. K. Doye, C. Micheletti, and L. Rovigatti, J. Comput. Chem., 1 (2019).
[53] S. A. Johnston, V. Domenyuk, N. Gupta, M. T. Batista, J. C. Lainson, Z.-G. Zhao, J. F. Lusk, A. Loskutov, Z. Cichacz, P. Stafford, et al., Scientific Reports, 1 (2017).
[54] A. Davtyan, N. P. Schafer, W. Zheng, C. Clementi, P. G. Wolynes, and G. A. Papoian, The Journal of Physical Chemistry B, 8494 (2012).
[55] J. Yang and Y. Zhang, Nucleic Acids Research, W174 (2015).
[56] A. Drozdetskiy, C. Cole, J. Procter, and G. J. Barton, Nucleic Acids Research, W389 (2015).
[57] S. Plimpton, Journal of Computational Physics 117