Analyses of protein cores reveal fundamental differences between solution and crystal structures
Zhe Mei, John D. Treado, Alex T. Grigas, Zachary A. Levine, Lynne Regan, Corey S. O'Hern
Integrated Graduate Program in Physical & Engineering Biology, Yale University, New Haven, Connecticut 06520, USA
Department of Chemistry, Yale University, New Haven, Connecticut 06520, USA
Department of Mechanical Engineering & Materials Science, Yale University, New Haven, Connecticut 06520, USA
Graduate Program in Computational Biology & Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
Department of Pathology, Yale University, New Haven, Connecticut 06520, USA
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
Institute of Quantitative Biology, Biochemistry and Biotechnology, Center for Synthetic and Systems Biology, School of Biological Sciences, University of Edinburgh
Department of Physics, Yale University, New Haven, Connecticut 06520, USA
Department of Applied Physics, Yale University, New Haven, Connecticut 06520, USA
July 22, 2019
Abstract
There have been several studies suggesting that protein structures solved by NMR spectroscopy and x-ray crystallography show significant differences. To understand the origin of these differences, we assembled a database of high-quality protein structures solved by both methods. We also find significant differences between NMR and crystal structures: in the root-mean-square deviations of the $C_\alpha$ atomic positions, identities of core amino acids, backbone and side chain dihedral angles, and packing fraction of core residues. In contrast to prior studies, we identify the physical basis for these differences by modelling protein cores as jammed packings of amino-acid-shaped particles. We find that we can tune the jammed packing fraction by varying the degree of thermalization used to generate the packings. For an athermal protocol, we find that the average jammed packing fraction is identical to that observed in the cores of protein structures solved by x-ray crystallography. In contrast, highly thermalized packing-generation protocols yield jammed packing fractions that are even higher than those observed in NMR structures. These results indicate that thermalized systems can pack more densely than athermal systems, which suggests a physical basis for the structural differences between protein structures solved by NMR and x-ray crystallography.

It is generally acknowledged that protein structures determined by x-ray crystallography versus NMR exhibit small but significant differences. It is by no means resolved, however, whether these differences stem from differences in the experimental methods themselves, or if they reflect physical differences in proteins under the different conditions in which the measurements are made [1, 3, …]
Figure 1: (a) Probability distributions $P(\Delta(i,j))$ of the root-mean-square deviations (RMSD) in the positions of the $C_\alpha$ atoms (in Å) for core residues in duplicate x-ray crystal structures (solid black line), in the NMR model ensemble for each structure (solid red line), and in paired x-ray crystal and NMR structures (dot-dashed blue line). We also plot the distribution for $\Delta \propto \sqrt{B}/\pi$ from the B-factor for core $C_\alpha$ atoms in the duplicate x-ray crystal structures (dashed black line). The inset shows an example of one of the proteins in the paired x-ray crystal and NMR structure dataset, with the x-ray crystal structure on the left and the bundle of 20 NMR structures on the right (PDB codes 3K0M and 1OCA, respectively). The α-helices are colored purple, the β-sheets are yellow, and the loops are gray. (b) The fraction of core amino acids $F(\Delta\chi)$ with root-mean-square deviations of the side chain dihedral angles less than $\Delta\chi$ (in degrees) for the pairwise comparisons in (a). The inset is a schematic in two dimensions of the high-dimensional volume in configuration space that the $C_\alpha$ atoms in core residues in x-ray crystal structures and NMR ensembles sample. X-ray crystal structures sample a smaller region than NMR ensembles, but the distance between these regions of configuration space is larger than the fluctuations of both the x-ray crystal and NMR structures. The relative lengths of the arrows are drawn to scale, with $\langle \Delta \rangle \approx 0.1$, $0.5$, and 0. […]

Figure 2: (a) Fraction of side chain conformations of core residues with predictions from the hard-sphere plus stereochemical constraint model that deviate from the experimentally observed values by less than $\Delta\chi$ (in degrees) in the dataset of x-ray crystal (black line) and NMR (red line) structure pairs, and the Dunbrack 1.0 dataset of 221 high-resolution x-ray crystal structures [17, 18]. (b) Fraction of core hydrophobic side chains, grouped by residue type, that can be predicted to within 30° of the corresponding experimental structure using the hard-sphere plus stereochemical constraint model for x-ray (black bars) and NMR structures (red bars). (c) Distribution of the overlap potential energy $U_{\mathrm{RLJ}}/\epsilon$, calculated using Eq. (3), for core residues in the x-ray crystal (black line) and NMR structures (red line) in the paired dataset.

[…] $C_\alpha$ atoms within an NMR "bundle" is greater than the RMSD of core $C_\alpha$ atoms of the set of protein crystal structures that have been solved multiple times. Also, the difference between an x-ray crystal structure and each structure in the NMR bundle is greater than the spread within the NMR bundle. To gain deeper insight into these differences, we first analyzed side chain repacking of core residues in structures determined by both NMR and x-ray crystallography, using the hard-sphere plus stereochemical constraint model for both. We have shown in previous work that the ability to accurately predict the placement of side chains of core residues is correlated with the high packing fraction $\langle \phi \rangle \approx 0.55$ found in protein cores. We are able to predict the placement of side chains of core residues with above 90% accuracy (to within 30°) in structures determined by either x-ray crystallography or NMR, which is indicative of dense packing. When we explicitly calculate the packing fraction of core residues in protein structures determined by x-ray crystallography versus NMR, we find that the cores of NMR structures are more tightly packed than the cores of x-ray crystal structures.

To further explore the physical basis for these observations, we generated jammed packings of amino-acid-shaped particles computationally, and explored the extent to which we can tune their packing fraction using protocols with different degrees of thermalization. We find that, depending on the thermalization protocol we use, we can match the packing fraction to that which we observe in structures determined by x-ray crystallography and by NMR. Specifically, the packing fraction of amino-acid-shaped particles we observe in the athermal limit corresponds to the packing fraction we find in the cores of protein crystal structures, whereas the packing fraction we observe in the cores of structures determined by NMR is higher, but less than the packing fraction achieved from a high degree of thermalization.
Thus, the core packing fractions we observe for protein structures determined by x-ray crystallography and NMR are both physically reasonable, and we speculate that the higher packing fraction we observe for structures determined by NMR reflects the different conditions under which NMR structures are determined.

We first compare pairs of structures, determined by x-ray crystallography and NMR, by quantifying the root-mean-square deviation (RMSD) of the $C_\alpha$ positions of a given set of residues, defined by their sequence location, on two structures $i$ and $j$:

$$\Delta(i,j) = \sqrt{\frac{1}{N_S} \sum_{\mu=1}^{N_S} \left( \vec{c}_{\mu,j} - \vec{c}_{\mu,i} \right)^2}, \tag{1}$$

where $\vec{c}_{\mu,i}$ is the position of the $C_\alpha$ atom on residue $\mu$ in structure $i$, and $N_S$ is the number of residues being compared. For the NMR datasets, $i$ and $j$ represent each model within a bundle, and for the x-ray crystal duplicate dataset, $i$ and $j$ represent each of the duplicates. We define core residues as residues with small (< […]) relative solvent-accessible surface area (rSASA), as defined in Eq. (1) in the SI. In Fig. 1 (a), we compare the distributions $P(\Delta(i,j))$ of RMSD values of core residues in x-ray crystal structure duplicates and RMSD values of core residues in NMR bundles. We show that the fluctuations among x-ray crystal structure duplicates are consistent with B-factor fluctuations of the $C_\alpha$ positions of core residues, which are given by $\Delta \propto \sqrt{B}/\pi$. We also compare x-ray crystal and NMR structures for the same proteins by calculating the RMSD between $C_\alpha$ atoms of core residues.

We also calculate the side chain dihedral angle fluctuations $\Delta\chi$ for the same pairs of structures; we define $\Delta\chi(\mu|i,j)$ as the distance between the side chain conformations of residue $\mu$ within structures $i$ and $j$, i.e.

$$\Delta\chi(\mu|i,j) = \sqrt{\left( \vec{\chi}_{\mu,j} - \vec{\chi}_{\mu,i} \right)^2}, \tag{2}$$

where $\vec{\chi}_{\mu,i}$ is the set of dihedral angles $(\chi_1, \ldots, \chi_m)$ for residue $\mu$ on structure $i$. Note that in Fig. 1 (b), we measure $\Delta\chi$ between two experimental structures of the same protein, whereas in Fig. 2 (a) and (b) we measure $\Delta\chi$ between an experimental structure and a prediction using the hard-sphere plus stereochemical constraint model.

In Fig. 1, we show that the conformations of both the backbone and side chains of core residues fluctuate less in x-ray crystal structures compared to the conformations within an NMR bundle, but that the fluctuations within an NMR bundle are smaller than those between the x-ray crystal and NMR structure pairs [3, 8, 15]. The inset to Fig. 1 (b) illustrates the proportion of configuration space sampled for structures solved by both NMR and x-ray crystallography. Structures determined by x-ray crystallography sample states in a relatively small volume of configuration space compared to that sampled by structures in an NMR bundle. Moreover, these two ensembles are separated by a characteristic distance that is larger than the scale of fluctuations in either ensemble.

To put these structural differences in context, we also analyze fluctuations in two further datasets: pairs of x-ray crystal structures of a wild-type protein and the same protein with a single core mutation, and high-scoring submissions from a recent Critical Assessment of Protein Structure Prediction (CASP) competition [13]. We find that the scale of the fluctuations of single-site core mutants relative to wild-type structures is similar to that observed in x-ray crystal structure duplicates. In contrast, submissions to CASP12 exhibit much larger fluctuations. Because the CASP12 submissions are computational predictions, not experimentally determined structures, one would expect large structural fluctuations among the different CASP12 submissions. The scale of the structural fluctuations among the CASP12 submissions is also larger than those between structures of the same protein determined by x-ray crystallography or NMR.
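The two fluctuation measures in Eqs. (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' analysis code: the function names are hypothetical, the coordinates are assumed to be already superimposed and matched residue by residue, and wrapping dihedral differences onto [-180°, 180°) is an assumption that the text does not state.

```python
import math


def ca_rmsd(coords_i, coords_j):
    """Eq. (1): RMSD between matched C-alpha positions of two structures.

    coords_i, coords_j: equally long lists of (x, y, z) tuples in Angstrom,
    assumed to be already superimposed (no alignment is done here).
    """
    if len(coords_i) != len(coords_j) or not coords_i:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for a, b in zip(coords_i, coords_j):
        total += sum((q - p) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(coords_i))


def wrap_degrees(delta):
    """Map an angular difference onto [-180, 180) degrees (an assumption)."""
    return (delta + 180.0) % 360.0 - 180.0


def delta_chi(chi_i, chi_j):
    """Eq. (2): distance between two side chain dihedral-angle vectors.

    chi_i, chi_j: dihedral angles (chi_1, ..., chi_m) in degrees for the
    same residue in structures i and j.
    """
    if len(chi_i) != len(chi_j):
        raise ValueError("dihedral vectors must have the same length")
    return math.sqrt(sum(wrap_degrees(b - a) ** 2 for a, b in zip(chi_i, chi_j)))
```

For example, `ca_rmsd` applied to two structures that differ by a rigid 1 Å shift of every atom returns 1 Å, and `delta_chi([170.0], [-170.0])` returns 20° rather than 340° because of the periodic wrapping.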
In the SI, we report additional measures of structural fluctuations, such as fluctuations in the identities of residues that make up the core.

To investigate the possible origin of the differences between structures determined by x-ray crystallography and NMR, we first investigated if these differences are due to the physical forces governing side chain placement in the core. In previous work, we showed that the hard-sphere plus stereochemical constraint model can uniquely specify the side chain dihedral angles of hydrophobic […]

Figure 3: Distribution $P(\phi)$ of the packing fraction of core residues in the Dunbrack 1.0 […]

[…] $\mu$ has adopted a given side chain conformation $\vec{\chi}_\mu$ as $P(\vec{\chi}_\mu) \propto \exp[-\beta U_{\mathrm{RLJ}}(\vec{\chi}_\mu)]$, where

$$U_{\mathrm{RLJ}}(\vec{\chi}_\mu) = \sum_{\nu=1}^{N} \sum_{i,j} \frac{\epsilon}{72} \left[ 1 - \left( \frac{\sigma^{\mu\nu}_{ij}}{r^{\mu\nu}_{ij}} \right)^{6} \right]^{2} \Theta\left( \sigma^{\mu\nu}_{ij} - r^{\mu\nu}_{ij} \right) \tag{3}$$

is the purely repulsive Lennard-Jones potential energy of residue $\mu$, evaluated as a sum over all non-bonded atomic interactions. $r^{\mu\nu}_{ij}$ is the distance between atoms $i$ and $j$ on residues $\mu$ and $\nu$, $\sigma^{\mu\nu}_{ij} = \left( \sigma^{\mu}_{i} + \sigma^{\nu}_{j} \right)/2$, and $\sigma^{\mu}_{i}$ is the diameter of atom $i$ on residue $\mu$.

However, when we investigate the packing fraction $\phi$ of core residues for x-ray crystal and NMR structures, we find important differences. In Fig. 3, we plot the probability distribution $P(\phi)$ of the packing fraction for core residues in x-ray crystal and NMR structures. The average packing fraction of core residues in the protein structures in the datasets determined by x-ray crystallography is $\langle \phi \rangle = 0.55 \pm 0.01$, a value that is consistent with our previous results for the packing fraction of core residues in globular and transmembrane protein cores and the cores of protein-protein interfaces solved by x-ray crystallography [7]. For core residues of protein structures in the NMR database, the average packing fraction is higher, with $\langle \phi \rangle = 0.59 \pm 0.02$. We believe that this is the first time that such a difference in the packing fraction of core residues in high-quality protein structures determined by both x-ray crystallography and NMR has been reported.

Figure 4: The ensemble-averaged packing fraction $\langle \phi_J \rangle$ of jammed packings of amino-acid-shaped particles versus the dimensionless thermalization timescale $\tau$ for a system with $N = 16$ particles. Different colors represent simulations with different dimensionless temperatures $k_B T/\epsilon$, logarithmically spaced from $10^{-[\ldots]}$ (blue) to 1 (red). The dashed black line at $\langle \phi_J \rangle = 0.55$ is the average packing fraction of core residues in x-ray crystal structures, and the dashed red line at $\langle \phi_J \rangle = 0.$ […]

[…] $U_{\mathrm{RLJ}}$ for the structures determined by x-ray crystallography and NMR. However, we found that structures determined by either method have virtually identical overlap energies, as shown in Fig. 2 (c), and we therefore ruled out this potential cause. The difference in the packing fraction of core residues was at first surprising, because our previous studies showed that the cores of x-ray crystal structures pack as densely as jammed packings of purely repulsive amino-acid-shaped particles without backbone constraints, generated using a protocol of successive compressions followed by energy minimization [6, 16].

We therefore revisited the protocol with which we prepared jammed packings of amino-acid-shaped particles [6, 16]. In our previous work, packings were prepared using an "athermal" protocol, where kinetic energy was drained rapidly from the system during the packing preparation. For the athermal protocol, amino acids were initialized in a cubic simulation box at a small initial packing fraction $\phi$ and compressed by small increments $\Delta\phi$, with each compression followed by energy minimization. (See SI for additional details.) Because the amino-acid-shaped particles were not allowed to translate and rotate significantly between each compression step, the jammed packings at $\langle \phi \rangle \approx 0.55$ were obtained at the first metastable jammed state that the protocol encountered. Thus, the packing fractions that can be achieved by amino-acid-shaped particles are protocol-dependent. We therefore investigated more thermalized protocols to generate jammed packings of amino-acid-shaped particles.

We chose a family of quasi-annealing packing-generation protocols. We initialize the system in a dilute configuration, and compress the system in small increments $\Delta\phi$ between periods of molecular dynamics simulations in the canonical ensemble for a time period $t_{\mathrm{MD}}$ at thermal energy $k_B T$. (See SI for details.) We find that temperature only acts to renormalize the simulation time window $t_{\mathrm{MD}}$, i.e. a longer simulation run at a lower temperature will yield the same results as a shorter simulation run at a higher temperature. Thus, there is another time scale associated with the quasi-annealing protocol, $t_{\mathrm{QA}} = c(T)\, t^*$, where $c(T)$ is a dimensionless quantity that depends on temperature, $t^* = \sqrt{m_R \sigma_R^2/\epsilon}$, and $m_R$ and $\sigma_R$ are the mass and diameter of the smallest residue. We find that plotting the packing fraction versus $\tau$, where

$$\tau = \frac{t_{\mathrm{MD}}}{t_{\mathrm{QA}}} = n \left( \frac{k_B T}{\epsilon} \right)^{\alpha}, \tag{4}$$

collapses the data for different temperatures and time periods onto a single curve, as shown in Fig. 4. The exponent $\alpha = 0.[\ldots] \pm 0.01$ and $n$ is the number of time steps between compression increments.

Two limits of packing fractions emerge over the broad range of quasi-annealing protocols we tested: an athermal limit, which corresponds to the packing fractions one finds in the cores of x-ray crystal structures [6, 7], and the completely thermalized limit, which can reach average packing fractions $\langle \phi \rangle \approx 0.62$. The packing fractions observed in the cores of protein structures solved by NMR fall between these two extremes, with $\langle \phi \rangle = 0.59$. The states at exceedingly high packing fractions exist only in the limit of extremely long annealing times. The results of simulations using different protocols are consistent with the differences observed in the cores of protein structures solved by x-ray crystallography and NMR. The fact that thermalized packing protocols can yield NMR-like packing fractions, and that athermal protocols generate x-ray crystal-like packing fractions, suggests that native-state fluctuations are distinct for these two methods.

A previous study that also compared protein structures determined by x-ray crystallography and NMR suggested that the crystal environment restricts dynamical fluctuations, whereas bundles of NMR structures in solution contain the full dynamics one would expect from elastic network models for proteins [19]. The work we present here provides significant further evidence to support this conclusion, but whether the differences are due to crystalline contacts [9, 15, 19] or the different temperatures at which the protein structures are characterized [4, 10] remains to be determined.
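The temperature and time rescaling of Eq. (4) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names are hypothetical, and the exponent passed as `alpha` in the example (0.5) is a placeholder chosen for easy arithmetic, not the fitted value reported in the paper.

```python
def thermalization_timescale(n_steps, kT_over_eps, alpha):
    """Eq. (4): tau = n * (k_B T / epsilon) ** alpha.

    n_steps:      number of MD time steps between compression increments
    kT_over_eps:  dimensionless temperature k_B T / epsilon (must be > 0)
    alpha:        scaling exponent (supplied by the caller; placeholder here)
    """
    if n_steps < 0 or kT_over_eps <= 0:
        raise ValueError("need n_steps >= 0 and kT_over_eps > 0")
    return n_steps * kT_over_eps ** alpha


def steps_for_equal_tau(n_ref, kT_ref, kT_new, alpha):
    """MD steps needed at kT_new to match the tau of n_ref steps at kT_ref."""
    if kT_ref <= 0 or kT_new <= 0:
        raise ValueError("temperatures must be positive")
    return n_ref * (kT_ref / kT_new) ** alpha


if __name__ == "__main__":
    alpha_placeholder = 0.5  # hypothetical value, for illustration only
    n_cold = steps_for_equal_tau(100, 1.0, 0.01, alpha_placeholder)
    print(n_cold, thermalization_timescale(n_cold, 0.01, alpha_placeholder))
```

With a positive exponent, a run at a lower temperature needs proportionally more time steps between compressions to reach the same τ, which is the sense in which temperature only renormalizes the simulation time window.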
Author Contributions
Z.M. and J.D.T. contributed equally to this article. Z.M., J.D.T., Z.A.L., L.R., and C.S.O. designed the research. Z.M. compiled the dataset of protein structures, J.D.T. carried out simulations, and Z.M., J.D.T., Z.A.L., L.R., and C.S.O. interpreted the results. Z.M., J.D.T., C.S.O., and L.R. wrote the article.
Acknowledgments
The authors acknowledge support from NIH training Grant No. T32EB019941 (J.D.T.), the Raymond and Beverly Sackler Institute for Biological, Physical, and Engineering Sciences (Z.M.), and NSF Grant No. PHY-1522467 (C.S.O.). This work also benefited from the facilities and staff of the Yale University Faculty of Arts and Sciences High Performance Computing Center. We thank Pat Loria and Peter Moore for helpful discussions.
References
[1] R. B. Best, K. Lindorff-Larsen, M. A. DePristo, and M. Vendruscolo. Relation between native ensembles and experimental structures of proteins. Proceedings of the National Academy of Sciences, 103(29):10901–10906, 2006. ISSN 0027-8424. doi: 10.1073/pnas.0511156103.
[2] D. Caballero, A. Virrueta, C. O’Hern, and L. Regan. Steric interactions determine side-chain conformations in protein cores. Protein Engineering, Design and Selection, 29(9):367–376, 2016.
[3] J. K. Everett, R. Tejero, S. B. K. Murthy, T. B. Acton, J. M. Aramini, M. C. Baran, J. Benach, J. R. Cort, A. Eletsky, F. Forouhar, R. Guan, A. P. Kuzin, H.-W. Lee, G. Liu, R. Mani, B. Mao, J. L. Mills, A. F. Montelione, K. Pederson, R. Powers, T. Ramelot, P. Rossi, J. Seetharaman, D. Snyder, G. V. T. Swapna, S. M. Vorobiev, Y. Wu, R. Xiao, Y. Yang, C. H. Arrowsmith, J. F. Hunt, M. A. Kennedy, J. H. Prestegard, T. Szyperski, L. Tong, and G. T. Montelione. A community resource of experimental data for NMR / x-ray crystal structure pairs. Protein Science, 25(1):30–45, 2016. doi: 10.1002/pro.2774.
[4] J. S. Fraser, H. van den Bedem, A. J. Samelson, P. T. Lang, J. M. Holton, N. Echols, and T. Alber. Accessing protein conformational ensembles using room-temperature x-ray crystallography. Proceedings of the National Academy of Sciences, 108(39):16247–16252, 2011. ISSN 0027-8424. doi: 10.1073/pnas.1111325108.
[5] J. Gaines, A. Virrueta, D. Buch, S. Fleishman, C. O’Hern, and L. Regan. Collective repacking reveals that the structures of protein cores are uniquely specified by steric repulsive interactions. Protein Engineering, Design and Selection, 30(5):387–394, 2017.
[6] J. C. Gaines, W. W. Smith, L. Regan, and C. S. O’Hern. Random close packing in protein cores. Phys. Rev. E, 93:032415, 2016.
[7] J. C. Gaines, S. Acebes, A. Virrueta, M. Butler, L. Regan, and C. S. O’Hern. Comparing side chain packing in soluble proteins, protein-protein interfaces and transmembrane proteins. Proteins: Structure, Function, and Bioinformatics, 86(5):581–591, 2018. doi: 10.1002/prot.25479.
[8] S. O. Garbuzynskiy, B. S. Melnik, M. Y. Lobanov, A. V. Finkelstein, and O. V. Galzitskaya. Comparison of x-ray and NMR structures: Is there a systematic difference in residue contacts between x-ray- and NMR-resolved protein structures? Proteins: Structure, Function, and Bioinformatics, 60(1):139–147, 2005. doi: 10.1002/prot.20491.
[9] B. Halle. Biomolecular cryocrystallography: Structural changes during flash-cooling. Proceedings of the National Academy of Sciences, 101(14):4793–4798, 2004. ISSN 0027-8424. doi: 10.1073/pnas.0308315101.
[10] X. Hu, L. Hong, M. Dean Smith, T. Neusius, X. Cheng, and J. C. Smith. The dynamics of single protein molecules is non-equilibrium and self-similar over thirteen decades in time. Nature Physics, 12:171–174, 2015.
[11] J. Koehler Leman, A. R. D’Avino, Y. Bhatnagar, and J. J. Gray. Comparison of NMR and crystal structures of membrane proteins and computational refinement to improve model quality. Proteins: Structure, Function, and Bioinformatics, 86(1):57–74, 2018. doi: 10.1002/prot.25402.
[12] B. Mao, R. Tejero, D. Baker, and G. T. Montelione. Protein NMR structures refined with Rosetta have higher accuracy relative to corresponding x-ray crystal structures. Journal of the American Chemical Society, 136(5):1893–1906, 2014. doi: 10.1021/ja409845w.
[13] J. Moult, K. Fidelis, A. Kryshtafovych, T. Schwede, and A. Tramontano. Critical assessment of methods of protein structure prediction (CASP): Round XII. Proteins: Structure, Function, and Bioinformatics, 86(S1):7–15, 2018. ISSN 1097-0134. doi: 10.1002/prot.25415.
[14] M. Schneider, X. Fu, and A. E. Keating. X-ray vs. NMR structures as templates for computational protein design. Proteins: Structure, Function, and Bioinformatics, 77(1):97–110, 2009. doi: 10.1002/prot.22421.
[15] K. Sikic, S. Tomic, and O. Carugo. Systematic comparison of crystal and NMR protein structures deposited in the Protein Data Bank. The Open Biochemistry Journal, 4:83–95, 2010.
[16] J. D. Treado, Z. Mei, L. Regan, and C. S. O’Hern. Void distributions reveal structural link between jammed packings and protein cores. Phys. Rev. E, 99:022416, 2019.
[17] G. Wang and R. L. Dunbrack, Jr. PISCES: A protein sequence culling server. Bioinformatics, 19(12):1589–1591, 2003.
[18] G. Wang and R. L. Dunbrack, Jr. PISCES: Recent improvements to a PDB sequence culling server. Nucleic Acids Research, 33:W94–8, 2005.
[19] L.-W. Yang, E. Eyal, C. Chennubhotla, J. Jee, A. M. Gronenborn, and I. Bahar. Insights into equilibrium dynamics of proteins from comparison of NMR and x-ray data with computational predictions. Structure, 15(6):741–749, 2007. ISSN 0969-2126. doi: 10.1016/j.str.2007.04.014.
[20] A. Q. Zhou, C. S. O’Hern, and L. Regan. Predicting the side-chain dihedral angle distributions of nonpolar, aromatic, and polar amino acids using hard sphere models.