Determining the atomic charge of calcium ion requires the information of its coordination geometry in an EF-hand motif
Pengzhi Zhang, Jaebeom Han, Piotr Cieplak, Margaret. S. Cheung
1 Determining the atomic charge of calcium ion in a calcium-binding protein requires the information of its dynamic coordination geometry in aqueous solution
Pengzhi Zhang , Jaebeom Han , Piotr Cieplak , Margaret. S. Cheung
1. Department of Physics, University of Houston, TX, USA 2. Center for Theoretical Biological Physics, Rice University, TX, USA 3. Sanford Burnham Prebys Medical Discovery Institute, CA, USA Corresponding Author: [email protected] Keywords: Ca coordination, ab initio quantum calculation, molecular dynamics simulations, polarization, EF-hand motif, calmodulin Abstract
It is challenging to parameterize the force field for calcium ions (Ca ) in calcium-binding proteins because of its unique coordination chemistry that involves surrounding atoms required for its stability. In this work, we extracted a wide variation of Ca binding loop conformations from Ca binding protein calmodulin (CaM) that adopt the most populated ternary structure types from the MD simulations, followed by ab initio quantum mechanical (QM) calculations on the 12 amino acids from entire loop that coordinate the Ca in aqueous solution. Ca charges are derived by fitting to the ESP in the context of classical or polarizable force field (PFF). We discover that the atom radius of Ca in conventional force fields is too large for the QM calculation to capture the variation in coordination geometry of Ca in its ionic form and lead to unphysical charges. Specially, we find that the fitted atomic charges of Ca in the context of PFF depend on the coordinating geometry of electronegative atoms from the amino acids in the loop. Although the nearby water molecules do not influence the atomic charge of Ca , they are crucial for compensating the coordination of Ca due to the conformational flexibility in the EF-hand loop. Our method advances the development of force fields for metal ions and the protein binding sites in dynamical environments. I. Introduction
The calcium ion (Ca ) is a key second messenger controlling many biological processes such as enzyme activation, muscle contraction, and neural signal transduction. A broad spectrum of Ca signals are encoded by protein calmodulin (CaM) through specific binding with various targets regulating CaM-dependent Ca signaling pathways in neurons . CaM can bind up to four Ca through its four helix-coil-helix (called EF-hand) structures . In the coil connecting the two helices, termed EF-hand loop, there are usually six residues coordinating the Ca cooperatively and forming a pentagonal bipyramidal geometry (Figure 1(A, B)). Binding of a Ca in the EF-hand opens up the angle between the two helices, as seen in the right angle in Figure 1A. Although the four Ca binding loops present similar helix-coil-helix structures, they show variant capacity of retaining the Ca . The two loops in the N-lobe of the CaM (nCaM) bind Ca faster than those from the C-lobe of the CaM (cCaM), and the Ca dissociates faster from nCaM . The reasons are elusive and lie in the subtle difference in the amino-acid sequences of the four EF-hand loops and the mobility of the water molecules packed in the loops (Table I). In addition, the conformations of the loops vary with CaM that binds various protein targets (Figure 1C). CaM’s affinity for Ca is further fine-tuned by targets
6, 7 . Molecular dynamics (MD) simulations are excellent tools to investigate the subtle differences in the calcium binding affinities in its four EF-hands from a CaM. However, there is a lack of adequate force fields for Ca in Ca binding proteins due to several major limitations. First, developing more accurate MD force fields (MDFFs) for Ca and its binding component usually requires a series of computational demanding quantum mechanical calculations to provide electronic structures, whose computational cost increases quartically with the system size . Second, when mapping the electronic structures to a point charge in MDFFs in exchange for speed in MD simulations, the interaction between a divalent ion and its receptor protein is simplified and devoid of the strong polarization as well as charge-transfer effects
7, 9-12 . Third, Ca binding component in the protein, due to structural flexibility in both backbone and side-chains, usually generates a myriad of ensemble conformations that further burdens the computational cost of collecting more configurations in the attempt to attain statistical significance by minimizing sampling errors . Figure 1. Coordination of Ca in the EF-hand loops. (A) The coordination geometry in EF-hand 3 from crystal structure of Ca /CaM (PDB ID: 1CLL). An EF-hand is made up of Helix E (gray), EF-hand loop (orange), and Helix F (gray). The Ca ion is represented by a yellow bead. Sidechains of the Ca coordinating residues in the EF-hand loop are represented by sticks (coloring: red à oxygen; cyan à carbon; blue à nitrogen). The bridging water molecule is represented by the oxygen atom as blue bead. Coordination positions of the Ca ion are denoted (±X, ±Y, ±Z). (B) Schematic illustration of the positions of the residues in an EF-hand loop. The filled circles are the ones that coordinate Ca (Table 1). (C) Illustration of structures of Ca /CaM, Ca /CaM/ CaMKII peptide, Ca /CaM/Ng peptide. Ca is in yellow, CaM is in gray, CaMKII peptide is in blue, and Ng peptide is in purple. Abbreviations: CaMKII (Ca /CaM-dependent protein Kinase II); Ng (Neurogranin). Table I. Amino acid sequences of the four Ca binding loops (EF-1, EF-2, EF-3, and EF-4) in calmodulin. Their coordination positions for Ca ions are also provided. ion through backbone oxygen. * denotes the residue that indirectly coordinates Ca through a water molecule. Residue index 1 2 3 4 5 6 7 8 9 10 11 12 Coordination position
X Y Z -Y -X * -Z nCaM EF-1
D K D G D G T I T T K E
EF-2
D A D G N G T I D F P E cCaM
EF-3
D K D G N G Y I S A A E
EF-4
D I D G D G Q V N Y E E In a pentagonal bipyramidal coordination geometry of Ca by an EF hand loop (Figure 1), the Ca is not only coordinated by carboxylate oxygen atoms from the sidechains, but also by one carbonyl oxygen atom from the backbone as well as the oxygen atoms from water molecules (one typical example is in the crystal structure of Ca /CaM as seen in Table I). Because the coordination chemistry from surrounding atoms is vital in stabilizing the ion , we determined the charges of Ca using ab initio quantum calculations by exploring the conformational variation of the amino acids on the EF-hand loops (including coordinating water molecules) in several hundreds of distinct conformations. These loop conformations were taken from CaM in Ca retaining or releasing environments from the existing solved structures or all-atomistic models from our prior work . We investigated the charge distribution over all amino acids in the EF hand loops which are dependent on the backbone flexibility and the side-chain positions chelating the central Ca . We explicitly included the nearby water molecules in generation of the electrostatic potential (ESP) surface. The high-quality electronic structures from the vast amount of the loop configurations allow us to examine the deficiency and inconsistency in the force fields for parameterizing Ca in proteins when its coordination chemistry is not fully considered. II. Materials and Methodology
The atomic partial charge is built on the existing non-polarizable force field (NFF)
17, 18 or polarizable force field (PFF) methodology taking polarization interactions into consideration. The steps of determining the atomic charges are shown in Figure 2. The parameters of the NFF or PFF have been derived using a broad collection of Ca binding loops from all-atomistic MD simulations in explicit solvent for an isolated CaM (Ca /CaM) or its bound complexes from available database. This approach warrants a broad coverage of the configuration space of the Ca and dynamics of its binding EF-hand loops involving the steps as below in Figure 2. Figure 2 Flowchart of determination of atomic charges from MD generated geometries using RESP/i_RESP charge fitting method.
II.1 Sample selections for the initial conditions
Samples of Ca /CaM from all-atom models in the following three conditions were examined. (1) Neat Ca /CaM, crystal structure (PDB ID: 1CLL) was used as the initial structure; (2) Ca -retaining environment: Ca /CaM/CaM dependent protein Kinase II (CaMKII) peptide, the crystal structure (PDB ID: 1CDM, respectively) was used as the initial structure; (3) Ca -releasing environment: Ca /CaM/Neurogranin (Ng) peptide, three representative complex structures were used as the initial structures. We reconstructed those structures into all-atomistic models from coarse-grained models with constraints inferred from the changes in the chemical shifts in the nuclear magnetic resonance (NMR) experiments
7, 25 . II.2 Generation of the ensemble structures by molecular dynamics simulations
In order to generate a broad ensemble of Ca -binding EF-hand loop geometries, all-atom MD simulations on the three systems were performed with GROMACS version 2018 in a periodic box of ~10x10x10 nm with explicit solvent. The CaM or CaM in complex with a CaM binding target (CaMBT; CaMBT can be CaMKII or Ng peptide) was placed at least 1 nm away from the edges of the cubic box. The system was solvated by explicit water molecules using the rigid three-site TIP3P model . The bond lengths in proteins involving the H atoms were constrained using the LINCS algorithm . One of the amino acids in the loop that chelates the Ca ion usually interacts through a water molecule (Table I). The system was neutralized by K + and Cl - ions, maintaining a physiological ionic strength of 150 mM. AMBER-99SB-ILDN force field was used for generating the configuration except for the partial charges on the protein and Ca ions, which were observed by semi-empirical methods using Molecular Orbital PACkage (MOPAC) in our previous work . Electrostatic interactions between periodic images were treated using the particle mesh Ewald (PME) approach , with a grid size of 0.16 nm, fourth-order cubic interpolation and a tolerance of 10 -5 . Neighbor lists were updated dynamically. A cutoff of 10 Å was used for van der Waals interactions, for the real space Coulombic interactions, and for updating neighbor lists. For each simulated system the energy minimization was carried out with the steepest descent method to remove unfavorable clashes between atoms. Next, the system was gradually heated to 300 K in canonical ensemble (NVT) in 1 ns, followed by 1 ns of equilibration of the solvents and ions (proteins were constrained in positions) in isothermal-isobaric ensemble (NPT) to fix the density. The constraints on the proteins were afterwards released and the system was further equilibrated for 5 ns. Finally, a total of 100 ns NPT simulation was carried out for the production run. All NPT simulations maintained a constant pressure of 1 bar using the Parrinello-Rahman barostat . The equation of motion was integrated using a time step of 2 fs. Snapshots were saved for analysis every 1 ps. In total, we generated 500,000 snapshots of Ca /CaM from MD simulations. With four EF-hands in a CaM, we generated 2 million loop structures and further applied clustering analyses to extract the most dominant configurations for QM calculations. II.3 Importance Sampling of Ca /binding EF-hand geometries: application of neural-net clustering to the molecular dynamics snapshots We applied a nonhierarchical clustering algorithm to 2M snapshots of Ca binding loop geometries. We used normalized radial and angular distribution functions as the molecular feature
33, 34 in the neural-net clustering analysis. We focus on the chemical environment surrounding the Ca by using the distribution functions to remove the translational and rotational degrees of freedom. Equations and description of the distribution functions are in the Supplemental Information. A common cut-off of 0.6 was used for EF-hands 1-4 to generate 155, 242, 215, and 164 clusters respectively for further ab initio calculations. In addition to the selected EF-hand loops the corresponding Ca ions, and the water molecules in the first solvation shell were extracted for QM ab initio calculations. II.4 ab initio quantum mechanical calculations
To determine the magnitude of charge transfer between Ca ion and the amino acids in the calcium binding loop, we performed a large-scale ab initio quantum mechanical calculation in terms of both the number of atoms and the numbers of the snapshots. The ab initio calculations have been conducted for the loop fragments including the Ca ion and water molecules in the first solvation shell of the Ca . The loop structures were capped with acetyl and methyl groups at C- and N-termini, respectively, before performing the ab initio calculations. All ab initio calculations were performed with Gaussian16 program. The ab initio -derived molecular electrostatic potential (ESP) from the electronic structures was further used for fitting excess atomic charges in the context of either non-polarizable force field (NFF) and polarizable force field (PFF). II.5 The development of atomic charges in non-polarization force field and polarizable force field
We employed the restrained electrostatic potential (RESP) and i_RESP charge fitting parameterization protocol to fit the partial atomic charges for the NFF and PFF, respectively. For PFF, the atomic charges conforming to Thole-linear polarization model were fitted to the molecular electrostatic potentials with i_RESP program . The 1–2 and 1–3 short-range interactions were excluded from the fitting, while the 1–4 long-range interactions were included. In the fitting involving water molecules, we used atomic charges and polarizabilities from POL3 water model . For Ca ion we used experimental value of ionic polarizability 3.26 a.u. (0.483Å ) . III. Results III.1 The B3LYP/SVP basis set balances accuracy and computational cost in ab initio quantum mechanical calculation
As the first attempt to run QM calculations at the MP2/aug-cc-pVTZ theory level, which was used for deriving original AMBER Thole-linear polarization force field , we set up a small systems involving only three amino acid sidechains (2 Asp and 1 Glu in EF-3) and Ca ion. However, this basis set is not available for Ca ion. We next mixed the aug-cc-pVTZ basis set for the sidechain atoms with cc-pVTZ basis set that is available for Ca ion, but it often led to unstable self-consistent field (SCF) process for many configurations and required prohibitively long computation time. To stabilize the configuration by satisfying the coordination chemistry of Ca ion, it is necessary to include the entire loop residues and water molecules from the first solvation shell. However, once we increased the number of residues up to the entire calcium binding loop in the QM calculations, the computational cost immediately became prohibitive. To make our large-scale QM calculations feasible, we applied B3LYP type exchange and correlation functional and SVP basis set . We validated our choice of B3LYP/SVP instead of MP2/aug-cc-pVTZ theory level for charge derivation by performing a series of QM calculations at both the B3LYP/SVP and MP2/aug-cc-pVTZ theory levels for tetrapeptides: ACE-Ala-X-Ala-NME, where X is one of the eight amino acids observed in CaM calcium binding loops as summarized in Table I. In calculations we considered five standard conformations for each tetrapeptide, namely: parallel β -sheet, antiparallel β -sheet, right-handed α -helix, left-handed α -helix and PPII (left-handed polyproline II helical structure). The final atomic charges have been fitted with RESP and i_RESP program (Figure 2) for each tetrapeptide treating either each conformation separately or jointly (joint fit across five conformations). As summarized in Table SI, the mean absolute difference (MAD) between the i_RESP atomic charges derived at the B3LYP/SVP and MP2/aug-cc-pVTZ theory levels are small and at average smaller than 0.11e. The root mean square difference (RMSD) between the i_RESP charges at these two QM theory levels is at the similar magnitude and does not exceed 0.13e. Interestingly, both MAD and RMSD parameters are smaller (by 15-40%) for charges obtained from joint fit of 5 conformations, as compared to separate fit (Table SI). We find that no matter which fitting method (RESP or i_RESP) is used, the atomic charges obtained from the ESPs with the two theory levels are in strong agreement and are highly correlated (Figure 3). Thus, the B3LYP/SVP theory level for derivation of charges is of reasonable accuracy when compared to the MP2/aug-cc-pVTZ for the systems we study. Figure 3
Correlation between the fitted charges using electrostatic potentials (ESPs) from B3LYP/SVP and MP2/aug-cc-pVTZ theory levels. (A) Atomic charges are fitted using RESP. (B) Atomic charges are fitted using i_RESP. The ESPs were calculated using QM for the tetrapeptides ACE-Ala-X-Ala-NME, where X is one of the eight different amino acids observed in CaM Ca binding loops. III.2 Setting up appropriate radius of the Ca ion is crucial to fit the accurate atomic charge The results for the tetrapeptides showed that not only B3LYP/SVP is comparable to MP2/aug-cc-pVTZ but also it enables us to expand the number of atoms in the QM calculations and will enable us to determine the many-body effect at a minor cost of precision. Furthermore, it permits the computation of the electronic structures from a large ensemble of loop conformations efficiently, which improves the accuracy of the overall distribution of the atomic charges. Altogether, the calculations are done for 776 different loop conformations and involved 173~194 atoms and 1693~1861 basis set functions. In the calculation of the molecular electrostatic potential (ESPs) from the electronic structures of the Ca binding loops, we need to manually set up the van der Waals radius for Ca ion ( s vdW ) for building the surface grids for deriving molecular ESP. Initially, we set s vdW = 1.7 Å, same as the van der Waals radius used in the AMBER force fields adapted from Aqvist . By fitting to the obtained ESPs using RESP or i_RESP method, we acquired the atomic charges of the Ca ion and for the atoms in the protein. We find that unphysical values for the Ca i_RESP charges are generated, e.g. Q > 4e or Q < 0 (Figure 4A). More unphysical values are found for the protein atomic charges. Specially, for the i_RESP charges, there is a long tail of unphysical charges > 3 e in the Gaussian-like distribution of the Ca atomic charges (Figure 4B). Figure 4 Comparison between the i_RESP charges of the Ca ion from electrostatic potential using vdW radii of Ca = 1.0 Å and 1.7 Å. (A) Direct comparison between the two sets of i _ RESP charges. Red arrows point to excessively unphysical charges generated using σ vdW = 1.7 Å. (B) distribution of the i _ RESP charges. The distribution was obtained using 776 representative EF-hand loop structures . We noted that such irregularities are due to the parameter of radius for Ca . The default radius of 1.7 Å for Ca in AMBER force field is the van der Waals radius for the atomic Lennard-Jones potential of the calcium while the ionic radius of Ca should be smaller. The ionic radius of Ca depends on the coordination number (CN), for CN = 6, 7, 8, σ vdW was determined to be 1.00, 1.06, 1.12 Å, respectively . Therefore, we explored the dependency of the Ca atomic charges on the ionic radii over a range of values. We compared the Ca atomic charges derived from ESPs using σ vdW = 1.00, 1.06, 1.12 Å, and 1.70 Å using the specific structure for which the Ca i_RESP charge was +4.4e. We show in Figure S3 that, unlike with σ vdW = 1.70 Å, the i_RESP charges of Ca with σ vdW = 1.00 – 1.12 Å are ~1.95 e. On the other hand, the RESP charges of Ca with σ vdW = 1.00 – 1.12 Å are ~1.7 e, which are a bit larger than that derived with σ vdW = 1.70 Å (1.6 e). Moreover, the calcium ion charges are insensitive to σ vdW values from 1.00 to 1.12 Å. Therefore, we set the ionic radius σ vdW = 1.0 Å in the calculations for all conformers of the EF-hand loops, where the CN may vary. For the i_RESP atomic charges of Ca for all conformers, the Gaussian-like distribution has become narrower with σ vdW = 1.0 Å than with σ vdW = 1.7 Å. Most importantly, the unphysical charges greater than +3e or less than 0 have never been observed (Figure 4B). For the RESP charges, not only the width of the distribution become narrower for σ vdW = 1.0 Å, but also the mean shifts from 1.5 e to ~1.8 e (Figure S2). We showed that the use of ionic radius ( σ vdW = 1.0 Å) is more appropriate than covalent radius for addressing Ca molecular ESPs in calcium-binding proteins with explicit water molecules. By using σ vdW = 1.0 Å, we found that comparing with the Mulliken charges or the RESP charges, the i_RESP charges present the largest mean and standard deviation values. This is because the i_RESP charges are more susceptible to the variation of the neighboring atoms by including polarization effect. Details can be found in Figure S1 and Text S1 in SI. III.3 Nearby water molecules have no substantial effect on tuning the atomic charge of Ca We find that the number of water molecules that chelate the Ca , N water , varies in the EF hand loops. In the crystal structure of the EF-hand loops in CaM (PDB ID: 1CLL), there can be one crystal water molecule that chelates the Ca . In the MD simulations, the first solvation shell extends up to 3.3 Å away from the Ca ion (see the radial distribution function g Ca-O (r) for the Ca and the oxygen atoms in the water molecules in Text S2 and Figure S4 in SI). The probability distribution of N water (Figure 5) shows that most EF-hands retain two water molecules. Specifically, for EF-1, in more than half of the snapshots, there are two water molecules and N water ranges from four to six for others; for EF-2, N water varies from one to seven with similar probabilities; for EF-3 & EF-4, the probability of observing N water decreases almost monotonically from two to six. N water is further tuned by target-binding. In the CaM/Ng complex, N water varies from two to seven (Figure S5), as the Ca ion is not favorably retained because the shape of the loops which is more flexible and less spherical (Figure 8). This effect is mainly due to the disruptive interaction between Ng peptide and the Ca binding loops in CaM. Figure 5. Fraction of the number of water molecules in individual Ca binding EF-hands. The distributions were calculated by considering all the trajectories from all-atomistic molecular dynamics simulations of Ca /CaM, Ca /CaM/CaMKII, and Ca /CaM/Ng in explicit solvent. The water molecules are counted within the first solvation shell of Ca ion. Next, we investigated the effect of water molecules on the atomic charge of Ca ion and the protein. To decide how many water molecules are needed to derive reliable atomic charges of Ca ion, we systematically include number of water molecules in the calculation ranging from 0 to 4. As we learned that i_RESP charges are more susceptible to the environment (Text 1 in Supporting Information), we focus on i_RESP charges here. In the fitting, we keep the original parameters of the POL3 water model, including atomic partial charges, atomic polarizabilities of O and H, and screening factor. We selected 163 representative EF-hand loop structures from the simulation of Ca /CaM and Ca /CaM/CaMKII. In 152 structures, number of water molecules in the first solvation shell varies from 1 to 4; in the remaining 11 structures, water is absent. The average generated i_RESP charges of Ca as a function of the number of water molecules around the ion are shown in Figure 6. In general, the Ca atomic charge gradually decreases slightly with increasing number of water molecules; however, the change in the Ca charge is almost negligible (<1%). This indicates that although the number of water molecules surrounding the Ca in the calcium binding loop varies, water is uninfluential in determining the charge of the Ca . Therefore, we suggest including one water molecule for keeping the correct coordination chemistry of the pentagonal bipyramidal geometry for the quantum chemical calculations to determine the Ca charge. Figure 6. Boxplot of the atomic charges of Ca with different number of water molecules included in calculation. The mean value of the Ca charges is in blue solid line. The boxplot is in black and the outliers are in red. III.4 Charges are distributed unevenly in each EF-hand loop
No. of Water Molecules C a + C ha r ge ( e ) Average Since water does not have a substantial role in determining the atomic charge of the Ca , we further investigated the charge distribution on the amino acids from the four EF-hands loops. In Figure 7, we plotted the averaged i_RESP charges on each residue in the EF-hand loop and the atomic charges on the Ca . The charge of the Glu residue at position 12 is less than -1e by ~20% commonly in all four EF-hand loops, this is because in solution the sidechain of the residue at the 12 th position is usually in the bidentate coordination mode and it is expected that more charge transfer happens between it and the Ca ion. For other coordinating residues, i.e. residue at position 1,3,5,7 and 9, on average, there could be an increase in the magnitude of the negative charges or barely any charge redistribution (close to -1e or 0). Surprisingly, for the residues that are not actively involved in the Ca coordination, the average amino-acid charges also deviate from the default/nominal values, especially for the Lys at the 2nd position in EF-1, where it seems the charge transfer occurs between the neighboring Ca coordinating residues, such as the three Asp amino acids at 1st, 3rd, and 5th position. Furthermore, large standard deviations are observed for all of the amino acid charges on the protein (Figure S6), which suggests that these charges are systematically dependent on the loop configurations; hence methodical charge assignment on the protein is required in MD force field development for Ca and amino acids in protein ion binding site. Figure 7 Average net charges per residue in the EF-hand loops.
The i_RESP fitted charges are summed for each residue in each loop. Mean and the standard deviation of the residual charges are shown for the four EF-hand loops (A-D). Residues chelating the Ca in the crystal form of EF-hand loop in Ca /CaM are colored in blue, other residues are colored in gray, and the Ca is colored in orange. III.5 The conformation of the Ca binding loops dictates the number of water molecules coordinating Ca We show that the conformation of the Ca binding loops that vary with CaM-binding targets (CaMBT) dictates the number of water molecules coordinating Ca . In order to explore the conformational variation of the EF-hand loops in our MD simulations, we plotted (Figure 8) the potential of mean force (PMF) as a function of asphericity of the EF-hand ( Δ ; see definition in Supplemental Information) and the distance between the Ca ion to the center of mass (COM) of the corresponding loop (d COM ). There are substantial differences in the PMFs between the Ca - retaining (Ca /CaM or Ca /CaM/CaMKII) and Ca -releasing environments (Ca /CaM/Ng). In the former, the position of the Ca is restricted and close to the center of loop (d COM = 2~3 Å) as seen in Figure 8(A-H) and the EF-hand loops are generally spherical with Δ = 0.2~0.3 to facilitate a holo-spherical coordination of Ca by the EF-hand loop. In the latter, there are several basins along a wide range of both Δ and d COM as shown in Figure 8(I-L). d
COM varies from 2 Å to as large as 10 Å, which means that the Ca in the CaM/Ng complex can be bound, loosely bound, and in unbound states. Δ varies from 0 to 0.6, which represents spherical and largely linear shape of the EF-hand loop. The spherical shape of the EF-hand loops mostly corresponds to close proximity of the Ca to them, this leads to the bound state of the Ca in a holo-spherical geometry. More linear shape of the EF-hand loops leads to hemi-spherical or planar geometries of coordination of the Ca by the EF-hand loops, where nearby water molecules takes the place of the protein Oxygen atoms and N water increases. To gain a deeper understanding of the role of the amino acids in the EF-hand loops in determining the Ca stability in the EF-hand loops, we investigated the intermolecular interaction between the loop and the CaMBT. The following describes tiers of residues that were found to be involved in these interactions. (i) The residues at position 1 and 12 coordinate Ca most consistently while other residues are more often substituted by water molecules when Ca is loosely bound (Figure 8). In the meantime, those two residues are rarely involved in the interaction with the CaMBT (Figure 9). (ii) In the crystal structure of CaM, to coordinate Ca , residue at position 7 extends the backbone oxygen towards the Ca and consecutive residue at position 9 far away from the Ca requires a water molecule to chelate the Ca . Those “frustrated” residues are dedicated to forming EF-hand β -scaffold with their counterparts in the neighboring EF-hand loop to stabilize the shape of the two EF-hands in the same . (iii) Residue 8 is usually found to be hydrophobic and not actively involved in Ca coordination. Its backbone is part of the EF-hand β -scaffold and direction of the sidechain is tuned by the CaMBT. When sidechain points towards CaMKII (contacts are found between residue 8 and CaMKII in Figure 9) the Ca binding is enhanced; when sidechain points away from Ng (no contact between residue 8 and Ng is found in Figure 9) then Ca binding is weakened. Figure 8. The potential of mean force (PMF) as a function of asphericity ( Δ ) of the loop and the distance between Ca and center of mass of the loop. The PMFs are plotted for individual four EF-hands from three types of sample systems (A-D) Ca /CaM, (E-H) Ca /CaM/CaMKII, and (I-L) Ca /CaM/Ng. The color bar is scaled by kcal/mol and the lowest PMF value is set to 0 kcal/mol. Figure 9. Contact formation between loops and the target.
Snapshots from all-atomistic molecular dynamics simulations of Ca /CaM/CaMKII and Ca /CaM/Ng. A contact is formed if the distance between any atom from a residue in the loop and any atom from the residues in the CaMBT is within 4Å. The number of contacts is then normalized by the number of snapshots and corresponding maximum number of contacts in each system. IV. Discussion IV.1 Water molecules insure the pentagonal bipyramidal coordination of divalent Ca in EF-hand loops We have established a feasible functional for the quantum mechanical calculation on Ca in EF-hand loops of CaM protein. With the electronic structures involving all of the amino acids in a calcium binding loop and the Ca , we showed that the geometry of calcium binding loop is essential in stabilizing the pentagonal bipyramidal coordination of divalent Ca . Ca is a metal ion that exhibits variability in coordination number and geometry because of its relatively large size allowing for close packing of the ligands
45, 46 . Ca ion is most commonly found to be coordinated by seven or eight ligands in the crystal form . For the EF-hand loops in a Ca binding protein, the coordination of Ca usually presents a pentagonal bipyramidal geometry (Figure 1). In the crystal structure of CaM, to form the pentagonal geometry, the crucial feature is the conserved 12 th residue which contributes two carboxylate oxygen (bidentate) due to its long sidechain. Mutating the 12 th residue to a shorter sidechain such as Asp leads to an octagonal geometry and a decrease in the Ca affinity by 100-fold . The bidentate feature has been thought to be the main reason that EF-hands can bind Ca with higher affinity over other divalent ions such as Mg or Zn . For the two apexes in the bipyramidal geometry, a water molecule is necessary to bridge the Ca and a sidechain falling short either because of limited length of the sidechain or to fulfill its active role in other functions. For the latter reason, one example is the 9 th residue belonging to the β -sheet structure between two EF-hand loops in a lobe of CaM, which is termed as EF-hand β -scaffold . Interactions between the residue and the other loop including or not interaction with a CaMBT can affect the stability and conformation of the EF-hand loop, which further influence the Ca coordination geometry . Due to the mobility of the water molecules, the pentagonal bipyramidal geometry can be easily shifted to 6-coordinate (octagonal geometry) or 8-coordinate geometry, where two water molecules are observed as in the crystal structures . In our MD simulations, we show that packing two water molecules around the Ca ion is of highest probability (Figure 5). This agrees with classical MD simulations of Ca in an EF-hand without polarization effects . For the Ca -retaining environment, the water molecules surrounding the Ca ion allow for flexibility in the protein dynamics while retaining the Ca in the pentagonal bipyramidal geometry. For the Ca -releasing environment Ca /CaM/Ng, there are three to six water molecules in the first solvation shell. Ca ions are likely to be partially or completely exposed to the solvent and partially coordinated by the EF-hand loops (Figure 8(I-L)). This also suggests that the coordination geometry of the Ca involving at least one water molecule dynamically respond to the change in the local environment, which could be a result of binding or dissociation of a CaM binding target. IV.2 Determining atomic charge of Ca requires the information of the loop geometry For divalent ions such as Ca , the effort of force field development has been limited to bare ions in aqueous solution . To address the force field of Ca in protein environments, Jungwirth’s group accounted for the effective polarization by scaling the ionic charges by a factor of 0.75 in a classical force field. In this case, the computed value of the affinity of Ca in the four individual EF-hands of CaM is about twice the experimentally measured values . This mean-field approach does not determine charge distribution on the protein directly, which impacts the shape of the EF-hand loop and furthermore the affinity and dynamics of Ca binding. An ensemble average approach was adopted by Cheung et al.
7, 53 that averages an ensemble of semi-empirically determined atomic charges obtained from an ensemble of protein configurations involving the loops, where the Ca atomic charge ranges from +1.2 to +1.8 e. However, polarization effect was not explicitly included. According to the work done by Ren et al. , many-body effect plays an essential role in correctly determining the Ca affinity. In this work, we allow the pentagonal bipyramidal coordination geometry on an ion by including all the residues in the loop and the coordinating water with Ca in the ESP calculation. As shown in Figure 7, there is nonuniform charge redistribution on all residues in the EF-hand loop, meaning that their interaction with Ca cannot be simplified by considering only a few amino acids from the entire loop. We justified the necessity of including all residues in the EF-hand loop to calculate the atomic charges on Ca . IV.3 Ionic radius of Ca in the electrostatic potential calculation In this work, we compared the atomic charges fitted to ESPs generated using several ionic radii of Ca . We notice that there are discrepancies between the van der Waals radius of Ca ion in the Lennard-Jones potential in popular non-polarizable force fields. In AMBER, CHARMM/27, and OPLS-AA force fields, 1.7 Å is used; in GROMOS, ~2.0 Å is used; in CHARMM/36, this value is 1.37 Å. Controversial results were found for all of those values in calculation of hydration free energy or binding affinity
7, 49, 54-56 . At the quantum chemistry level, we found that 1.7 Å failed to generate physical atomic charge for Ca and the result did not improve by performing energy minimization before the QM calculation. Using an improperly large radius of Ca causes artifacts from being too close to other atoms. A large radius of 2.28 Å was adopted to account for the solvation effects when the COSMO implicit water model was used . We suggest that small, ionic radii of Ca and other divalent ions being used in the RESP/i_RESP charge fitting when the solvent molecules are explicitly included. Conclusion
CaM senses a broad spectrum of Ca signals for eukaryotic cells and acts as a hub of many downstream pathways
2, 58 . From incoming calcium signals to a particular pathway, an additional peptide (i.e. CaMBT) is required for tuning CaM’s response . Specific regions in the CaMBT can tune the Ca binding affinity for the EF-hand loop through stabilizing or disrupting interaction with the EF-hand loop . Moreover, we discovered that although the nearby water molecules do not influence the charge of Ca , they are crucial for compensating the coordination of Ca due to the conformational flexibility in the EF-hand loop. The role of the residues in the EF-hand loop has been well studied systematically
60, 61 , however, without context of the CaMBT. In this work, we discovered that including all those residues in the ESP calculation leads to a feasible method of accurate determination of atomic charges on the Ca and its binding protein, which will further advance development of both non-polarizable and polarizable force field for divalent ions in dynamic environments. Acknowledgement
We thank Dr. Neal M. Waxham for stimulating discussions. We wish to thank Mr. Nate Jennings and Jules Nde for reading and comments on the manuscript. The work was supported by a grant from the National Institutes of Health (2R01GM097553). The authors thank the computing resources from The Extreme Science and Engineering Discovery Environment (TG-MCB190109), and computing clusters uHPC and Sabine at University of Houston.
Data Availability Statement:
The data that support the findings of this study are available from the corresponding author upon reasonable request. References M. N. Waxham, and M. S. Cheung, in
Encyclopedia of Computational Neuroscience , edited by D. Jaeger, and R. Jung (Springer, 2013). A. P. Yamniuk, and H. J. Vogel, Mol. Biotechnol. (2004) 33. Z. Xia, and D. R. Storm, Nat. Rev. Neurosci. (2005) 267. H. Tidow, and P. Nissen, FEBS J. (2013) 5551. G. C. Faas, S. Raghavachari, J. E. Lisman, and I. Mody, Nat. Neurosci. (2011) 301. T. R. Gaertner, J. A. Putkey, and M. N. Waxham, J. Biol. Chem. (2004) 39374. P. Zhang, S. Tripathi, H. Trinh, and M. S. Cheung, Biophys. J. (2017) 1105. J. B. Foresman, and Æ. Frisch,
Exploring chemistry with electronic structure methods (Gaussian Inc. , Wallingford, CT, 1995), 2nd edn., 122. I. Leontyev, and A. Stuchebrukhov, Phys. Chem. Chem. Phys. (2011) 2613. D. Jiao, C. King, A. Grossfield, T. A. Darden, and P. Ren, J. Phys. Chem. B (2006) 18553. Z. Jing, C. Liu, R. Qi, and P. Ren, Proc. Natl. Acad. Sci. U. S. A (2018) E7495. J. Melcr, and J. P. Piquemal, Front Mol Biosci (2019) 143. C. Kobayashi, and S. Takada, Biophys. J. (2006) 3043. T. Ikeda, M. Boero, and K. Terakura, J. Chem. Phys. (2007) 074503. M. Ikura, Trends Biochem. Sci. (1996) 14. M. R. Nelson, and W. J. Chazin, Protein Sci. (1998) 270. C. I. Bayly, P. Cieplak, W. Cornell, and P. A. Kollman, J. Phys. Chem. (1993) 10269. P. Cieplak, W. D. Cornell, C. Bayly, and P. A. Kollman, J. Comput. Chem. (1995) 1357. O. Borodin, J. Phys. Chem. B (2009) 11463. K. Schulten, and M. Tesch, Chem. Phys. (1991) 421. J. B. Hooper, O. N. Starovoytov, O. Borodin, D. Bedrov, and G. D. Smith, J. Chem. Phys. (2012) 194506. O. N. Starovoytov, H. Torabifard, and G. A. Cisneros, J. Phys. Chem. B (2014) 7156. R. E. Duke, O. N. Starovoytov, J. P. Piquemal, and G. A. Cisneros, J. Chem. Theory Comput. (2014) 1361. H. Torabifard, O. N. Starovoytov, P. Ren, and G. A. Cisneros, Theor. Chem. Acc. (2015) 101. L. Hoffman, A. Chandrasekar, X. Wang, J. A. Putkey, and M. N. Waxham, J. Biol. Chem. (2014) 14644. M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, and E. Lindahl, SoftwareX (2015) 19. W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, and M. L. Klein, J. Chem. Phys. (1983) 926. B. Hess, H. Bekker, H. J. C. Berendsen, and J. G. E. M. Fraaije, J. Comput. Chem. (1997) 1463. K. Lindorff-Larsen, S. Piana, K. Palmo, P. Maragakis, J. L. Klepeis, R. O. Dror, and D. E. Shaw, Proteins (2010) 1950. T. Darden, D. York, and L. Pedersen, J. Chem. Phys. (1993) 10089. M. Parrinello, and A. Rahman, J. Appl. Phys. (1981) 7182. Z. Guo, and D. Thirumalai, Fold Des. (1997) 377. B. K. Rai, and G. A. Bakken, J. Comput. Chem. (2013) 1661. J. Behler, J. Chem. Phys. (2016) 170901. M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, G. Scalmani, V. Barone, G. A. Petersson, H. Nakatsuji, X. Li, M. Caricato, A. V. Marenich, J. Bloino, B. G. Janesko, R. Gomperts, B. Mennucci, H. P. Hratchian, J. V. Ortiz, A. F. Izmaylov, J. L. Sonnenberg, Williams, F. Ding, F. Lipparini, F. Egidi, J. Goings, B. Peng, A. Petrone, T. Henderson, D. Ranasinghe, V. G. Zakrzewski, J. Gao, N. Rega, G. Zheng, W. Liang, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, T. Vreven, K. Throssell, J. A. Montgomery Jr., J. E. Peralta, F. Ogliaro, M. J. Bearpark, J. J. Heyd, E. N. Brothers, K. N. Kudin, V. N. Staroverov, T. A. Keith, R. Kobayashi, J. Normand, K. Raghavachari, A. P. Rendell, J. C. Burant, S. S. Iyengar, J. Tomasi, M. Cossi, J. M. Millam, M. Klene, C. Adamo, R. Cammi, J. W. Ochterski, R. L. Martin, K. Morokuma, O. Farkas, J. B. Foresman, and D. J. Fox,
Gaussian 16 Rev. C.01
Gaussian Inc. , Wallingford, CT, 2016. P. Cieplak, J. Caldwell, and P. Kollman, J. Comput. Chem. (2001) 1048. J. W. Caldwell, and P. A. Kollman, J. Phys. Chem. (1995) 6208. U. Öpik, Proc. Phys. Soc (1967) 566. J. Wang, P. Cieplak, J. Li, J. Wang, Q. Cai, M. Hsieh, H. Lei, R. Luo, and Y. Duan, J. Phys. Chem. B (2011) 3100. A. D. Becke, J. Chem. Phys. (1993) 5648. A. Schäfer, H. Horn, and R. Ahlrichs, J. Chem. Phys. (1992) 2571. J. Ȧ qvist, J. Phys. Chem. (1990) 8021. R. D. Shannon, Acta Crystallogr. Sect. A (1976) 751. M. Nara, H. Morii, and M. Tanokura, Biochim. Biophys. Acta (2013) 2319. J. J. Falke, S. K. Drake, A. L. Hazard, and O. B. Peersen, Q. Rev. Biophys. (1994) 219. T. Dudev, and C. Lim, Chem. Rev. (2003) 773. A. K. Katz, J. P. Glusker, S. A. Beebe, and C. W. Bock, J. Am. Chem. Soc. (1996) 5752. Z. Grabarek, J. Mol. Biol. (2006) 509. S. Marchand, and B. Roux, Proteins (1998) 265. D. Jiao, C. King, A. Grossfield, T. A. Darden, and P. Ren, J. Phys. Chem. B (2006) 18553. I. Leontyev, and A. A. Stuchebrukhov, Phys. Chem. Chem. Phys. (2011) 2613. M. Kohagen, M. Lepsik, and P. Jungwirth, J. Phys. Chem. Lett. (2014) 3964. Q. Wang, P. Zhang, L. Hoffman, S. Tripathi, D. Homouz, Y. Liu, M. N. Waxham, and M. S. Cheung, Proc. Natl. Acad. Sci. U. S. A (2013) 20545. C. Chen, Y. Huang, X. Jiang, and Y. Xiao, Phys Rev E Stat Nonlin Soft Matter Phys (2013) 062705. A. Basit, R. K. Mishra, and P. Bandyopadhyay, J. Biomol. Struct. Dyn. (2020) 1. M. Lawrenz, J. Wereszczynski, R. Amaro, R. Walker, A. Roitberg, and J. A. McCammon, Proteins (2010) 2523. M. Lepsik, and M. J. Field, J. Phys. Chem. B (2007) 10012. K. P. Hoeflich, and M. Ikura, Cell (2002) 739. Y. Kubota, J. A. Putkey, and M. N. Waxham, Biophys. J. (2007) 3848. M. R. Nelson, E. Thulin, P. A. Fagan, S. Forsen, and W. J. Chazin, Protein Sci. (2002) 198. C. G. Bunick, M. R. Nelson, S. Mangahas, M. J. Hunter, J. H. Sheehan, L. S. Mizoue, G. J. Bunick, and W. J. Chazin, J. Am. Chem. Soc. (2004) 5990. upplemental information
Pengzhi Zhang , Jaebeom Han , Piotr Cieplak , Margaret. S. Cheung
1. Department of Physics, University of Houston, TX, USA 2. Center for Theoretical Biological Physics, Rice University, TX, USA 3. Sanford Burnham Prebys Medical Discovery Institute, CA, USA Corresponding Author: [email protected] ext 1. i_RESP charges are more sensitive to the surrounding environment than the charges derived by other methods
We determined the atomic partial charge of the Ca in 776 representative EF-hand loops along with the closest water molecule(s). Mulliken charges, restrained ESP fitted charges (RESP charge), and i_RESP charges of the Ca were compared. All of the three sets of Ca atomic charges roughly follow Gaussian distributions, as shown in Figure S1. The distribution of the Mulliken charges of the Ca centers at a small value of +1.27 e and shows a noticeable tail at +1.85 e. The mean and standard deviation of the RESP and i_RESP Ca charge distributions are (+1.74 e, 0.07 e) and (+1.89 e, 0.11 e), respectively. Comparing with the Mulliken charges or the RESP charges, the i_RESP charges present the largest mean and standard deviation values, because of the inclusion of the polarization effect in i_RESP charge fitting. First, in the i_RESP fitting, atomic charges were fitted to minimize the difference between the QM derived electrostatic potential and that recalculated from the point charges and the induced dipole interactions. Mulliken and RESP charges, however, do not explicitly include the induced dipole interactions, ESP was matched to the quantum mechanical calculation by effectively distribute part of the atomic charges on Ca to the protein. For the same Ca -protein geometry, the i_RESP charge of the Ca is almost always larger than the RESP charge. Therefore, the distribution of the i_RESP charges of Ca presents the largest mean value. Second, because of the inclusion of the induced dipole interactions, the i_RESP charges are more susceptible to the variation of the neighboring atoms. In our calculation, we cover a wide variety of the Ca binding environments by sampling from distinct clusters of Ca binding loops, this is reflected by the i_RESP charges of the Ca with the largest standard deviation. Figure S1. Distribution of Ca atomic partial charges. Mulliken charges, RESP fitted, and i_RESP fitted charges are compared. able SI . Comparison between the i_RESP fitted charges using electrostatic potentials from B3LYP/SVP and MP2/aug-cc-pVTZ theory levels.
The mean of the ratio between the two sets of fitted charges, the mean absolute difference (MAD), and the root mean square deviation (RMSD) are used for the quantitative comparison. All calculations have been performed for amino acids tetramers in five standard conformations. The tetramer is ACE-Ala-X-Ala-NME, where X stands for one of the eight amino acid types observed in CaM calcium binding loops. For each tetramer, the charges were fitted separately or jointly equivalencing charges across the five conformations. X Mean of ratio MAD (e) RMSD (e) Separate fit Joint fit Separate fit Joint fit Separate fit Joint fit Ala 0.838 0.825 0.048 0.040 0.058 0.046 Ile 1.321 0.710 0.076 0.052 0.114 0.064 Tyr 0.827 0.855 0.060 0.042 0.074 0.054 Ser 0.925 0.949 0.051 0.037 0.068 0.050 Asn 0.870 0.828 0.082 0.068 0.100 0.079 Lys 1.032 0.836 0.049 0.042 0.068 0.056 Asp 0.828 0.808 0.129 0.113 0.151 0.131 Glu 0.790 1.420 0.091 0.072 0.114 0.094
Figure S2 Comparison between the RESP charges of the Ca using two vdW radii of the Ca ion, 1.0 Å and 1.7 Å. (A) Direct comparison between the RESP charges. (B) Distribution of the RESP charges. The distribution was obtained using 776 representative EF-hand loop structures . Figure S3. RESP and i_RESP charges of the Ca ion using different vdW radii of the Ca ( σ ) for generation of the ESP data. The structure for EF-hand loop 2 that gives unphysical charges using σ = 1.7Å is used in the calculation. ext 2 Analysis: Radial distribution functions The three protein systems present distinctive preference to Ca binding, therefore, we further studied the radial distribution function g " (𝑟 %($" )* ) as well as the coordination number of water molecules (CN), as shown in Figure S3. g " (𝑟 %($" )* ) describes the distribution of the oxygen atoms in the water molecules surrounding the Ca . CN is the average number of water molecules by integrating g " (𝑟 %($" )* ) over Ca -O separation 𝑟 %($" )* . The positions of the first solvation shells are similar ( 𝑟 %($" )* ~ 3.3 Å) across all the EF-hand loops in all the three systems. However, CN varies. Especially, for the Ca retaining environments, namely Ca /CaM or Ca /CaM/CaMKII, there are on average two water molecules in the first solvation shell, except for the EF-2 in Ca /CaM, where there is only one water molecule. This indicates that in the dynamic environment, the coordination residues in the pentagonal bipyramidal coordination geometry could be dynamic, with another residue being replaced by a second water molecule. For the Ca -releasing environment Ca /CaM/Ng, there are three to six water molecules in the first solvation shell. Ca ions are likely to be partially or completely exposed to the solvent and partially coordinated by the EF-hand loops and the rest are filled by water molecules (Figure S3 and Figure 4(I-L)). This develops a variety of coordination geometries including the pentagonal bipyramidal geometry, octahedral geometry, and others. Figure S3. Radial distribution function between Ca and O atom in water molecules and the coordination number by water. Representative structures are provided accordingly. The CaM is colored in gray, CaMKII peptide is colored in cyan, and Ng peptide is colored in purple. The Ca ions are colored in yellow. The water molecules within the first solvation shell are represented by their oxygen atoms as red beads. Figure S4. Fraction of the number of water molecules in individual Ca binding EF-hands. The distributions were calculated by considering the trajectories from all-atomistic molecular dynamics simulations of Ca /CaM, Ca /CaM/CaMKII, and Ca /CaM/Ng in explicit solvent. The water molecules are counted within the first solvation shell of Ca ion (3.3 Å). Figure S5 Standard deviation of the net charges per residue in the EF-hand loops.
The i_RESP fitted charges are summed for each residue in the loop. Residues chelating the Ca in the crystal structure of holoCaM is colored in blue; the calcium is colored in orange. ext 3 Analysis: Clustering representative structures using molecular features We use neural-net clustering to sample representative structures of calcium in an EF-hand loop including coordinating water molecules. Symmetry functions [1, 2] were used as the molecular feature to remove the translational and rotational degrees of freedom for characterizing the surrounding environment of the Ca composed of the protein(s) and water molecules. (i) The radial distribution function, 𝐺 -./0 (𝑟 ) = ∑ 𝑒 $56𝒓 ) ∙ 𝑓 < =>? (|𝒓 = |; 𝑟 ) (Eqn. S1) where α is the atom category involved in the environment. We divided the atoms into six categories including: Carbon atoms, Nitrogen atoms, Hydrogen atoms, Oxygen atoms from the water molecule(s), Oxygen atoms from the Ca /coordinating sidechains, and other Oxygen atoms. M α is the total number of atoms in category α , 𝒓 = is the displacement between the atom j and the Ca ion. η = 10 -4 Å -2 . Atomic interactions were screened by a cutoff function, 𝑓 (|𝒓 = |; 𝑟 ) = C ?D Ecos I𝜋 . . K L + 1O0, for 𝑟 = > 𝑟 , for 𝑟 = < 𝑟 (Eqn. S2) Using different cutoff radius, 𝑟 ranging from 1.0 Å to 6.0 Å and with 0.5 Å increment, we define 66 radial symmetry function elements. (ii) The angular distribution function, 𝐺 -V/WX = ∑ ∑ Y1 + cos 𝜃 [= 𝑒 $5I6. \ ) ]6. ) ]6. \8 ) L ∙ ; ^ =>?; < [>? 𝑓 (|𝑟 [ |; 𝑟 )𝑓 (|𝑟 = |; 𝑟 )𝑓 (|𝑟 [= |; 𝑟 ) (Eqn. S3) where α , β are the atom categories involved in the environment. 𝜃 [= is the angle i -Ca - j . 𝑟 [= is the displacement between atoms i and j . 𝑟 was set to 3.0 Å for angular distribution function . ext 4 Analysis: Definition of asphericity Δ The shape of the EF-hand loop configurations can be characterized by a rotationally invariant quantity, the asphericity Δ [3]. Δ is determined from the inertia tensor, T αβ 𝑇 -V = ?D‘ ) ∑ (𝑟 [- − 𝑟 =- )(𝑟 [V − 𝑟 =V ) ‘[,=>? (Eqn. S4) where N is the total number of atoms and represent the X, Y, Z components. The eigenvalues of T are denoted by λ i ( i = 1...N). The is the average of λ i . 𝛥 = cD ∑ (d \ $de ) )f\gh (i.j) ) (Eqn. S5) Asphericity ( Δ ) ranges from 0 to 1, and Δ = 0 corresponds to a sphere. Deviation of Δ from 0 indicates the extent of anisotropy [4]. Supplemental Reference
1. Rai, B.K. and G.A. Bakken,
Fast and accurate generation of ab initio quality atomic charges using nonparametric statistical regression.
J Comput Chem, 2013. (19): p. 1661-71. 2. Behler, J., Perspective: Machine learning potentials for atomistic simulations.
J Chem Phys, 2016. (17): p. 170901. 3. Dima, R.I. and D. Thirumalai,
Asymmetry in the shapes of folded and denatured states of proteins.
Journal of Physical Chemistry B, 2004. (21): p. 6564-6570. 4. Homouz, D., et al.,
Crowded, cell-like environment induces shape changes in aspherical protein.
Proceedings of the National Academy of Sciences of the United States of America, 2008. (33): p. 11754-11759. ba ,,