In silico comparison of spike protein-ACE2 binding affinities across species; significance for the possible origin of the SARS-CoV-2 virus
Sakshi Piplani, Puneet Kumar Singh, David A. Winkler*, Nikolai Petrovsky*
College of Medicine and Public Health, Flinders University, Bedford Park 5046, Australia
Vaxine Pty Ltd, 11 Walkley Avenue, Warradale 5046, Australia
La Trobe University, Kingsbury Drive, Bundoora 3042, Australia
Monash Institute of Pharmaceutical Sciences, Monash University, Parkville 3052, Australia
School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, UK
CSIRO Data61, Pullenvale 4069, Australia
*Joint senior authors
Abstract
The devastating impact of the COVID-19 pandemic caused by SARS–coronavirus 2 (SARS-CoV-2) has raised important questions on the origins of this virus, the mechanisms of any zoonotic transfer from exotic animals to humans, whether companion animals or those used for commercial purposes can act as reservoirs for infection, and the reasons for the large variations in SARS-CoV-2 susceptibilities across animal species. Traditional lab-based methods will ultimately answer many of these questions but take considerable time. Increasingly powerful in silico modeling methods provide the opportunity to rapidly generate information on newly emerged pathogens to aid countermeasure development and also to predict potential future behaviors. We used an in silico structural homology modeling approach to characterize the SARS-CoV-2 spike protein which predicted its high affinity binding to the human ACE2 receptor. Next we sought to gain insights into the possible origins and transmission path by which SARS-CoV-2 might have crossed to humans by constructing models of the ACE2 receptors of relevant species, and then calculating the binding energy of SARS-CoV-2 spike protein to each of these. Notably, SARS-CoV-2 spike protein had the highest overall binding energy for human ACE2, greater than all the other tested species including bat, the postulated source of the virus. This indicates that SARS-CoV-2 is a highly adapted human pathogen. Of the species studied, the next highest binding affinity after human was pangolin, which is most likely explained by a process of convergent evolution. Binding of SARS-CoV-2 for dog and cat ACE2 was similar to affinity for bat ACE2, all being lower than for human ACE2, and is consistent with only occasional observations of infections of these domestic animals. Snake ACE2 had low affinity for spike protein, making it highly improbable that snakes acted as an intermediate vector. 
Overall, the data indicates that SARS-CoV-2 is uniquely adapted to infect humans, raising important questions as to whether it arose in nature by a rare chance event or whether its origins might lie elsewhere.
Introduction
The devastating impact of COVID-19 infections caused by SARS–coronavirus 2 (SARS-CoV-2) has stimulated unprecedented international activity to discover effective vaccines and drugs for this and other pathogenic coronaviruses.
It has also raised important questions on the mechanisms of zoonotic transfer of viruses from animals to humans, questions as to whether companion animals or those used for commercial purposes can act as reservoirs for infection, and the reasons for the large variations in SARS-CoV-2 susceptibility across animal species.
Understanding how viruses move between species may help us prevent or minimize these pathways in the future. Elucidating the molecular basis for the different susceptibilities of species may also shed light on the differences in susceptibilities in different sub-groups of humans. Very recently, Shi et al. published the results of experiments to determine the susceptibility to SARS-CoV-2 of ferrets, cats, dogs, and other domesticated animals. They showed that SARS-CoV-2 virus replicates poorly in dogs, pigs, chickens, and ducks, but ferrets and cats are permissive to infection. Other studies have reported the susceptibility of other animal species to SARS-CoV-2.
Susceptible species such as macaques, hamsters and ferrets are used as animal models of SARS-CoV-2 infection.
In the absence of purified, isolated ACE2 from all the relevant animal species that could be used to measure the molecular affinities to spike protein experimentally, computational methods offer considerable promise for determining the rank order of affinities across species, as a method to impute which species may be permissive to SARS-CoV-2. Here we show how computational chemistry methods from structure-based drug design can be used to determine the relative binding affinities of the SARS-CoV-2 spike protein for its receptor, angiotensin converting enzyme (ACE)-2, a critical initiating event for SARS-CoV-2 infection, across multiple common and exotic animal species.
The aim of these studies was to better understand the species-specific nature of this interaction and see if this could help elucidate the origin of SARS-CoV-2 and the mechanisms for its zoonotic transmission.
Materials and Methods
Homology modelling of structures
To construct the three-dimensional structure of the SARS-CoV-2 spike protein, the sequence was retrieved from the NCBI GenBank database (accession number YP_009724390.1). A PSI-BLAST search against the PDB database was performed for template selection, and the X-ray structure of the SARS coronavirus spike (refcode 6ACC), which has 76.4% sequence similarity to the SARS-CoV-2 spike protein, was selected as the template. The protein sequences of the ACE2 proteins for different species are summarized in Table 11 and the full sequence alignment is shown in Supplementary Figure 1. The phylogenetic tree for ACE2 proteins from selected animal species is illustrated in Supplementary Figure 2. The 3D structures were built using Modeller 9.21 (https://salilab.org/modeller/). The quality of the generated models was evaluated using the GA341 and DOPE scores, and the models were assessed using the SWISS-MODEL structure assessment server (https://swissmodel.expasy.org/assess).
Protein Structure preparation
Molecular Docking
These modelled structures were docked against SARS-CoV-2 spike protein structure using the HDOCK server (http://hdock.phys.hust.edu.cn/).
Molecular docking was performed on the homology modelled SARS-CoV-2 spike protein with human and animal ACE2 proteins. The SARS-CoV spike protein was also docked with human ACE2 protein to obtain the docking pose for binding energy calculations. The docking poses were ranked using an energy-based scoring function. The docked structures were analyzed using UCSF Chimera software.
Molecular Dynamics Simulation
Docked complexes (SARS-CoV-2 spike with human ACE2, human SARS-CoV spike with human ACE2, SARS-CoV-2 spike with bat ACE2, etc.) were used as starting geometries for MD simulations. Simulations were carried out using the GPU accelerated version of the program with the AMBER99SB-ILDN force field in periodic boundary conditions on an Oracle Cloud Server. Docked complexes were immersed in a truncated octahedron box of TIP3P water molecules. The solvated box was further neutralized with Na+ or Cl− counter ions using the tleap program. Particle Mesh Ewald (PME) was employed to calculate the long-range electrostatic interactions. The cutoff distance for the long-range van der Waals (VDW) energy term was 12.0 Å. The whole system was minimized without any restraint, using 2500 cycles of steepest descent minimization followed by 5000 cycles of conjugate gradient minimization. After system optimization, the MD simulations were initiated by gradually heating each system in the NVT ensemble from 0 to 300 K for 50 ps using a Langevin thermostat with a coupling coefficient of 1.0/ps and with a force constant of 2.0 kcal/mol·Å² applied to the complex. Finally, a production run of 100 ns of MD simulation was performed for each system at a constant temperature of 300 K in the NPT ensemble with periodic boundary conditions. During the MD procedure, the SHAKE algorithm was applied to constrain all covalent bonds involving hydrogen atoms. The time step was set to 2 fs. The structural stability of the complex was monitored by the RMSD and RMSF values of the backbone atoms of the entire protein. Finally, the free energies of binding were calculated for all the simulated docked structures. Calculations were also performed for up to 500 ns on human ACE2 to ensure that 100 ns is sufficiently long for convergence.
Duplicate production runs starting with different random seeds were also run to allow estimates of binding energy uncertainties to be determined for the strongest binding ACE2 structures.
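As an illustration of the stability check described above, the following minimal Python sketch computes an RMSD between a reference frame and a later frame. It assumes the two frames have already been superimposed and that coordinates are supplied as lists of (x, y, z) tuples in nm. The function name `rmsd` and the toy coordinates are hypothetical and are not part of the original workflow, which would normally rely on the analysis tools shipped with the MD package.

```python
import math


def rmsd(reference, frame):
    """Root-mean-square deviation between two equally sized coordinate sets.

    Both arguments are sequences of (x, y, z) tuples in the same units.
    The frames are assumed to be pre-aligned; no fitting is done here.
    """
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate sets must be non-empty and equally sized")
    squared = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(reference, frame):
        squared += (x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2
    return math.sqrt(squared / len(reference))


# Toy example (hypothetical coordinates): every atom displaced by 0.3 nm along x.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
moved = [(x + 0.3, y, z) for x, y, z in ref]
print(round(rmsd(ref, moved), 3))
```

Applied frame by frame to the backbone atoms, a value that stops drifting over the trajectory is the kind of plateau the convergence statement above refers to.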
MM‐PBSA binding free energy study
The binding free energies of the protein-protein complexes were evaluated in two ways. The traditional method is to calculate the energies of the solvated SARS-CoV-2 spike and ACE2 proteins and that of the bound complex, and to derive the binding energy by subtraction:

$$\Delta G_{\mathrm{binding,aq}} = \Delta G_{\mathrm{complex,aq}} - \left(\Delta G_{\mathrm{spike,aq}} + \Delta G_{\mathrm{ACE2,aq}}\right) \tag{1}$$

We also calculated binding energies using the molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) tool in GROMACS, which derives them from the nonbonded interaction energies of the complex. This method is also widely used for binding free energy calculations. The binding free energies of the protein complexes were analyzed during the equilibrium phase from the output files of the 100 ns MD simulations. The g_mmpbsa tool in GROMACS was used after the molecular dynamics simulations, and the output files were post-processed to obtain binding free energies by the single-trajectory MM-PBSA method. Specifically, for a non-covalent binding interaction in the aqueous phase, the binding free energy, ΔG(bind,aq), is

$$\Delta G_{\mathrm{bind,aq}} = \Delta G_{\mathrm{bind,vac}} + \Delta G_{\mathrm{bind,solv}} \tag{2}$$

where ΔG(bind,vac) is the binding free energy in vacuum and ΔG(bind,solv) is the solvation free energy change upon binding:

$$\Delta G_{\mathrm{bind,solv}} = \Delta G_{\mathrm{R:L,solv}} - \Delta G_{\mathrm{R,solv}} - \Delta G_{\mathrm{L,solv}} \tag{3}$$

where ΔG(R:L,solv), ΔG(R,solv) and ΔG(L,solv) are the solvation free energies of the complex, receptor and ligand, respectively. Free energy decomposition analyses were also performed by MM-PBSA decomposition to gain detailed insight into the interactions between the ligand and each residue in the binding site. The binding interaction of each ligand–residue pair includes three terms: the van der Waals contribution, the electrostatic contribution, and the solvation contribution.
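The arithmetic of equations (1) to (3) can be sketched in a few lines of Python. This is only an illustration of how the terms combine, not the g_mmpbsa implementation. The function names and the numerical inputs are hypothetical, and all energies are assumed to be in the same units (for example kcal/mol).

```python
def binding_energy_by_subtraction(dg_complex, dg_spike, dg_ace2):
    """Equation (1): complex energy minus the sum of the two solvated partners."""
    return dg_complex - (dg_spike + dg_ace2)


def solvation_change_on_binding(dg_complex_solv, dg_receptor_solv, dg_ligand_solv):
    """Equation (3): solvation free energy change upon binding."""
    return dg_complex_solv - dg_receptor_solv - dg_ligand_solv


def binding_free_energy_aq(dg_bind_vac, dg_complex_solv, dg_receptor_solv, dg_ligand_solv):
    """Equation (2): vacuum binding term plus the solvation change of equation (3)."""
    dg_bind_solv = solvation_change_on_binding(
        dg_complex_solv, dg_receptor_solv, dg_ligand_solv
    )
    return dg_bind_vac + dg_bind_solv


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
print(binding_energy_by_subtraction(-1050.0, -600.0, -400.0))
print(binding_free_energy_aq(-100.0, -500.0, -300.0, -250.0))
```

Keeping equation (3) as its own function mirrors the single-trajectory scheme, in which the complex, receptor and ligand solvation terms are all evaluated on frames of the same trajectory.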
Interaction energy
Another estimate of the strength of the interaction within a protein-protein complex can be obtained from the non-bonded interaction energy between the two proteins. GROMACS can decompose the short-range nonbonded energies between any number of defined groups. To compute the interaction energies as part of our analysis, we reprocessed the trajectory files obtained during simulation and recomputed the energies using the -rerun option. The interaction energy is the combination of the short-range Coulombic interaction energy (Coul-SR:Protein-Protein) and the short-range Lennard-Jones energy (LJ-SR:Protein-Protein) (see Table 3).
While this paper was being prepared, a paper by Guterres and Im described a substantial improvement in protein-ligand docking results using high-throughput MD simulations. They employed docking using AutoDock Vina, followed by MD simulation using CHARMM. The parameters they advocated were very similar to those used in our study. Proteins were solvated in a box of TIP3P water molecules extending 10 Å beyond the proteins and the particle-mesh Ewald method was used for electrostatic interactions. Nonbonded interactions over 10 and 12 Å were truncated. Their systems were minimized for 5000 steps using the steepest descent method followed by 1 ns equilibration with an NVT setting. For each protein-ligand complex, they ran 3 × 100 ns production runs from the same initial structure using different initial velocity random seeds and an integration step size of 2 fs.
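The interaction energy defined in this section, the sum of the Coul-SR and LJ-SR terms, can be averaged over a rerun trajectory as in the sketch below. It assumes the per-frame values have already been extracted into two Python lists and are in kJ/mol, the energy unit GROMACS reports. The function name, the conversion flag and the sample values are hypothetical.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie


def mean_interaction_energy(coul_sr, lj_sr, to_kcal=True):
    """Average of (Coul-SR + LJ-SR) over frames.

    coul_sr and lj_sr are per-frame energies in kJ/mol for the same frames.
    Returns kcal/mol when to_kcal is True, otherwise kJ/mol.
    """
    if len(coul_sr) != len(lj_sr) or not coul_sr:
        raise ValueError("need equally sized, non-empty energy series")
    totals = [c + lj for c, lj in zip(coul_sr, lj_sr)]
    mean_kj = sum(totals) / len(totals)
    return mean_kj / KJ_PER_KCAL if to_kcal else mean_kj


# Hypothetical per-frame values in kJ/mol, not taken from the simulations.
coul = [-120.0, -130.0, -125.0]
lj = [-80.0, -85.0, -90.0]
print(round(mean_interaction_energy(coul, lj), 2))
```

Averaging only over the equilibrated part of the trajectory, rather than all frames, would be the natural refinement.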
Results
The ancestry of SARS-CoV-2 traces back to the human, civet and bat SARS-CoV strains, which all use the same ACE2 proteins for cellular entry.
The similarities and variations in sequences for both the SARS-CoV-2 spike protein and the SARS spike protein were determined from sequences retrieved from the NCBI GenBank database and aligned using CLUSTALW. The spike protein receptor binding domain (RBD) region showed a 72% identity between the two viruses (Figure 1).
Figure 1. SARS and SARS-CoV-2 receptor binding domain sequence alignment showing conserved regions in black and non-conserved residues in red and blue.
Spike Protein-ACE2 interaction
The three-dimensional structure of SARS-CoV-2 Spike protein (Figure 2) used was PDB refcode 6VXX. ACE2 is the receptor for SARS-CoV-2 and is needed for its cellular entry. The low affinity binding of SARS spike protein for mouse ACE2 has been postulated as the reason mice are largely non-permissive to SARS infection. Similarly, we postulated that variation in ACE2 between species may determine the affinity of binding of SARS-CoV-2 Spike protein and hence determine which species are permissive to SARS-CoV-2 infection. The sequence alignment of the selected ACE2 species is shown in Supplementary Figure 1 and the phylogenetic tree showing relatedness of ACE2 proteins across selected animal species is illustrated in Supplementary Figure 2. ACE2 modelled structures were generated for each relevant species and the ones with the lowest DOPE (Discrete Optimized Protein Energy) score were chosen and refined by energy minimization and used for further analysis.
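The model selection step described above, keeping the candidate with the lowest DOPE score, amounts to a simple minimum over the scored models. The sketch below assumes the scores have already been collected into a dictionary keyed by model file name. The file names and score values are hypothetical placeholders, and the snippet does not call Modeller itself.

```python
def pick_best_model(dope_scores):
    """Return the name of the model with the lowest (most negative) DOPE score."""
    if not dope_scores:
        raise ValueError("no models to choose from")
    return min(dope_scores, key=dope_scores.get)


# Hypothetical candidates for one species; lower DOPE indicates a better model.
candidates = {
    "ace2_model_01.pdb": -71234.5,
    "ace2_model_02.pdb": -72510.2,
    "ace2_model_03.pdb": -70988.9,
}
print(pick_best_model(candidates))
```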
Figure 2. 3D structure of SARS-CoV-2 spike protein (PDB refcode 6VXX)
The Ramachandran scores of the modelled ACE2 structures for selected species are summarized in Table 1, with the predicted ACE2 structures and Ramachandran plots for each selected species shown in Supplementary Figure 3.
Table 1 . Ramachandran scores for ACE2 modelled structures for selected species
Species MolProbity Score Ramachandran Score (favoured region)
Rhinolophus sinicus (bat)
Mus musculus (mouse)
Mustela putorius furo (ferret)
Mesocricetus auratus (hamster)
Felis catus (cat)
Paguma larvata (civet)
Macaca fascicularis (monkey)
Manis javanica (pangolin)
Ophiophagus hannah (king cobra)
Canis lupus familiaris (dog)
Equus caballus (horse)
Panthera tigris altaica (tiger)
Bos taurus (cattle)
The three-dimensional structures of the ACE2 receptors of selected species were created using Modeller 9.23. The templates used to model the various ACE2 structures were 1R42 (human ACE2), 3CSI (glutathione transferase) and 3D0G (spike protein receptor-binding domain from the 2002-2003 SARS coronavirus human strain) (Table 2). Similarity between the template and the query is important when building a model. The query sequence of Macaca fascicularis (monkey), accession number A0A2K5X283, was found to be 96.9% similar to the structure of human ACE2 (PDB id: 1R42), whereas Ophiophagus hannah (king cobra) had a much lower similarity of 61.42% to its template, 1R42 (Table 2). All structures were subjected to energy minimization, and the energy-minimized structures with the lowest DOPE scores were selected for further study (Supplementary Figure 3).
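To make the template comparison above concrete, the sketch below computes percent identity between two pre-aligned sequences. Note that the paper reports "similarity", whose exact definition is not given in the text; identity is used here only as a simpler, related measure. The function name and the short toy sequences are hypothetical.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent of identical residues over alignment columns without gaps."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == gap or t == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if q == t:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real ACE2 sequences).
print(percent_identity("QTIEEQ", "STIE-Q"))
```

A similarity score would additionally count conservative substitutions as matches, typically via a substitution matrix, so it is generally at least as high as the identity.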
Table 2. Template structures used to model selected ACE2 species and similarity scores of each ACE2 sequence to the selected template used
Species | Accession No. | Template | Similarity
Rhinolophus sinicus (Bat) | AGZ48803.1 | 3CSI | 79.73%
Mus musculus (Mouse) | Q8R0I0 | 1R42 | 84.27%
Mustela putorius furo (Ferret) | Q2WG88 | 1R42 | 83.44%
Mesocricetus auratus (Hamster) | A0A1U7QTA1 | 1R42 | 87.58%
Felis catus (Cat) | Q56H28 | 1R42 | 85.93%
Canis lupus familiaris (Dog) | J9P7Y2 | 1R42 | 84.93%
Paguma larvata (Civet) | Q56NL1 | 3D0G | 86.77%
Macaca fascicularis (Monkey) | A0A2K5X283 | 1R42 | 96.91%
Manis javanica (Pangolin) | XP_017505752.1 | 1R42 | 85.57%
Ophiophagus hannah (King cobra) | V8NIH2 | 1R42 | 61.42%
Equus caballus (Horse) | F6V9L3 | |
Panthera tigris altaica (Tiger) | XP_007090142.1 | |
Bos taurus (Cow) | NP_001019673.2 | |
Structure Quality Assessment
The modelled structures were further assessed for quality using Ramachandran plots and MolProbity scores in the SWISS-MODEL structure assessment server. The Ramachandran plot checks the stereochemical quality of a protein by analyzing residue-by-residue geometry and overall structure geometry, and is also a way to visualize the energetically allowed regions of the backbone dihedral angles ψ against φ of the amino acid residues in a protein structure. The Ramachandran score of the SARS-CoV-2 spike protein was 90% in the binding region, and its MolProbity score, which provides an evaluation of model quality at both the global and local level, was 3.17. The ACE2 modelled structures from selected species were also assessed using the SWISS-MODEL structure assessment. The Ramachandran score, i.e. the percentage of amino acid residues falling into the energetically favoured region, ranged from 96% to 99% across all species (Table 1). These Ramachandran and MolProbity scores show that all the built structures were of good quality and were suitable for use in further studies. The Ramachandran graphs are presented in Supplementary Figure 3.
Molecular Docking analysis
The receptor binding domain (RBD) of the SARS-CoV-2 spike protein was docked against the ACE2 receptor of various species using the HDOCK server. The interacting residues of ACE2 and SARS-CoV-2 spike protein are shown in Table 3. We found certain key amino acids in the receptor binding motif (RBM) that were in accordance with previous studies. Certain amino acids, including PHE28, ASN330, ASP355 and ARG357, were conserved in the ACE2 of most of the selected species and were observed to take part in the interaction with spike protein. TYR41, LYS353, ALA386 and ARG393 also interacted with spike protein residues and were highly conserved across all species except bat, mouse, ferret and pangolin, respectively. The spike protein interacting residues of Ophiophagus hannah (king cobra) ACE2 shared the fewest residues with the other ACE2 species included in the study, consistent with its low sequence similarity to human ACE2.
Table 3. Protein interacting residues in selected ACE2 species when docked with SARS-CoV-2 spike protein. Residues common to spike interacting residues in human ACE2 are labelled red.
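Species | Accession Number | Position 19 | 24 | 27 | 28 | 30 | 31 | 34 | 37 | 38 | 41 | 42 | 79 | 83 | 330 | 353 | 393
Homo sapiens (Human) | Q9BYF1 | S | Q | T | F | D | K | H | E | D | Y | Q | L | Y | N | K | R
Macaca fascicularis (Monkey) | A0A2K5X283 | S | Q | T | F | D | K | H | E | D | Y | Q | L | Y | N | K | R
Rhinolophus sinicus (Bat) | U5WHY8 | S | E | M | F | D | K | T | E | D | H | Q | L | Y | N | K | R
Mustela putorius furo (Ferret) | Q2WG88 | D | L | T | F | E | K | T | E | E | Y | Q | - | Y | N | K | R
Manis javanica (Pangolin) | XP_017505752.1 | - | E | T | F | E | K | S | E | E | Y | Q | I | Y | N | K | R
Ophiophagus hannah (Snake) | ETE61880.1 | Q | V | K | F | E | Q | A | - | D | Y | N | N | F | N | L | R
Canis lupus familiaris (Dog) | J9P7Y2 | - | L | T | F | E | K | Y | E | E | Y | Q | L | Y | N | K | R
Mesocricetus auratus (Hamster) | A0A1U7QTA1 | S | Q | T | F | D | L | Q | E | D | Y | Q | L | Y | N | K | R
Panthera tigris (Tiger) | XP_007090142.1 | S | L | T | F | D | K | H | E | E | Y | Q | L | Y | K | K | R
Bos taurus (Cow) | NP_001019673.2 | S | Q | T | F | E | K | H | E | D | Y | Q | M | Y | N | K | R
Mus musculus (Mouse) | Q8R0I0 | S | N | T | F | N | N | Q | E | D | Y | Q | Y | F | N | K | R
Paguma larvata (Civet) | Q56NL1 | S | L | T | F | E | K | Y | E | Q | Y | Q | L | Y | N | K | R
Felis catus (Cat) | Q56H28 | S | L | T | F | E | K | H | E | E | Y | Q | L | Y | N | K | R
Equus ferus caballus (Horse) | F6V9L3 | S | L | T | F | D | K | S | E | E | H | Q | L | Y | N | K | R
Molecular dynamics simulations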
The molecular dynamics simulations of complexes of SARS-CoV-2 spike protein and ACE2 receptors of various species were performed for 100 ns. All complexes became stable during simulation, with RMSD fluctuations converging to a range of 0.5 to 0.8 nm from the original position. The calculated binding energies for the interactions of SARS-CoV-2 with ACE2 from the species studied are presented in Table 4. The MMPBSA binding energies are summarized for comparison. The energies calculated by equation 1 and those from the MMPBSA algorithm correlate strongly (r = 0.92). The binding energies (Table 2) correlate poorly with sequence similarity (r = 0.27). Also shown are the observed and predicted SARS-CoV-2 susceptibilities of the species from analysis of phylogenetic clustering and sequence alignment with currently known ACE2s utilized by SARS-CoV-2, as described by Qiu et al. (Supplementary Figure 2). Also listed are the interaction energies of the spike and ACE2 proteins for each species calculated by Wu et al. using an automatic docking method, ICM-Pro. Table 4 also includes observational in vivo data on SARS-CoV-2 infectivity and disease symptoms in the species where this has been reported.
Table 4. Binding energies of SARS-CoV-2 spike protein to ACE2 of selected species and potential species susceptibilities from other studies
Species | Binding energy (kcal/mol) | MMPBSA energy (kcal/mol) | COVID infectivity (n/a = not assessed)
Homo sapiens (human) | -52.8 | -57.6 | Permissive, high infectivity, severe disease in 5-10%
Manis javanica (pangolin) | -52.0 | -56.5 | Permissive
Canis lupus familiaris (dog) | -50.8 | -49.5 | Permissive, low infectivity, no overt disease
Macaca fascicularis (monkey) | -50.4 | -50.8 | Permissive, medium infectivity, lung disease
Mesocricetus auratus (hamster) | -49.7 | -50.0 | Permissive, high infectivity, lung disease
Mustela putorius furo (ferret) | -48.6 | -49.2 | Permissive, medium infectivity, mild disease
Felis catus (cat) | -47.6 | -48.9 | Permissive, high infectivity, lung disease
Panthera tigris (tiger) | -47.3 | -42.5 | Permissive, overt respiratory symptoms
Rhinolophus sinicus (bat) | -46.9 | -49.6 | n/a
Paguma larvata (civet) | -45.1 | -46.1 | n/a
Equus ferus caballus (horse) | -44.1 | -49.2 | Permissive, low infectivity, no overt disease
Bos taurus (cow) | -43.6 | -42.5 | n/a
Ophiophagus hannah (snake) | -39.5 | -52.5 | n/a
Mus musculus (mouse) | -38.8 | -39.4 | n/a
Discussion
Knowing which species may be permissive to SARS-CoV-2 is very important, both for identifying intermediate hosts and the potential source of the original virus, and for identifying suitable species for use as infection models to allow testing of COVID-19 drugs and vaccines. Directly testing broad ranges of species for virus susceptibility is difficult and time consuming and, in the case of very rare species, may not even be practicable. A less direct method of obtaining the same information is to measure the binding affinity of the SARS-CoV-2 spike protein to the target ACE2 receptor from different species. This can be done, for example, using cell lines transfected with ACE2 receptors from each individual species, but this would again be time consuming and not practicable. A third approach, adopted here, is to use fast, efficient in silico structural modelling and docking algorithms, together with available genomic and structural biology data, to generate relevant ACE2 structural models and to use molecular dynamics to calculate the binding energies. Notably, this approach revealed that the binding energy between SARS-CoV-2 spike protein and ACE2 was highest for humans out of all species tested, suggesting that SARS-CoV-2 spike protein is uniquely evolved to bind and infect cells expressing human ACE2. This finding is particularly surprising as, typically, a virus would be expected to have highest affinity for the receptor in its original host species, e.g. bat, with a lower initial binding affinity for the receptor of any new host, e.g. humans. However, in this case, the affinity of SARS-CoV-2 is higher for humans than for the putative original host species, bats, or for any potential intermediary host species. The calculated binding energies of SARS-CoV-2 spike protein with ACE2 from selected species are in general agreement with the relatively limited amount of published information.
SARS-CoV-2 binding affinity to monkey ACE2 was lower than for human ACE2. Young cynomolgus macaques, when infected, expressed viral RNA in nasal swabs but did not develop overt clinical symptoms, whereas aged animals showed higher viral RNA loads, some weight loss and rapid respiration associated with moderate interstitial pneumonia and virus replication in the upper and lower respiratory tract.
Syrian hamsters and ferrets are highly susceptible to SARS-CoV infection and are used as animal models of the disease. Mahdy published a preprint reviewing SARS-CoV-2 infection in animals and suggested that SARS-CoV-2 might recognize ACE2 from a variety of animal species, including palm civet, the intermediate host for SARS-CoV. Although bats carry many coronaviruses including SARS-CoV, a relative of SARS-CoV-2, direct evidence for the existence of SARS-CoV-2 in bats has not been found. As highlighted by our data, the binding strength of SARS-CoV-2 for bat ACE2 is considerably lower than for human ACE2, suggesting that even if SARS-CoV-2 did originally arise from a bat precursor it must later have adapted its spike protein to optimise its binding to human ACE2. There is no current explanation for how, when or where this might have happened. Instances of direct human infection by coronaviruses or other bat viruses are rare, with transmission typically involving an intermediate host. For example, henipaviruses such as Hendra virus are periodically transmitted from bats to horses and then to humans who come into contact with the infected horse. Similarly, SARS-CoV was shown to be transmitted from bats to civet cats and from them to humans. To date, a virus identical to SARS-CoV-2 has not been identified in bats or any other non-human species, making its origins unclear. The most closely related coronavirus to SARS-CoV-2 identified to date is the bat coronavirus BatCoV RaTG13, which has 96% whole-genome identity to SARS-CoV-2.
The fact that SARS-CoV-2 has also not been found in any likely intermediate host raises questions about the origins of the original SARS-CoV-2 virus that infected human case zero in late 2019. Wuhan, the epicentre of the outbreak, hosts China’s only BSL4 facility and is the site of considerable bat coronavirus research. Identification of an intermediate animal host in which SARS-CoV-2 might have adapted to a human ACE2 permissive form would go a long way to alleviating concerns that SARS-CoV-2 is not a natural virus. Lam et al. made confused public claims of finding SARS-CoV-2 in Malayan pangolins, suggesting that pangolins were an intermediate vector for SARS-CoV-2. However, further sequence analysis of these claims by Zhang et al. established that Pangolin-CoV was a very different coronavirus that had at best a modest ~90% sequence similarity to SARS-CoV-2. While the Pangolin-CoV spike RBD shared some similarities with SARS-CoV-2, its spike protein did not share the furin cleavage site that is a prominent feature of the SARS-CoV-2 spike protein. Hence, any similarity of Pangolin-CoV to SARS-CoV-2 was restricted to the residues in the RBD and RBM. Overall, Pangolin-CoV is only a distant relative of SARS-CoV-2. Based on our data, the similarity of Pangolin-CoV to SARS-CoV-2 in the spike RBM could be a case of convergent evolution of the two viruses, whereby the close similarity of the structure of the pangolin ACE2 spike binding domain (SBD) to the same region of human ACE2 drove the convergent evolution of the spike protein RBD of both viruses, allowing them to bind to pangolin and human ACE2, respectively. Such close similarity of the pangolin and human ACE2 SBD could make it easy for any pangolin CoV to cross from pangolins to humans. Nevertheless, the fact that no virus matching the SARS-CoV-2 sequence has been identified in pangolins makes it less likely that SARS-CoV-2 has a pangolin origin.
However, this does call for more intensive surveys of coronaviruses in pangolin populations, should such viruses pose future human pandemic threats. This finding also supports the need for strict enforcement of a worldwide ban on any trafficking of pangolins to reduce the risk of pangolin coronaviruses crossing to humans and becoming a trigger for a future coronavirus pandemic. However, there continues to be a lack of evidence to indicate that SARS-CoV-2 is a pangolin-derived virus that first crossed from pangolins to humans in late 2019. Early in the COVID-19 outbreak it was suggested that snakes may be an intermediate vector. ACE2 of turtle and snake have very low homology to human ACE2 and SARS-CoV-2 was shown not to bind to reptile ACE2, making snakes unlikely as possible hosts for SARS-CoV-2. This is consistent with our own data predicting moderate to low binding of SARS-CoV-2 to snake ACE2. Rodents varied widely in their permissiveness to SARS-CoV, with mice being resistant and hamsters permissive. Similarly, there is inefficient replication of SARS-CoV-2 in mice, making them unsuitable as models to test SARS or COVID-19 vaccines or drugs. This reflects the low predicted binding energy of SARS-CoV-2 spike protein for mouse ACE2, as demonstrated by our model data. Mice only became permissive for SARS infection when made transgenic for human ACE2, with the same likely to be true for SARS-CoV-2. By contrast, hamsters were highly permissive for SARS-CoV infection. Our model predicted that hamsters should be permissive to SARS-CoV-2 infection based on the predicted strong binding of SARS-CoV-2 spike protein to hamster ACE2 (Table 4). Consistent with our model data, Syrian hamsters have been shown to exhibit clinical and histopathological responses to SARS-CoV-2 that closely mimic human upper and lower respiratory tract infections, with high virus shedding and the ability to transmit to naïve contact animals.
Ferrets were another permissive model of SARS-CoV infection, and our modelling data indicated that SARS-CoV-2 has a similar binding energy to ferret ACE2 as it does to hamster ACE2 (Table 4). Consistent with our model data, ferrets have been shown to be permissive to infection with SARS-CoV-2, with high virus titre in the upper respiratory tract, virus shedding, acute bronchiolitis without severe disease or death, and active transmission to naïve ferrets through direct contact.
Cat and tiger ACE2 were shown by our model to have similar binding affinity for SARS-CoV-2 spike protein, and both these species have been shown to be permissive for SARS-CoV-2 infection. Similarly, our data suggest that SARS-CoV-2 binds with moderate affinity to dog ACE2, suggesting that dogs may be susceptible to infection by SARS-CoV-2. In the case of companion animals that live in close proximity to humans, Shen et al. state that SARS-CoV-2 can be efficiently transmitted in cats and dogs, while Shi et al. found that SARS-CoV-2 replicates poorly in dogs, pigs, chickens, and ducks, but that ferrets and cats were permissive to infection. Temmam et al. tested 9 cats and 12 dogs living in close contact with their owners, two of whom tested positive for SARS-CoV-2 while 11 of 18 others showed clinical signs of COVID-19, but no antibodies against SARS-CoV-2 were detectable in the animals' blood using an immunoprecipitation assay. Goumeniu et al. published an editorial querying the role of dogs in the Lombardy COVID-19 outbreak and recommended the use of computational docking experiments to provide evidence for or against infection of dogs. Our model data suggesting that dog ACE2 might be permissive for SARS-CoV-2 binding and infection are therefore consistent with anecdotal reports of dogs being infected with SARS-CoV-2. Other genetic factors could therefore underlie the apparent lack of susceptibility of dogs to COVID-19 clinical infection. It is known that gain of function (GOF) mutations that can lead to pandemics occur in viruses. GOF means that a virus gains a new property; in influenza virus, for example, GOF has been associated with the acquisition of a new function such as mammalian transmissibility, increased virulence for humans, or evasion of existing host immunity. The conditioning of viruses to humans as pandemics progress is well recognized. However, the SARS-CoV-2 structures and sequences that we employed were from viruses collected very early in the pandemic.
It is therefore not clear how SARS-CoV-2 could have developed such a high affinity for human ACE2, notably higher than for the ACE2 of putative zoonotic sources of SARS-CoV-2, unless it had previously been selected on human ACE2 or on the ACE2 of another species bearing a closely homologous spike protein binding domain (SBD). Interestingly, pangolin ACE2 bears some similarities in its SBD to human ACE2. This is consistent with the fact that Pangolin-CoV has an RBD highly similar to that of SARS-CoV-2, although the remainder of their sequences has only 90% similarity. This could be consistent with a process of convergent evolution whereby human and pangolin coronaviruses infecting via ACE2 have come to the same solution in respect of evolving an optimal spike RBD for binding of either human or pangolin ACE2, respectively. Our data do indicate that humans might be permissive to pangolin CoVs that use ACE2 for cell entry, a fact that needs to be borne in mind in respect of future potential coronavirus pandemic sources. However, this does not mean that pangolin ACE2 was the receptor on which the SARS-CoV-2 spike protein RBD was initially selected, as the strength of binding to pangolin ACE2 is lower than that of binding to human ACE2. This makes it unlikely that pangolins are the missing intermediate host. If SARS-CoV-2 spike was selected on pangolin ACE2, then given the higher affinity of SARS-CoV-2 for human ACE2 than for bat ACE2, SARS-CoV-2 would have to have circulated in pangolins for a long period of time for this evolution and selection to occur. To date there is no evidence of a SARS-CoV-2 like virus circulating in pangolins.
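The distinction drawn above between similarity within a binding region and similarity across the rest of a sequence can be illustrated with a minimal Python sketch. It computes percent identity between two pre-aligned sequences, either over the full alignment or over a chosen window. The function name `percent_identity`, the toy sequences, and the window indices are hypothetical and chosen only for illustration; they are not real spike or ACE2 sequences, and this is not the procedure used in this study.

```python
def percent_identity(seq_a, seq_b, start=0, end=None):
    """Percent identity of two pre-aligned, equal-length sequences over [start, end)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    end = len(seq_a) if end is None else end
    # Ignore alignment columns in which both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(seq_a[start:end], seq_b[start:end])
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no comparable columns in the selected region")
    # A column counts as a match only if both residues are identical and not gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment (NOT real viral sequences).
virus_a = "MKTLLVAGYQNNSTRA"
virus_b = "MRTLIVAGYQNNSAKA"

whole = percent_identity(virus_a, virus_b)          # identity over all 16 columns
region = percent_identity(virus_a, virus_b, 5, 12)  # identity within columns 5..11
print(f"whole alignment: {whole:.1f}%  selected region: {region:.1f}%")
```

In this toy case the selected window is fully conserved while the sequences differ elsewhere, which is the pattern described for the RBD relative to the remaining sequence. A real analysis would start from a proper multiple sequence alignment rather than hand-aligned strings.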
Another possibility would be a short term evolutionary step where a pangolin was recently co-infected with a bat ancestor to SARS-CoV-2 at the same time as it was infected by a pangolin CoV, allowing a recombination event to occur whereby the spike RBD of the pangolin virus was inserted into the bat CoV, thereby conferring on the bat CoV high binding affinity for both pangolin and human ACE2. Such recombination events are known to occur with other RNA viruses and can explain the creation of some pandemic influenza strains. Nevertheless, such events are by necessity rare as they require coinfection of a single host at exactly the same time. Most importantly, if such a recombination event had occurred in pangolins it might have been expected to have similarly triggered an epidemic spread of the new highly permissive SARS-CoV-2 like virus among pangolin populations, such as we now see occurring across the human population. Currently there is no evidence of such a pangolin SARS-CoV-2 like outbreak, making this whole scenario less likely. Indeed, pangolins might be protected from SARS-CoV-2 infection due to the existence of cross-protective spike RBD neutralising antibodies induced by exposure to pangolin CoV, given the RBD similarity of these two viruses.
Another possibility which still cannot be excluded is that SARS-CoV-2 was created by a recombination event that occurred inadvertently or consciously in a laboratory handling coronaviruses, with the new virus then accidentally released into the local human population.
Given the seriousness of the ongoing SARS-CoV-2 pandemic, it is imperative that all efforts be made to identify the original source of the SARS-CoV-2 virus. In particular, it will be important to establish whether COVID-19 is due to a completely natural chance occurrence where a presumed bat virus was transmitted to humans via an intermediate animal host or whether COVID-19 has alternative origins.
This information will be of paramount importance to help prevent any similar human coronavirus outbreak in the future.
Acknowledgements
We would like to thank Harinda Rajapaksha for assistance in optimising GROMACS for this project. We would also like to thank Oracle Corporation for providing their Cloud computing resources through the Oracle for Research program for the modelling studies described herein. In particular, we wish to thank Peter Winn, Dennis Ward, and Alison Derbenwick Miller for facilitating this access. S.P., P.K.S. and N.P. are supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Contracts HHSN272201400053C and HHSN272201800044C. This publication’s contents are solely the responsibility of the authors and do not necessarily represent the official views of their affiliated institutions, funding bodies or Oracle Corporation.
References
SUPPLEMENTARY INFORMATION
Supplementary Figure 1. Sequence alignment of ACE2 amino acid sequences from selected species. The SARS-CoV-2 spike protein binding region is highlighted in red.
Supplementary Figure 2. Phylogenetic tree showing relatedness of sequences of ACE2 proteins from selected species.
Supplementary Figure 3. Modelled ACE2 structures for selected species, with Ramachandran plots and quality metrics. Panels: bat, cat, civet, dog, ferret, hamster, monkey, mouse, pangolin, snake, horse, cattle, and tiger ACE2.
Supplementary Figure 4. Predicted and confirmed utilization of ACE2 receptors by the SARS-CoV-2 spike protein based on sequence homology, from Qui et al. (ref. 41).