Featuring ACE2 binding SARS-CoV and SARS-CoV-2 through a conserved evolutionary pattern of amino acid residues
Patrícia P. D. Carvalho ∗ and Nelson A. Alves †
Departamento de Física, FFCLRP, Universidade de São Paulo, Avenida Bandeirantes, 3900. Ribeirão Preto 14040-901, SP, Brazil.
Coronaviruses attach to host cell surface receptors via their spike (S) proteins to mediate entry into the host cell. The S1 subunit of the S-protein contains the receptor-binding domain (RBD), which is able to recognize different host receptors, highlighting its remarkable capacity to adapt to new hosts during viral evolution. While the RBD in spike proteins determines the virus-receptor interaction, the active residues lie in the receptor-binding motif (RBM), which is part of the RBD and plays a fundamental role in binding the outer surface of the receptor. Here, we address the hypothesis that SARS-CoV and SARS-CoV-2 strains able to use angiotensin-converting enzyme 2 (ACE2) proteins have adapted their RBM during viral evolution to exploit a specific conformational topology driven by the amino acid residues YGF to infect host cells. We also speculate that this YGF-based mechanism can act as a protein signature located at the RBM to distinguish coronaviruses able to use ACE2 as a cell entry receptor.
INTRODUCTION
Viruses are the most numerous type of biological entity on Earth, and the identification of novel viruses continues to enlarge the known viral biosphere [1, 2]. This collection of all viruses presents enormous morphological and genomic diversity as a result of the continuous exchange of genetic material with host cells [3, 4]. Moreover, this successful long-term virus-host interaction indicates that viruses are more than simple genomic parasites in all cellular life forms [5]. Several lines of evidence have led to the proposal that viruses play an astonishing role as agents of evolution because of their capacity to propagate between biomes [6] and to transfer genes between species [7–10]. For this purpose, viruses have developed a large number of genome replication and protein expression strategies to benefit from the host translational machinery over time [11].

Despite such enormous diversity in gene sequence, it is not possible to achieve a huge number of highly distinct protein structures, mainly because of stereochemical constraints on the possible protein folds [12]. In fact, common secondary structures have been observed throughout different virus families even though the sequences are not fully conserved [12, 13]. This may result in evolutionary efficiency, since viruses can exploit already well-designed motifs from similar cellular functions [11].

Currently, the world population is confronting a new coronavirus disease (COVID-19), a highly infectious disease in humans. This disease is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and is affecting human health worldwide. Coronaviruses (CoVs) belong to the large and diverse family Coronaviridae, within the order Nidovirales and suborder Cornidovirineae [14]. Their subfamily Orthocoronavirinae contains four genera based on phylogeny, termed α-, β-, γ-, and δ-coronavirus.

∗ [email protected] † alves@ffclrp.usp.br

SARS-CoV-2 belongs to the β-coronavirus genus, as do SARS-CoV, Middle East respiratory syndrome coronavirus (MERS-CoV), and hCoV-HKU1, to cite a few [15]. Other important representatives, such as human hCoV-NL63 and hCoV-229E, belong to α-coronavirus. Phylogenetic relationships among the known members of this subfamily indicate that α- and β-coronaviruses infect mammals, while γ- and δ-coronaviruses infect both mammals and birds.

Members of the Coronaviridae family are enveloped, positive-sense single-stranded RNA (+ssRNA) viruses and have the largest genomes among all known RNA viruses [16–19]. The +ssRNA genomes undergo rapid mutational changes [20], leading to faster adaptation to new hosts, though they also contain conserved sequence motifs, as observed, for example, in multiple alignments of CoV strains [13, 21, 22].

Coronaviruses attach to host cell surface receptors via their spike (S) glycoproteins, located on the viral envelope, to mediate entry into the host cell. Each monomer of the trimeric S-protein comprises two subunits, S1 and S2, responsible for viral attachment and for membrane fusion, respectively [23–25]. The S1 subunit contains the receptor-binding domain (RBD), which is able to recognize different host receptors, highlighting its remarkable capacity to adapt to new hosts during viral evolution. Thus, it is not unexpected to observe high sequence divergence in this domain, even for the same coronavirus identified in different host species. In contrast, the S2 subunit is the most conserved region of the S-protein.

The binding of RBD spike proteins to the receptor on the host cell is the first step in virus infection. This initial step is followed by an entry mechanism of enveloped viruses into target cells.
Usually, most viruses enter cells through endocytic pathways, with fusion occurring in the endosomes, although direct entry into cells can occur by fusion of their envelopes with the cell membrane [26].

A number of CoVs utilize angiotensin-converting enzyme 2 (ACE2) as the entry receptor into cells, exemplified by the β-genus human respiratory SARS-CoV and SARS-CoV-2, and the α-genus hCoV-NL63 [15, 27–29]. In particular, SARS-CoV, as well as SARS-CoV-2, enters the cell via endocytosis induced by the RBD complexed with the human ACE2 (hACE2) receptor [30–34]. In contrast, the β-genus MERS-CoV and its genetically related bat CoV-HKU4 utilize dipeptidyl peptidase 4 (DPP4) as the viral receptor [35]. Another viral receptor is aminopeptidase N (APN), recognized for example by the α-genus hCoV-229E [36].

The human coronaviruses hCoV-HKU1, hCoV-229E, hCoV-NL63, and hCoV-OC43 cause mild to moderate upper respiratory tract infections [37], while SARS-CoV and SARS-CoV-2 cause severe respiratory diseases, with SARS-CoV-2 causing far more deaths worldwide than SARS-CoV. SARS-CoV strains vary enormously in infectivity, which can be connected to their binding affinities to hACE2 [38]. This binding affinity, in turn, can be correlated with disease severity in humans [39].

While the RBD in spike proteins determines the virus-receptor interaction, the active residues lie in the receptor-binding motif (RBM), which is part of the RBD and plays a fundamental role in binding the outer surface of the receptor [27, 28, 38, 40, 41]. The importance of the RBM is further explored here in relation to its structural topology. Instead of analysing specific residues that make contacts with ACE2 after binding, we followed the molecular origin that drives the viral attachment to this cell receptor. This investigation has revealed a highly conserved amino acid residue sequence, Tyr-Gly-Phe (YGF), in coronavirus variants that employ this receptor.
Thus, we hypothesize that the short sequence YGF is vital for the RBD-ACE2 interaction because it forms the hydrophobic pocket required for receptor specificity [40, 42–44], as discussed next. It is likely that SARS-CoV and SARS-CoV-2 strains able to use ACE2 proteins have adapted their RBM during viral evolution to exploit this YGF-based mechanism to infect host cells.

The conserved XGF loop in UBA-ubiquitin interaction
Amino acid sequences of type XGF, where the residue X is frequently Met, form a highly conserved loop characteristic of the ubiquitin-associated (UBA) domain, which occurs in a variety of proteins. The UBA domain is a motif conserved throughout eukaryotic evolution and is found in many proteins related to ubiquitin metabolism, in particular those associated with ubiquitin-mediated proteolysis [45, 46]. The MGF loop in the UBA domain is typical of a hydrophobic pocket that is critical for recognition of and binding affinity to ubiquitin through a hydrophobic surface patch located in the vicinity of this loop [47–54].

NMR analyses of UBA-ubiquitin interactions identify hydrophobic surface patches formed by the conserved MGF sequence as the main determinants of the dimerization interface. A number of alignments of the p62/SQSTM1 UBA domain with other UBA-domain proteins have revealed single-point mutations in the MGF sequence, mainly M → L or F → Y, which maintain its overall hydrophobic character [55].
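As a minimal sketch of how such loop variants could be located in a sequence, the snippet below scans for MGF together with the single-point substitutions mentioned above (M → L, F → Y). The character classes, the function name `find_uba_like_loops`, and the toy sequence are assumptions made for illustration only; they are not a definition of the UBA loop and are not taken from the cited works.

```python
import re

# Loop variants described in the text: MGF, plus the reported single-point
# substitutions M -> L and F -> Y. These character classes are an
# illustrative assumption, not an exhaustive definition of the UBA loop.
# A lookahead is used so that overlapping candidates are all reported.
UBA_LOOP = re.compile(r"(?=([ML]G[FY]))")


def find_uba_like_loops(sequence):
    """Return (1-based position, tripeptide) for every loop-like match."""
    return [(m.start() + 1, m.group(1))
            for m in UBA_LOOP.finditer(sequence.upper())]


if __name__ == "__main__":
    # Toy sequence for demonstration; it is not a real UBA domain.
    toy = "AAMGFSSLGYAAMGAA"
    print(find_uba_like_loops(toy))  # [(3, 'MGF'), (8, 'LGY')]
```

Positions are reported 1-based so that they can be compared directly with residue numbers, provided the input sequence starts at residue 1.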
RESULTS AND DISCUSSION
Spike receptor-binding motifs in human CoVs
Here, we examine the occurrence and importance of the specific amino acid residue sequence YGF for SARS-CoV and SARS-CoV-2 strains able to use ACE2 proteins as receptors. Fig 1a displays the interface of the SARS-CoV RBD spike protein (magenta and green) complexed with hACE2 (blue), to give insight into the importance of this kind of conformational mechanism in creating shape complementarity between receptor and ligand. The RBM is shown in magenta, with yellow marking the YGFY sequence in that pocket, which establishes the proper relative position for favorable binding to exposed ACE2 residues. The YGFY sequence seems strongly conserved in many SARS-CoV strains. It is located at residues 481-484 in the RBM. Notably, no other GF sequence occurs in this region or elsewhere in the RBD. As a consequence of this hydrophobic pocket, the amino acid residues responsible for the binding interaction are located close to this conformational structure, for instance the residues N479 and T487, which have been identified as essential for receptor binding [39, 41, 56]. The residue N479 in SARS-CoV is located near K31 of hACE2, which makes a salt bridge with E35, a residue buried in that hydrophobic environment. The residue T487 is located close to K353 of hACE2, which in turn makes a salt bridge with D38, also buried in that pocket (Fig 2a). Other important residues for this attachment are Y442, L472, and D480 [40].

Figure 1. Detailed surface view of SARS-CoV and SARS-CoV-2. (a) Residues YGFY at the interface of SARS-CoV complexed with human ACE2 (PDB ID: 2AJF). These residues are in yellow and form a hydrophobic pocket located in the RBM (magenta). (b) Residues YGF and EGF (yellow) at the interface of SARS-CoV-2 complexed with human ACE2 (PDB ID: 6LZG). The first sequence is located in a hydrophobic pocket, while the second sequence, EGF, is on the RBM surface (magenta). The ribbon representation of ACE2 is in blue.

Figure 2. SARS-CoV and SARS-CoV-2/ACE2 interfaces. Ribbon diagrams of SARS-CoV (a) and SARS-CoV-2 (b) complexed with human ACE2 (blue), where the RBM is highlighted in magenta. The main residues responsible for the structural binding are displayed in stick representation.

Figure 1b displays the interface of the SARS-CoV-2 RBD spike protein complexed with hACE2, also in blue. Here, the sequence YGFY observed in SARS-CoV is replaced by YGFQ (Fig 3). The single-point substitution Y484 → Q498 replaces a hydrophobic residue in SARS-CoV with a hydrophilic one in SARS-CoV-2.

Figure 3 compares amino acid sequences of human SARS-CoV and SARS-CoV-2 strains aligned with the RBM of SARS-CoV Tor2, an epidemic strain isolated from humans during the SARS epidemic of 2002-2003. The human Tor2 strain has high affinity for hACE2 [38]. We highlight in Fig 3, in medium purple, the hydrophobic sequence YGFY typical of SARS-CoV, occurring at positions 481-484 in the spike protein. The corresponding mutated sequence occurs at positions 495-498 in the SARS-CoV-2 spike protein.

Figure 3. Sequence alignments of human CoVs restricted to RBM residues. The medium purple color highlights the YGFY pattern followed by the mutation Y498Q in the RBM of SARS-CoV-2 strains.

Figure 4. Sequence alignment of bat CoVs restricted to RBM residues of SARS-CoV Tor2. The residues of the YGFY pattern are in medium purple. The last three alignments are placed together for direct amino acid sequence comparison.

The residues important for the interface interaction in SARS-CoV are mutated in SARS-CoV-2. The sequence alignments show the mapping Y442 → L455, L472 → F486, N479 → Q493, D480 → S494, and T487 → N501. These mutations do not present a drastic change in hydrophobic character [57], thus preserving the overall receptor-binding topological structure for these viruses. In particular, residues L455 and Q493 in SARS-CoV-2 preserve the noted favourable interactions with the residues E35 and K31 in hACE2 [58] (Fig 2b). Interestingly, a new GF sequence appears in the RBM of SARS-CoV-2 strains as a consequence of the mutation L472 → F486, producing a small hydrophobic surface, but it does not seem to disrupt the proposed topological formation mechanism for ACE2 binding. No other GF sequence appears in their RBD.

Details of protein-protein binding interfaces can be quite different among strains, likely related to their degree of infectivity. It has been noted that mutations at RBM residue T487 have an important role in the human-to-human and animal-to-human transmission of SARS-CoV [36, 38, 56].
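The residue mapping between SARS-CoV and SARS-CoV-2 quoted in this subsection (Y442 → L455, L472 → F486, N479 → Q493, D480 → S494, T487 → N501, and Y484 → Q498) can be checked for internal consistency with a few lines of code. The sketch below parses each entry and computes the shift in residue numbering. The ASCII arrow `->`, the helper names, and the regular expression are assumptions introduced for this illustration.

```python
import re

# Residue mapping quoted in the text (SARS-CoV -> SARS-CoV-2), written
# with an ASCII arrow for convenience.
MAPPING = [
    "Y442 -> L455", "L472 -> F486", "N479 -> Q493",
    "D480 -> S494", "T487 -> N501", "Y484 -> Q498",
]

ENTRY = re.compile(r"([A-Z])(\d+)\s*->\s*([A-Z])(\d+)")


def parse_mapping(entry):
    """Split 'Y442 -> L455' into ('Y', 442, 'L', 455)."""
    match = ENTRY.fullmatch(entry.strip())
    if match is None:
        raise ValueError(f"unrecognized mapping entry: {entry!r}")
    old_res, old_pos, new_res, new_pos = match.groups()
    return old_res, int(old_pos), new_res, int(new_pos)


def position_offsets(entries):
    """Return {entry: SARS-CoV-2 position minus SARS-CoV position}."""
    offsets = {}
    for entry in entries:
        _, old_pos, _, new_pos = parse_mapping(entry)
        offsets[entry] = new_pos - old_pos
    return offsets


if __name__ == "__main__":
    for entry, shift in position_offsets(MAPPING).items():
        print(f"{entry}: shift {shift:+d}")
```

With the values quoted in the text, the shift is +13 for Y442 → L455 and +14 for the remaining pairs, including the YGFY block (481-484 to 495-498). This would be consistent with a net one-residue gap in the alignment somewhere between positions 442 and 472 of SARS-CoV, although the alignment itself is not reproduced here.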
Spike receptor-binding motifs in bats
It is known that not all SARS-CoV strains isolated from bat hosts exploit ACE2 for cellular attachment. Therefore, the set of amino acid sequences displayed in Fig 4 may exemplify the successful relation between virus evolution and the binding mechanism. This set highlights in medium purple the preserved amino acid residues in the sequence YGFY, characteristic of human SARS-CoV. For comparison, we also display CoV strains with mutations in that SARS-CoV pattern to explore the relation between the hypothesized mechanism and cell receptor recognition.

It has been demonstrated that LYRa11 [28], Rs3367 [59], Rs4874 [60], WIV1, and WIV16 [28, 61] have the capacity to use ACE2 for cell entry, in line with our hypothesis. Also, the near single-point mutation Y → F in the next six strains does not interfere, as expected, with the attachment mechanism. This remark is supported by cell entry studies for Rs7327 [28, 60], Rs9401, RsSHC014, Rs4084, and Rs4231 [60], because they are in a group that is likely to use the ACE2 receptor. Moreover, this mutation replaces a hydrophobic residue with another of higher hydrophobicity, reinforcing the conformational topology for binding with the receptor.

The next group corresponds to the mutation Y → T, decreasing the initial hydrophobicity of the expected pocket. It seems unlikely that this mutation and the amino acid residue deletions in the RBM relative to the Tor2 sequence affect the YGF-based attachment mechanism for BtKY72 and BB9904/BGR. However, there are no available experimental data concerning their receptors. It is important to remark that the residue F492 in BM48-31/BGR produces another hydrophobic sequence, IGF, at residues 490-492 (Fig 4). We speculate that this double occurrence may disrupt the aforementioned mechanism because of indications that BM48-31/BGR does not interact, at least with human ACE2 [28]. No other GF sequence occurs for these strains in the RBM or in their RBD.

The next CoV strains do not contain such GF sequences in the RBM or in their RBD, except Rf1/2004, in which a GF located in the RBD is surrounded by hydrophilic residues. Although we have considered only the part of their sequences that best aligns with the RBM of Tor2, it has been demonstrated that the spikes of HuB2013, HKU3, CoVZC45, CoVZXC21, Rf1, Rf4092, and Shaanxi2011 do not use hACE2, a result that is not just a consequence of deletions at the RBD [28]. Further evidence has been presented against HKU3 using hACE2 [62]. It seems unlikely that Rm1/2004 infects through hACE2 because of its unfavourable binding free energy [63]. Another result concludes that Rp3 is unable to infect through hACE2 or even bat ACE2 [64].

Figure 5. Sequence alignment of palm civet CoVs and pangolin PCoVs restricted to RBM residues of SARS-CoV Tor2. The residues of the YGFY pattern are in medium purple. The last line includes the SARS-CoV-2 sequence for comparison.
We have placed together the alignments involving Tor2, RaTG13, and SARS-CoV-2 at the end of Fig 4 for further comparison. The whole genome of RaTG13 shares 96% amino acid sequence identity with SARS-CoV-2, and it is considered the genome most closely related to this CoV [65]. For its spike protein and RBM, RaTG13 shares 97% and 76% amino acid identity with SARS-CoV-2, respectively. For comparison, RaTG13 shares 79%, 77%, and 53% identity with SARS-CoV Tor2 for the whole genome, spike protein, and RBM, respectively. Therefore, SARS-CoV-2 is more similar to RaTG13 than to SARS-CoV strains in all regions.
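For readers who wish to reproduce identity figures of this kind from their own alignments, a minimal sketch follows. The text does not state how the percentages were computed, so the convention used here (matching residues divided by the number of columns in which neither sequence has a gap) is an assumption, and other conventions give somewhat different numbers. The function name and the toy strings are likewise hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of the same alignment.

    Assumed convention: columns with a gap in either sequence are
    skipped, and identity is matches / compared columns * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns under this convention
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments, not real spike sequences.
    print(round(percent_identity("YGFY-PT", "YGFQAPT"), 2))  # 83.33
```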
Spike receptor-binding motifs in palm civets and pangolins
To explore further the role of the YGF-based attachment mechanism, we present comparative residue sequences for civets and pangolins, again aligned with the RBM of SARS-CoV Tor2 (Fig 5). This figure shows that the YGFY pattern characteristic of human SARS-CoV is maintained in the collected data, but with a single-point mutation Y → H for the pangolin-host PCoVs. No other GF sequence occurs, even in the RBD of these strains.

We have included SARS-CoV-2 on the last line of Fig 5 for direct comparison. PCoV GX-P2V shares 79%, 77%, and 50% amino acid identity with Tor2 for the whole genome, spike protein, and RBM, respectively. In relation to SARS-CoV-2, PCoV GX-P2V shares 85%, 92%, and 75% amino acid identity for the whole genome, spike protein, and RBM, respectively.

It is believed that human SARS-CoV passed from palm civets to humans in the 2002-2003 epidemic because their genome sequences are highly similar [38, 56, 66]. The amino acid alignments show an almost identical RBM between human SARS-CoV, represented by the Tor2 strain, and the collected palm civet strains. This similarity also includes the YGF-based mechanism able to use ACE2 proteins. Likewise, these alignments display high similarity between pangolin CoVs and SARS-CoV-2, which also supports previous conclusions that pangolins are the probable origin of SARS-CoV-2 [65, 67]. However, based on our data related to host receptor binding and the RBM and S-protein alignments, we cannot rule out a bat RaTG13-like strain as another possible origin of SARS-CoV-2.
SARS-CoV and hCoV-NL63: only functionally related
Although few experimental data are available identifying the viral receptor-binding protein for CoVs, it is well established that human SARS-CoV and hCoV-NL63 both employ ACE2 as the cell receptor to infect host cells [68, 69]. Interestingly, SARS-CoV and hCoV-NL63 domains do not present high sequence similarity. For example, their spike-S1 subunits share only 10% similarity. Other features separate SARS-CoV and hCoV-NL63 [70]. SARS-CoVs are classified as β-coronavirus with subgenus sarbecovirus, while hCoV-NL63 is in genus α-coronavirus and subgenus setracovirus. Although hCoV-NL63 also enters the cell via endocytosis, its functional receptor requires heparan sulfate proteoglycans for the initial attachment, representing an important extra factor for ACE2 to act as a functional receptor [33, 70]. Moreover, the spike-S1 glycoprotein of SARS-CoV binds ACE2 more efficiently than the corresponding spike-S1 of NL63 (NL63-S) [71]. This may be linked to the fact that SARS-CoV and NL63-S contact hACE2 differently, a conclusion based on the experimental result that NL63-S does not bind to hACE2 through a single, large domain [69, 72]. Indeed, different RBD regions have been identified within NL63-S. One of these regions was positioned at residues 476-616 and comprises three discontinuous RBM regions, RBM1 (residues 497-501), RBM2 (residues 530-540), and RBM3 (residues 575-594) [73–75]. A slightly different RBD has been identified for this CoV [72]. It would be located at residues 482-602, also with three discontinuous RBM regions, which surround a shallow cavity at the hCoV-NL63-ACE2 binding interface. Curiously, its spike protein alignment with Tor2 does not show the expected residue pattern in the region corresponding to the RBM of Tor2 or in the aforementioned RBD regions of NL63-S. This may help to explain the unusual pathway of binding to ACE2 for this CoV.

METHODS
All sequences were analysed with ClustalW and Jalview. The list of GenBank accession codes for the spike proteins analysed in this work is available in supplementary Table S1.
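The figures restrict each alignment to the columns covered by the RBM of the reference strain Tor2. The exact procedure used with ClustalW and Jalview is not described, so the following is only a sketch of one way to do this once the aligned rows are available as plain strings: map a residue range of the reference to alignment columns, skipping gaps, then slice every row with the same columns. The function names, the dictionary layout, and the toy alignment are hypothetical, and the sketch assumes the aligned reference row starts at residue 1.

```python
def columns_for_reference_range(aligned_ref, start, end, gap="-"):
    """Map reference residues start..end (1-based, inclusive) to
    alignment columns, returned as half-open slice bounds (lo, hi)."""
    residue = 0
    first = last = None
    for col, char in enumerate(aligned_ref):
        if char == gap:
            continue  # gaps do not advance the residue counter
        residue += 1
        if residue == start:
            first = col
        if residue == end:
            last = col
            break
    if first is None or last is None:
        raise ValueError("residue range not covered by the reference row")
    return first, last + 1


def restrict_alignment(alignment, ref_name, start, end):
    """Slice every aligned row to the columns spanning the given
    residue range of the reference row."""
    lo, hi = columns_for_reference_range(alignment[ref_name], start, end)
    return {name: row[lo:hi] for name, row in alignment.items()}


if __name__ == "__main__":
    # Toy alignment; the rows are not real spike sequences.
    toy = {"ref": "AC-DEYGFYK", "other": "ACQDEYGFQK"}
    print(restrict_alignment(toy, "ref", 5, 8))
```

In practice the rows would be read from the alignment file exported by the alignment tool, and the residue range would be that of the Tor2 RBM.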
CONCLUSION
We have analysed a number of CoV strains to support the hypothesis that SARS-CoV and SARS-CoV-2 strains share a common evolutionary mechanism for the initial attachment to ACE2. Moreover, we speculate that the YGF-based mechanism can act as a protein signature to distinguish CoVs able to use ACE2 as a cell entry receptor whenever this residue sequence is located in the CoV RBM region. For example, the ACE2 binding affinity of SARSr-CoV ZXC21 and ZC45, whose spike sequences are closely related to that of SARS-CoV-2, can promptly be called into question. Of course, as exemplified by hCoV-NL63, we cannot rule out that another mechanism may assist such binding. It must be emphasized that the occurrence of other XGF sequences, mainly with X being a hydrophobic residue, in the RBM, or even in the RBD region, is likely to disrupt the proposed topological mechanism for ACE2 binding. This is because they might introduce hydrophobic loops promoting a new ligand-substrate recognition.
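The speculated signature can be expressed as a simple screening rule: an RBM fragment is flagged as a candidate when it contains YGF and no additional XGF with a hydrophobic X. The sketch below is only an illustration of that rule, not a validated predictor of ACE2 usage. The hydrophobic residue set, the function name `xgf_report`, and the toy fragments are assumptions; in particular, which residues count as hydrophobic depends on the scale adopted.

```python
import re

# Illustrative grouping of hydrophobic residues (an assumption; the
# classification depends on the hydrophobicity scale chosen).
HYDROPHOBIC = set("AVILMFWY")

# Lookahead so that overlapping tripeptides (e.g. YGFGF) are all found.
XGF = re.compile(r"(?=([A-Z]GF))")


def xgf_report(rbm_sequence):
    """Summarize XGF tripeptides in an RBM fragment (1-based positions)."""
    hits = [(m.start() + 1, m.group(1))
            for m in XGF.finditer(rbm_sequence.upper())]
    ygf_hits = [hit for hit in hits if hit[1] == "YGF"]
    other_hydrophobic = [hit for hit in hits
                         if hit[1] != "YGF" and hit[1][0] in HYDROPHOBIC]
    return {
        "hits": hits,
        "ygf": ygf_hits,
        "other_hydrophobic_xgf": other_hydrophobic,
        # Rule from the text: exactly one YGF, no extra hydrophobic XGF.
        "candidate": len(ygf_hits) == 1 and not other_hydrophobic,
    }


if __name__ == "__main__":
    # Toy fragments, not real RBM sequences.
    for fragment in ("AAYGFQAA", "AAIGFAAYGFTAA", "AAEGFAAYGFQ"):
        print(fragment, xgf_report(fragment)["candidate"])
```

Under this rule a fragment carrying an additional EGF, as described for SARS-CoV-2, still counts as a candidate because E is not in the hydrophobic set, whereas an additional IGF, as described for BM48-31/BGR, does not.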
SUPPORTING INFORMATION
Table S1
GenBank accession numbers for the coronavirus sequences used in this study.
Acknowledgements
P.P.D.C. and N.A.A. gratefully acknowledge financial support from the Brazilian agencies CAPES and FAPESP (process 2015/16116-3), respectively.
Competing interests