Peptides of H. sapiens and P. falciparum that are predicted to bind strongly to HLA-A*24:02 and homologous to a SARS-CoV-2 peptide
Y Adiguzel* *Department of Biophysics, School of Medicine, Altinbas University, Kartaltepe Mah. Incirli Cad. No11, 34147 Bakirkoy, Istanbul, Turkey. E-mail: [email protected]
Abstract
Aim - This study looks for a common pathogenicity between SARS-CoV-2 and plasmodium species in individuals with certain HLA serotypes.
Methods - (1) Tblastx searches of SARS-CoV-2 are performed, limited to the plasmodium species that infect humans. (2) The aligned sequences are searched in the respective organisms' proteomes with blastp. (3) Binding of the identified SARS-CoV-2 peptide to MHC class I supertype representatives is predicted. (4) Blastp searches of the predicted epitopes that bind strongly to the identified HLA allele are performed, limited to human and to the plasmodium species. (5) Peptides with at least 60 % identity to the predicted epitopes are identified in the results. (6) Among those, the peptides that bind strongly to the same HLA allele are predicted. (7) The blastp search is repeated, limited to human, for the strongly binding peptides obtained from the plasmodium-limited searches. (8) The identity filtering and the binding prediction are then applied to the results of that search.
Results - The CFLGYFCTCYFGLFC peptide of SARS-CoV-2 has the highest identity to a sequence of P. vivax. Its GYFCTCYFGLF and YFCTCYFGLF parts are predicted to bind strongly to HLA-A*24:02. Results are obtained only for peptides homologous to YFCTCYFGLF, as follows: YYCARRFGLF, YYCHCPFGVF, and YYCQQYFFLF are potential HLA-A*24:02 epitopes in the human proteome. Among the FFYTFYFELF, YFVACLFILF, and YFPTITFHLF peptides in the plasmodium species' proteomes with strong binding affinity to HLA-A*24:02, only FFYTFYFELF of P. falciparum is homologous to a potential HLA-A*24:02 epitope in the human proteome, YFYLFSLELF.
Conclusion - Immune responses to the identified peptides, which have similar sequences and strong binding affinities to HLA-A*24:02, may lead to a risk of autoimmune response in individuals with HLA-A*24:02 serotypes upon infection with SARS-CoV-2 or P. falciparum.
Keywords: SARS-CoV-2, human coronavirus, plasmodium, malaria, HLA class I, MHC class I, molecular mimicry, autoimmunity, disease susceptibility, HLA-A*24:02
1. Introduction
The COVID-19 pandemic is an ongoing global crisis with devastating effects. Current advancements in the development of vaccines and medications are promising but are yet to be finalized. Chloroquine derivatives and hydroxychloroquine are among the treatment options [1]. However, they were not deemed significantly effective in decreasing the death rates or the mechanical ventilation risk [2], and they have serious side effects [3]. Chloroquine and hydroxychloroquine are registered compounds for the treatment and prevention of malaria and for the treatment of autoimmune diseases [1]. Accordingly, this study was initiated by searching for a relation between SARS-CoV-2 and the plasmodium species that cause malaria in humans [4,5], followed by investigating the potential for autoimmune reaction through molecular mimicry.
In relation, Iesa and co-workers reported “SARS-CoV-2 and Plasmodium falciparum common immunodominant regions may explain low COVID-19 incidence in the malaria-endemic belt” [6]. Their study was initiated mainly based on the observation of a low number of confirmed COVID-19 cases in the African region that is characterised by high malaria prevalence (references 4–6 of [6]). This was highlighted in an earlier study as well (references 1,2 of [7]). The presence of a zoonotic viral genome in P. vivax [8], which was also pointed out by de Souza [9], was mentioned by Mehta and co-workers [7] as a basis of their investigation. There is also one reported case study [10] of co-infection with COVID-19 and P. vivax, in a 10-year-old boy. The boy was reported to have received incomplete primaquine therapy six months earlier for the P. vivax infection. The authors concluded that it was a relapse of the earlier infection, triggered by SARS-CoV-2. Yet, re-infection with P. vivax simultaneously with COVID-19, or a relapse in its natural course, was not ruled out.
The study by Mehta and co-workers [7] looked for a correlation between the incidences of malaria and SARS-CoV-2, and the data were indicative of a negative correlation. Of more interest within the scope of this study is that the authors also looked for genomic similarity between various strains of SARS-CoV-2 and the matryoshka RNA virus 1 (MaRNAV-1), which is associated with P. vivax [8]. They found a 15-base pair (bp) alignment with a 93.75 % identity (e-value 3.1) between the 22487-22502 bp region of SARS-CoV-2 (NC_045512.2) and the 2796-2781 bp region of MaRNAV-1. The aligned region, which encodes the surface glycoprotein, was suggested to be too small to draw any conclusion. Yet, the authors stated that a shared immunogenicity cannot be ruled out, and further, that their observation was valid in the data of all the different strains that they used.
Cross-species peptide sharing and a high number of amino acid (aa) matches are not unexpected. The ultimate claim of the current work is that such similarities carry the potential of leading to inherently related autoimmune pathologies. The immune response against the proteins of a pathogen is a defence mechanism of the host. However, it can cause adverse effects when the pathogen protein part of interest has a sequence similar to that of a human protein. This is molecular mimicry, which is among the conditions that lead to autoimmune response. Kerkar and Vergani [11] supported this notion with the findings of de novo autoimmune hepatitis associated with certain viral infections. The concept is not new, though. For example, it was found several decades earlier that a 10mer portion in the V3-loop of the envelope glycoprotein gp120 from HIV-1 isolates overlaps 70 % with the collagen-like region of human complement component C1q-A [12]. That collagen-like region reacts with autoantibodies from several autoimmune disorders, which stimulated the authors to look for its reactivity in the sera of AIDS patients. A synthetic peptide containing that 10mer was found to react. There was additional evidence of cross-reaction, and common features of the V3-loop of HIV-1 and the immunoglobulin heavy chain variable region were mentioned as further support for the immune network manipulation by HIV and the autoimmunity induced by HIV. Continuing work pointed at the related problems associated with HIV-1 gp120-based vaccines [13].
They later reported reactive antibodies against the HIV-1 V3-loop peptide in healthy individuals [14], and antibodies against the HIV-1 V3-loop were found to be complementary to human IgG [15]. Such similarities and the pathway analysis of similar peptides implied immune response pathologies related to viral infection [16]. Kanduc and Shoenfeld performed related studies [17–19]. Considering the risks that are inherent in the associated vaccines, they [17] investigated vaccines for the papilloma virus L1 (HPV L1) proteins of 4 different strains and for the hepatitis B virus surface antigen. The authors found similarities with the "epitopes that had been validated as immunopositive in humans". For example, 18 of the 60 heptapeptides of HPV L1 proteins were found to be present in 25 epitopes that are experimentally validated as immunopositive in humans. Cross-reactions of immune responses against HPV L1 proteins with 20 human proteins were finally deemed feasible, which revealed the probable connection with the related pathophysiological conditions and implied that peptide-based vaccines need to be exclusively pathogen-specific. With respect to SARS-CoV-2, they found homologous sequences between the spike glycoprotein of the virus and the human surfactant and related proteins [20]. The pathophysiology of the lungs and the airways in the diseased individuals was suggested to be related to such similarities. Besides, a risk of vaccines based on whole SARS-CoV-2 antigens is less likely to be revealed through vaccine tests on non-human primates, because heptapeptide sharing between non-human primates and the pathogens, including SARS-CoV-2, was not found to be as high as that with human. This means that autoimmune reactions may not be encountered at preclinical testing on non-human primates [21]. Further investigation led the researchers to conclude that "aged mice" could be suitable for testing vaccines based on SARS-CoV-2 spike glycoproteins [22].
The relation of COVID-19 with immunity is multifaceted [23–28]. Woodruff and co-workers [29] stated that COVID-19 patients who experience a serious health threat "displayed hallmarks of extrafollicular B cell activation as previously described in autoimmune settings." Autoinflammatory and autoimmune conditions in COVID-19 have also been reviewed, wherein it was mentioned that COVID-19 and autoimmunity could be linked through mechanisms like molecular mimicry and bystander activation [30]. SARS-CoV-2 was also hypothesized to trigger stress-induced autoimmunity through molecular mimicry [31]. Moreover, peptide sharing was reported between the SARS-CoV-2 proteome and human brainstem respiratory pacemaker proteins [32], and human heat shock proteins 90 and 60 [33]. Another study revealed peptide sharing between SARS-CoV-2 proteins and the human Odorant Receptor 7D4, Poly ADP-Ribose Polymerase Family Member 9, and Solute Carrier Family 12 Member 6 proteins [34]. In addition, strong immune cross-reactions with the SARS-CoV-2 spike protein antibody [35] may imply an autoimmunity risk in susceptible individuals, through molecular mimicry [30]. Further, Lucchese [36] wrote that studies on the cerebrospinal fluids of COVID-19 patients are indicative of autoimmunity, and Lyons-Weiler [37] reported that more than one third of the immunogenic peptides of SARS-CoV-2 have homology with proteins that are of importance for the adaptive immune system. Kanduc [38] also pointed at a wide range of disorders associated with possible autoimmunity against human peptides homologous to the immunogenic SARS-CoV-2 peptides, and suggested examination of patients’ sera for autoantibodies against those human peptides. In 2003 [39], human leukocyte antigen (HLA)-B*46:01 was found to lead to susceptibility to a related disease, SARS, with a demonstrated significance after correction for multiple testing, based on 6 severe cases among 33, with respect to 101 controls [40].
In 2004 [41], HLA-B*07:03 was also found to lead to susceptibility to SARS, with a demonstrated significance after correction for multiple testing, based on 83 serologically confirmed cases, with respect to 18774 unrelated serologically typed bone marrow donors as controls [40]. The work by Nguyen and co-workers [42] involving SARS-CoV-2 reported that conserved proteome regions with high major histocompatibility complex (MHC) allele binding affinities were under no positive or negative selective pressure. Yet, the related autoimmune reaction risk, such as that suggested by several studies (e.g., [20–22,30–38,43–45]), is the focus of the current work. In relation, patients expressing HLA-DRB1*04:01 suffer from the severe form of rheumatoid arthritis more often, and HLA-DRB1*04:01 has a 5mer that is shared with the E. coli heat shock protein [46]. This can bear a potential risk for rheumatoid arthritis patients with HLA-DRB1*04:01 when they are exposed to Enterobacteriaceae. Further examples of peptide similarities between pathogen and human proteins are present in systemic lupus erythematosus [47], systemic sclerosis [48], and primary biliary cholangitis [49].
In plasmodium infections, detection of pathogen-infected cells by the cytotoxic CD8(+) T cells of the adaptive immune system, through MHC class I molecules, was thought to be protective in liver-stage malaria but not in blood-stage malaria, because erythrocytes do not express MHC class I molecules [50]. This concept changed with the finding that P. vivax-infected reticulocytes keep expressing MHC class I molecules [51]. Regarding molecular mimicry, Mourão and co-workers [52] mentioned some earlier work [53–55], wherein molecular mimicry is present between the human histamine-releasing factor and the translationally controlled tumour protein of P. falciparum [56], between vitronectin and the erythrocyte membrane protein 1 of P. falciparum [54], and between the ankyrin, spectrin, and actin proteins of red blood cells and P. vivax proteins [55].
2. Methods
Tblastx [57] at NCBI [58] is used to compare SARS-CoV-2 (NC_045512.2) with five plasmodium species that infect humans; each search is limited to one of these species as the organism. The default algorithm parameters are used, except that the maximum number of target sequences is increased to 20000. Please note that the maximum allowed number of target sequences was later reduced to 5000 by the server. The plasmodium species used are P. falciparum (taxid:5833), P. malariae (taxid:5858), P. vivax (taxid:5855), P. ovale (taxid:36330), and P. knowlesi (taxid:5850). The aligned sequences with the highest identity in the tblastx outputs are used as inputs of blastp at NCBI. This is performed to check whether peptides with the same sequences as those in the alignments with the highest identities are present in the proteomes of the respective organisms. Accordingly, each blastp search is limited to the organism whose genome led to the respective sequence in the highest-identity tblastx alignment. So, either SARS-CoV-2 (taxid:2697049) or the respective plasmodium species is used to limit the blastp search. The searches are initiated with the default parameters, except for increasing the maximum number of target sequences, as mentioned above. The algorithm adjusts the parameters for short input sequences.
The peptide with the CFLGYFCTCYFGLFC sequence is present in the non-structural protein 6 (nsp6), which is cleaved from the replicase polyprotein 1a of the SARS-CoV-2 proteome. It is among the sequences that are obtained through tblastx and analyzed afterwards with blastp, as described. Its binding affinities to the MHC class I proteins (MHC class I genes are the HLA-A, -B, and -C genes [42]) are predicted by NetMHC 4.0 [59,60], NetMHCpan 4.1 [61], PickPocket 1.1 [62], and NetMHCcons 1.1 [63]. This is performed by using 12 MHC supertype representatives (HLA-A*01:01 [A1], HLA-A*02:01 [A2], HLA-A*03:01 [A3], HLA-A*24:02 [A24], HLA-A*26:01 [A26], HLA-B*07:02 [B7], HLA-B*08:01 [B8], HLA-B*27:05 [B27], HLA-B*39:01 [B39], HLA-B*40:01 [B44], HLA-B*58:01 [B58], HLA-B*15:01 [B62]). Default parameters are used during these predictions; in the case of NetMHC, the percent rank threshold is 0.5 for strong binders and 2 for weak binders. Additionally, all predictions are printed in the case of NetMHCpan. Percent rank is the percentile of the predicted binding affinity, compared to the distribution of affinities calculated on 400,000 random natural peptides, as explained at the respective website (https://services.healthtech.dtu.dk/service.php?NetMHC-4.0). Only strong binders (SBs) are considered in this analysis. In the predictions with NetMHCpan, the BA option is also used to receive information on the binding affinities. In the case of PickPocket, the value 1-log50k(aff), namely 1 minus the base-50000 logarithm of the affinity in nanomolar, is evaluated manually, by considering values ≥ 0.5 as SBs. Predictions by NetMHC and NetMHCpan are performed for 8-14mers and predictions by PickPocket are performed for 8-12mers; predictions for peptides longer than 11 aa are taken into consideration with caution, as indicated at the website of NetMHC, wherever applicable. NetMHCcons integrates NetMHC, PickPocket, and NetMHCpan, and the default parameters are used in its predictions, by selecting 8-14mers. Moreover, any possible difference in the outcome due to using a short peptide sequence as input is checked by using the nsp6 protein sequence, which contains the investigated sequence, as the input.
The amino acid sequence of the nsp6 protein in single letter code is shown below:
>sp|P0DTC1|3570-3859
SAVKRTIKGTHHWLLLTILTSLLVLVQSTQWSLFFFLYENAFLPFAMGIIAMSAFAMMFVKHKHAFLCLFLLPSLATVAYFNMVYMPASWVMRIMTWLDMVDTSLSGFKLKDCVMYASAVVLLILMTARTVYDDGARRVWTLMNVLTLVYKVYYGNALDQAISMWALIISVTSNYSGVVTTVMFLARGIVFMCVEYCPIFFITGNTLQCIMLVY
CFLGYFCTCYFGLFC
LLNRYFRLTLGVYDYLVSTQEFRYMNSQGLLPPKNSIDAFKLNIKLLGVGGKPCIKVATVQ
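As an illustration of how the investigated 15mer sits within the sequence listed above, the following minimal Python sketch locates it and converts the nsp6 position to a polyprotein position. The function names `locate` and `to_polyprotein` are hypothetical helpers introduced only for this example, and the offset 3570 is taken from the sequence header; it is a sketch, not part of the tools used in the study.

```python
# First residue of nsp6 in the polyprotein, from the header ">sp|P0DTC1|3570-3859".
NSP6_FIRST_RESIDUE = 3570

# nsp6 sequence as listed above, with the investigated 15mer kept as its own piece.
NSP6 = (
    "SAVKRTIKGTHHWLLLTILTSLLVLVQSTQWSLFFFLYENAFLPFAMGIIAMSAFAMMFV"
    "KHKHAFLCLFLLPSLATVAYFNMVYMPASWVMRIMTWLDMVDTSLSGFKLKDCVMYASAV"
    "VLLILMTARTVYDDGARRVWTLMNVLTLVYKVYYGNALDQAISMWALIISVTSNYSGVVT"
    "TVMFLARGIVFMCVEYCPIFFITGNTLQCIMLVY"
    "CFLGYFCTCYFGLFC"
    "LLNRYFRLTLGVYDYLVSTQEFRYMNSQGLLPPKNSIDAFKLNIKLLGVGGKPCIKVATVQ"
)


def locate(peptide, protein):
    """Return the 1-based, inclusive (start, end) of the first match of peptide."""
    index = protein.find(peptide)
    if index < 0:
        raise ValueError("peptide not found in protein")
    return index + 1, index + len(peptide)


def to_polyprotein(position, first_residue=NSP6_FIRST_RESIDUE):
    """Convert a 1-based nsp6 position to a 1-based polyprotein position."""
    return first_residue + position - 1


start, end = locate("CFLGYFCTCYFGLFC", NSP6)
print(start, end, to_polyprotein(start), to_polyprotein(end))
```

The same `locate` helper could be used to confirm that shorter peptides, such as YFCTCYFGLF, lie inside the 15mer.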
NetCTLpan 1.1 [64] is used to predict cytotoxic T lymphocyte (CTL) epitopes (8-11mers). It is an updated version of the original server, NetCTL 1.2 [65]. NetCTLpan combines predictions of the proteasomal cleavage of the C-terminus, the TAP transport efficiency, and the MHC class I binding. It can use the 12 MHC supertypes, all of which are utilized during the predictions. It presents a unified prediction score (COMB) by integrating the results of the separate prediction methods. The weights of C-terminal cleavage and TAP transport efficiency can be adjusted, along with the threshold for epitope identification; the default values are 0.225, 0.025, and 1.0, respectively. Default values are used here. However, a change in the outcome due to using a short sequence as input is checked again by using the nsp6 protein sequence as input, so that the C-terminal cleavage weight in the prediction for CFLGYFCTCYFGLFC can be adjusted further if a significant difference is observed. Finally, peptides are accepted as SBs in this study only if at least three of the tools among NetMHC, PickPocket, NetMHCpan, and NetMHCcons predict strong binding to the same HLA allele, in addition to the epitope prediction by NetCTLpan.
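The thresholds and the acceptance rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the percent rank thresholds are treated as inclusive (the text does not say whether the boundary values count), and the per-tool calls are entered by hand rather than parsed from server output.

```python
import math

# The four binding predictors whose calls are counted in the acceptance rule.
BINDING_TOOLS = ("NetMHC", "PickPocket", "NetMHCpan", "NetMHCcons")


def percent_rank_call(percent_rank, strong=0.5, weak=2.0):
    """Label a percent rank as "SB", "WB", or None.

    Treating the thresholds as inclusive is an assumption of this sketch.
    """
    if percent_rank <= strong:
        return "SB"
    if percent_rank <= weak:
        return "WB"
    return None


def pickpocket_score(affinity_nm, base=50000.0):
    """Return 1-log50k(aff) for an affinity given in nanomolar."""
    return 1.0 - math.log(affinity_nm, base)


def is_accepted_strong_binder(sb_calls, is_epitope, min_tools=3):
    """Apply the rule: at least three SB calls plus a NetCTLpan epitope call."""
    votes = sum(1 for tool in BINDING_TOOLS if sb_calls.get(tool, False))
    return bool(is_epitope) and votes >= min_tools


# Calls reported in this study for YFCTCYFGLF and HLA-A*24:02.
yfctcyfglf_calls = {
    "NetMHC": True,
    "PickPocket": True,
    "NetMHCpan": False,
    "NetMHCcons": True,
}
print(is_accepted_strong_binder(yfctcyfglf_calls, is_epitope=True))
```

With the PickPocket score defined this way, the 0.5 cut-off corresponds to an affinity equal to the square root of 50000 nM, about 224 nM.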
Two parts of the input sequence (YFCTCYFGLF and GYFCTCYFGLF) are predicted in the previous steps to bind strongly to HLA-A*24:02. Therefore, sequences aligned with YFCTCYFGLF and GYFCTCYFGLF are searched in blastp results that are obtained as previously described. The aim is to find the sequences in the human proteome that are homologous to the YFCTCYFGLF and GYFCTCYFGLF peptides. Accordingly, a blastp search of CFLGYFCTCYFGLFC is performed first, by limiting the results to H. sapiens. Default blastp search algorithm parameters are used, except for the maximum number of target sequences, as previously mentioned. The aligned sequences are then searched in the blastp result as follows. Sequences within the blastp results are deemed similar in this study if they align with the query sequence with at least 60 % match and, at the same time, the alignment contains one of the sequences of interest (YFCTCYFGLF or GYFCTCYFGLF) within the query without any gap. Subject sequences that are aligned with the query sequence CFLGYFCTCYFGLFC with gaps are also eliminated. So, the identified alignments are those that contain the YFCTCYFGLF or GYFCTCYFGLF part of the query without any gaps, while the complete aligned sequence is also free of gaps. The analysis is done with MS Excel. After identification of the sequences with the given criteria, binding to the respective HLA serotype is predicted as described in section 2.2, but by choosing a specific peptide length, which is 10, and a specific HLA allele (HLA-A*24:02), based on the outcomes of the previous steps. Peptides are accepted as SBs only if at least three of the tools among NetMHC, PickPocket, NetMHCpan, and NetMHCcons predict strong binding (SB), in addition to the prediction as epitope (E) by NetCTLpan.
Homologous 10mer peptides with at least 60 % match to the YFCTCYFGLF and GYFCTCYFGLF query sequences are searched in the proteomes of the plasmodium species as well. In that case, searches are limited to the respective plasmodium species. Accordingly, blastp searches are performed as before, with the default blastp search algorithm parameters, except that a maximum of 5000 target sequences is returned. The 10mer homologous sequences of the plasmodium peptides within the results of that search are identified as above. Following that, binding affinities of the resulting peptide sequences to the previously determined HLA-A*24:02 allele are predicted as described in section 2.3. Finally, blastp searches of the strong binders are performed separately, one last time, by limiting the search to H. sapiens (taxid:9606), followed by identifying the homologous sequence(s) within the results and predicting its/their HLA-A*24:02 binding affinity/affinities.
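The gapless similarity criterion described above can be sketched as a position-by-position comparison of equal-length peptides. The Python below is a minimal illustration, not the spreadsheet procedure used in the study; the function names are hypothetical, the cut-off is taken as "at least 60 %", and the all-alanine peptide is an invented negative control.

```python
def percent_identity(query, subject):
    """Percent of identical positions between two gapless, equal-length peptides."""
    if len(query) != len(subject):
        raise ValueError("gapless comparison needs peptides of equal length")
    matches = sum(q == s for q, s in zip(query, subject))
    return 100.0 * matches / len(query)


def similar_peptides(query, candidates, min_identity=60.0):
    """Keep candidates of the query's length with at least min_identity percent identity."""
    return [
        candidate
        for candidate in candidates
        if len(candidate) == len(query)
        and percent_identity(query, candidate) >= min_identity
    ]


QUERY = "YFCTCYFGLF"
candidates = [
    "YYCARRFGLF",  # human peptide reported in this study
    "FFYTFYFELF",  # P. falciparum peptide reported in this study
    "AAAAAAAAAA",  # hypothetical negative control, not from the study
]
for peptide in similar_peptides(QUERY, candidates):
    print(peptide, percent_identity(QUERY, peptide))
```

Comparing identical positions only is a simplification; blastp also reports conservative substitutions as positives, which are not counted here.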
TCRpMHCmodels [66] is used to predict the three-dimensional structure of the T cell receptor (TCR) alpha and beta chains in complex with the YFCTCYFGLF peptide and MHC class I. This modeler receives the single letter amino acid sequences of the TCR alpha and beta chains, the peptide, and the MHC in fasta format as input. It then provides the user with a three-dimensional model by choosing the best templates. The sequence information is retrieved from the Protein Data Bank [67]. The sequence information of the TCR alpha and beta chains is retrieved from the data deposited with the title "H27-14 TCR specific for HLA-A24-Nef134-10" (PDBid:3VXQ), based on the publication by Shimizu and co-workers [68]. The sequence information of the MHC is retrieved from the data deposited with the title "Crystal structure of peptide-HLA-A24 bound to S19-2 V-delta/V-beta TCR" (PDBid:5XOV), based on the publication by Shimizu and co-workers [69].
Seq2Logo 2.0 [70] is a sequence logo generator. It is used here to generate sequence profiles of the 10mers that are both SBs to HLA-A*24:02 and have sequences 60 % similar to YFCTCYFGLF. Accordingly, three logos are generated. The first one is generated from the YYCARRFGLF, YYCHCPFGVF, and YYCQQYFFLF sequences. The second one is generated from the FFYTFYFELF, YFVACLFILF, and YFPTITFHLF sequences. The last one is generated from the sequences of the first and the second one together, namely from all the 10mers that are both SBs to HLA-A*24:02 and have sequences 60 % similar to YFCTCYFGLF, which are obtained through the earlier analysis described here. For logo generation, default parameters are used, other than that for the pseudo count correction (weight on prior). Pseudo count correction is for low counts, which means that "the amino acid frequencies displayed in the sequence logos are corrected for low number of observations using a Blosum amino acid similarity matrix," as explained at the website (https://services.healthtech.dtu.dk/service.php?Seq2Logo-2.0). The weight on prior option among the adjustable parameters could be set to zero to turn it off, as the number of observations is small. Instead, it is set to 1 here, to obtain a logo that simply looks more comparable in appearance to the sequence logo of the HLA-A*24:02 allele that is retrieved from the MHCMotifViewer [71]. The file for the 9mer motif of the HLA-A*24:02 allele in jpeg format is available at: https://services.healthtech.dtu.dk/services/MHCMotifViewer/HLA-A_files/Media/HLA-A2429/HLA-A2429.jpg?disposition=download. Finally, the reader is referred to the information available at the websites of the predictors for further details. The process-flow of the study is summarized in Fig. 1.
Fig. 1.
Outline of the process that is followed as a flow-chart.
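To give a rough idea of what goes into the sequence profiles mentioned for the logo generation, the following Python sketch tabulates raw amino acid counts per position for a set of aligned 10mers. It is not the Seq2Logo computation: no pseudo count correction, Blosum weighting, or information-content scaling is applied, and the function name `position_counts` is a hypothetical helper.

```python
from collections import Counter


def position_counts(peptides):
    """Return one Counter of amino acids per position for equal-length peptides."""
    lengths = {len(peptide) for peptide in peptides}
    if len(lengths) != 1:
        raise ValueError("peptides must be non-empty and of equal length")
    # zip(*peptides) yields the residues found at each position in turn.
    return [Counter(column) for column in zip(*peptides)]


# The three human 10mers used for the first logo.
human_peptides = ["YYCARRFGLF", "YYCHCPFGVF", "YYCQQYFFLF"]
for position, counts in enumerate(position_counts(human_peptides), start=1):
    print(position, dict(counts))
```

Positions where a single residue accounts for all observations, such as the first and the last position here, are the ones that would appear as tall single letters in a logo.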
3. Results
In tblastx [57], the query and subject nucleotide sequences are translated in six reading frames, followed by blastp comparisons of the translated protein sequences [72]. Tblastx retrieved no results for P. falciparum. In the case of P. malariae, there is 1 alignment with the highest (70 %) identity (s1; the link for all the supplementary (s) files is shorturl.at/cmKP6); in the case of P. vivax, there are 12 alignments with the highest (73 %) identity (s2); in the case of P. ovale, there is 1 alignment with the highest (75 %) identity (s3); and in the case of P. knowlesi, there are 3 alignments with the highest (67 %) identity (s4). The lengths of the aligned sequences are 10, 15, 8, and 12 aa, respectively. The sequences of those alignments are displayed in Table 1. It is observed in Table 1 that, for instance, the P. malariae sequence CHYFRCHYFR is aligned with CNFFNCHYFR with maximum identity. The rest of the data in Table 1 should be interpreted in a similar fashion. Table 1 also shows that there are no results for P. falciparum, as mentioned. As a different outcome, it can be seen in Table 1 that the P. knowlesi sequences VVLWLCCLCWLC and VVLWLCCCCWLC are both aligned with LVLWLLCNCFLC with the same maximum identity. Intriguingly, there is palindromy in these aligned query and subject sequences, yet it is beyond the scope of the current work.
Table 1.
Aligned SARS-CoV-2 query sequences and the plasmodium subject sequences with the highest identities.
Query (COVID-19) sequence | Subject (plasmodium) sequence | Subject name
- | - | P. falciparum
seq.1 CNFFNCHYFR | seq.1' CHYFRCHYFR | P. malariae
seq.2 CFLGYFCTCYFGLFC | seq.2' CFVGYFCTCFVGYFC | P. vivax
seq.3 FCFHKCFC | seq.3' FCFCSCFC | P. ovale
seq.4 LVLWLLCNCFLC | seq.4' VVLWLCCLCWLC; seq.4'' VVLWLCCCCWLC | P. knowlesi
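Because tblastx compares translations in six reading frames, as noted in this section, the following Python sketch shows what such a translation looks like for a short nucleotide string. It assumes the standard genetic code and DNA letters; the function names, the frame labels, and the toy input are illustrative choices and not the BLAST implementation.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the first base; unknown codons become X."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Return translations for three forward and three reverse frames."""
    dna = dna.upper()
    frames = {}
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(sequence[offset:])
    return frames


# Toy input, not taken from any genome in the study.
print(six_frame_translations("ATGGCC"))
```

A peptide found in one of these frames need not correspond to an annotated protein of the organism, which is why the aligned sequences are checked against the proteomes with blastp afterwards.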
There are repetitive regions in some of the short sequences that are displayed in Table 1. Detection of repetitive regions in the comparison of distant genomes is expected. However, low complexity regions were filtered during the tblastx search with the default algorithm parameters. The sequences in Table 1 are used separately as inputs of the subsequent blastp searches. The blastp searches are limited to the corresponding source organisms. Accordingly, only the CFLGYFCTCYFGLFC sequence is found to be present in the translation products of the respective organism, which is SARS-CoV-2 (s6 of s5–13). This is likely because tblastx compares the query and the subject after translating the nucleotide sequences of each of them in six possible reading frames, so an aligned translation does not necessarily correspond to an expressed protein. Yet, this observation might be due to some annotation amendment requirements as well [73].
The CFLGYFCTCYFGLFC sequence is not just the query sequence with the highest identity to P. vivax. That sequence is present in the alignment results of all the respective tblastx searches for SARS-CoV-2 (s14–17). Furthermore, it is the most frequently observed query sequence in the alignments (Fig. 2, Table 2, s18). When P. malariae is selected as the organism in the tblastx search of SARS-CoV-2, alignment results other than that with the highest identity reveal the presence of the whole, or part, of the CFLGYFCTCYFGLFC sequence in many aligned queries (s14, s18). The YFCT repeat is abundant among the alignment hits with the P. malariae genome assembly chromosome 10 (sequence ID LT594498.1) (s14). So, the subject sequences in the alignments are comprised of repeats, which are low complexity regions. It was mentioned above that there are 12 alignments with the highest (73 %) identity to the CFLGYFCTCYFGLFC sequence within the results of the tblastx search of SARS-CoV-2 that is limited to P. vivax (s15). That tblastx search result indeed has many other alignments with the same sequence as part, or the whole, of the aligned query (73 of a total number of 143, Table 2) (s15, s18). In the case of P. ovale and P. knowlesi, the situation is rather like that of P. malariae (s16, s17). However, the number of alignment hits with the whole, or part, of the CFLGYFCTCYFGLFC sequence is lower (Table 2).
Fig. 2.
Alignment number of the query sequences. Number of alignments that contain the whole, or part, of the respective sequence.
Table 2.
Alignment frequencies of the query sequences with the highest percent identities.
Query sequence | P. malariae | P. vivax | P. ovale | P. knowlesi
CNFFNCHYFR | | | |
CFLGYFCTCYFGLFC | 252 / 693 | 73 / 143 | 27 / 145 | 17 / 124
FCFHKCFC | | | |
LVLWLLCNCFLC | | | |
Alignment frequencies of the query sequences with the highest percent identities in the respective results of tblastx searches of SARS-CoV-2, limited to certain plasmodium species. Alignment frequency is presented as a ratio, wherein the numerator is the number of alignments that contain the whole, or part, of the sequence of interest, and the denominator is the total number of alignments.
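The ratios given for CFLGYFCTCYFGLFC in Table 2 can be turned into fractions for easier comparison between the species. The short Python sketch below does only this arithmetic; the dictionary layout and the function name are illustrative choices.

```python
# (alignments containing the whole or part of CFLGYFCTCYFGLFC, total alignments)
ALIGNMENT_COUNTS = {
    "P. malariae": (252, 693),
    "P. vivax": (73, 143),
    "P. ovale": (27, 145),
    "P. knowlesi": (17, 124),
}


def alignment_fraction(hits, total):
    """Return the fraction of alignments that contain the sequence of interest."""
    if total <= 0:
        raise ValueError("total number of alignments must be positive")
    return hits / total


fractions = {
    species: alignment_fraction(hits, total)
    for species, (hits, total) in ALIGNMENT_COUNTS.items()
}
for species, fraction in fractions.items():
    print(f"{species}: {fraction:.1%}")
```

Expressed this way, the fraction is highest for P. vivax, while the absolute number of such alignments is highest for P. malariae.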
Blastp of the CFLGYFCTCYFGLFC sequence reveals that this 15mer is part of the non-structural protein 6 (nsp6), which belongs to the replicase polyprotein 1a (ORF1a polyprotein) (s6 of s5–13). Nsp6 is cleaved from the ORF1a polyprotein. The P0DTC1 entry of UniProtKB [74] indicates that the subcellular location of nsp6 is the host membrane and that it is a multi-pass membrane protein, as inferred from sequence similarity to ORF1ab (UniProtKB:P0C6X7). The region that includes the CFLGYFCTCYFGLFC sequence, at positions 3784-3798 of the ORF1a polyprotein and corresponding to the 215-229 region of nsp6, is transmembrane and has helical topology.
NetMHC 4.0, PickPocket 1.1, and NetMHCpan 4.1 are used to predict the binding of CFLGYFCTCYFGLFC to the MHC class I proteins. NetMHC and NetMHCpan both use neural networks [59–61,75], while PickPocket [62] uses "position specific weight matrices" (https://services.healthtech.dtu.dk/service.php?PickPocket-1.1). The publication [75] for an earlier version of NetMHC (v3.0) indicates that the method was used to predict MHC-binding peptides in viral proteomes including influenza, SARS, and HIV, with an average of 75 % to 80 % confirmed MHC binders. The CFLGYFCTCYFGLFC sequence has strong binding affinity for certain HLA alleles, as predicted by the NetMHC and PickPocket tools (Table 3, s19, s20). There are 7 strong binders (SBs) in the case of NetMHC and 4 SBs in the case of PickPocket. In the case of NetMHC, one 10mer is an SB to HLA-B*15:01, and the rest are all SBs to HLA-A*24:02. In the case of PickPocket, one 9mer is an SB to both HLA-A*02:01 and HLA-A*24:02, one 12mer is predicted as an SB to HLA-A*02:01, and the remaining two are SBs to HLA-A*24:02. There is no SB among the predictions by the NetMHCpan tool (s21). In the case of NetMHCcons (Table 3, s22), there are 5 SB predictions, wherein one 12mer is an SB to HLA-A*02:01, and the rest are all SBs to HLA-A*24:02. Finally, the 10mer YFCTCYFGLF and the 11mer GYFCTCYFGLF are predicted as SBs to HLA-A*24:02 by NetMHC, PickPocket, and NetMHCcons.
Table 3.
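MHC class I binding prediction outcomes.
MHC class I type | Length | Peptide, as listed in the NetMHC (SB), PickPocket (≥ 0.5), NetMHCcons (SB), and NetCTLpan (E) columns
HLA-A*24:02 | 8 | GYFCTCYF (listed in one column)
HLA-A*01:01 | 9 | FLGYFCTCY (listed in one column)
HLA-A*02:01, HLA-A*24:02 | 9 | YFCTCYFGL (listed in one column)
HLA-B*15:01 | 10 | FLGYFCTCYF (listed in one column)
HLA-A*24:02 | 10 | GYFCTCYFGL (listed in one column)
HLA-A*24:02 | 10 | YFCTCYFGLF (listed in all four columns)
HLA-A*24:02 | 11 | CFLGYFCTCYF (listed in two columns)
HLA-A*24:02 | 11 | GYFCTCYFGLF (listed in all four columns)
HLA-A*02:01 | 12 | FLGYFCTCYFGL (listed in two columns)
HLA-A*24:02 | 12 | LGYFCTCYFGLF (listed in one column)
HLA-A*24:02 | 12 | GYFCTCYFGLFC (listed in one column)
HLA-A*24:02 | 13 | FLGYFCTCYFGLF (listed in one column)
HLA-A*24:02 | 14 | CFLGYFCTCYFGLF (listed in two columns)
MHC class I binding prediction results for the sequences with different lengths, which are derived from the 15mer peptide CFLGYFCTCYFGLFC.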
NetMHC, PickPocket, NetMHCpan, NetMHCcons, and NetCTLpan are used for predictions. NetMHCpan results are not displayed, as no strong binding peptide is predicted there. (SB: strong binder; E: epitope; for PickPocket, peptides with 1-log50k(aff) ≥ 0.5 are considered strong binders.)
Differences in the prediction results for the 15mer CFLGYFCTCYFGLFC sequence are minor when the results with the 15mer as the input are compared to those with the nsp6 protein, which contains that 15mer, as the input (s23–26). The 10mer YFCTCYFGLF and the 11mer GYFCTCYFGLF, which are used for further analysis, are also predicted as SBs to HLA-A*24:02 when the prediction is performed with the nsp6 protein. As before, the NetMHCpan prediction does not lead to any SB for the target sequence when it is predicted as part of nsp6.
NetCTLpan 1.1 is used to predict cytotoxic T lymphocyte (CTL) epitopes. All the epitopes except for one are predicted to bind to the HLA-A*24:02 allele (Table 3, s27). One 9mer is predicted to bind to the HLA-A*01:01 allele. The 10mer YFCTCYFGLF and the 11mer GYFCTCYFGLF are predicted as epitopes of HLA-A*24:02, in accordance with the other predictions. It is also checked if those two epitopes are still predicted when the 15mer target sequence is part of the nsp6 protein (s28; note that the sequence is at the 214-228 position of the file). C-terminal cleavage values vary, as expected. However, the 10mer YFCTCYFGLF and the 11mer GYFCTCYFGLF are again predicted as epitopes, both binding to the HLA-A*24:02 allele. The three-dimensional structure model of the YFCTCYFGLF peptide in complex with the TCR alpha and beta chains and MHC is shown in Fig. 3.
Fig. 3.
Three-dimensional structure model of the complex of TCR alpha, TCR beta, YFCTCYFGLF, and MHC, shown from different perspectives. The model is obtained through TCRpMHCmodels [66].
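The sub-peptides of different lengths that are evaluated for the 15mer can be enumerated with a simple sliding window. The following is a minimal Python sketch for illustration only, not the procedure used by the prediction servers. The function name `kmers` and the length range of 8 to 14 residues are assumptions that are not taken from the study.

```python
def kmers(sequence, k):
    """Return all contiguous sub-peptides of length k, ordered by start position."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


TARGET_15MER = "CFLGYFCTCYFGLFC"

# Assumed length range for illustration: 8mers up to 14mers.
candidates = {k: kmers(TARGET_15MER, k) for k in range(8, 15)}

# The two peptides that are carried forward in the text are windows of the 15mer.
assert "YFCTCYFGLF" in candidates[10]
assert "GYFCTCYFGLF" in candidates[11]

for k, peptides in candidates.items():
    print(k, len(peptides), peptides)
```

Each candidate list could then be submitted to the binding predictors; the sketch only produces the input peptides.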
A blastp search of CFLGYFCTCYFGLFC is performed by limiting the search to H. sapiens, and the top results in the description table belong to the immunoglobulin heavy chain junction region (s29). A search for gapless alignments of the YFCTCYFGLF and GYFCTCYFGLF sequences within the results returns no hits for the latter sequence. For the former sequence, 5 sequences (YFCARNFGPF, YFCASSFGSF, YYCARRFGLF, YYCHCPFGVF, and YYCQQYFFLF) are identified; the last one (YYCQQYFFLF) is absent when the blastp search is set to return a maximum of 5000 target sequences instead of 20000. After identification of the sequences, their binding to the HLA-A*24:02 allele is predicted. Among those, the YYCARRFGLF, YYCHCPFGVF, and YYCQQYFFLF sequences are predicted to bind to the HLA-A*24:02 allele by all the prediction tools (Table 4, s30–34). Upon infection with SARS-CoV-2, individuals with the HLA-A*24:02 allele may therefore develop autoimmune responses.
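The similarity criterion applied to the blastp hits (gapless alignment with a minimum of 60 % identity to the 10mer) can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the author's script: the function names `identity_count` and `is_similar` are hypothetical, and the full criteria of section 2.3 may include conditions that are not modeled here.

```python
def identity_count(query, subject):
    """Count identical residues between two equal-length, gapless peptides."""
    if len(query) != len(subject):
        raise ValueError("a gapless comparison needs peptides of equal length")
    return sum(q == s for q, s in zip(query, subject))


def is_similar(query, subject, min_percent=60):
    """True if the gapless identity reaches min_percent (integer arithmetic)."""
    return identity_count(query, subject) * 100 >= min_percent * len(query)


QUERY = "YFCTCYFGLF"
# The five human sequences named in the text.
HUMAN_HITS = ["YFCARNFGPF", "YFCASSFGSF", "YYCARRFGLF", "YYCHCPFGVF", "YYCQQYFFLF"]

for hit in HUMAN_HITS:
    print(hit, identity_count(QUERY, hit), is_similar(QUERY, hit))
```

For a 10mer, the 60 % threshold corresponds to at least 6 identical positions, which is the number of matches found for each of the five human sequences.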
Table 4.
HLA-A*24:02 binding predictions for the H. sapiens sequences.

| Species | Sequence | NetMHC (SB / WB / none) | PickPocket | NetMHCpan (SB / WB / none) | NetMHCcons (SB / WB / none) | NetCTLpan (E) |
|---|---|---|---|---|---|---|
| H. sapiens | YFCARNFGPF | SB | <0.5 (0.436) | WB | SB | E |
| H. sapiens | YFCASSFGSF | SB | <0.5 (0.456) | WB | SB | E |
| H. sapiens | YYCARRFGLF | SB | ≥0.5 | SB | SB | E |
| H. sapiens | YYCHCPFGVF | SB | ≥0.5 | SB | SB | E |
| H. sapiens | YYCQQYFFLF | SB | ≥0.5 | SB | SB | E |

SB: strong binder; WB: weak binder; E: epitope.

The sequences displayed in Table 4 belong to the following proteins in the human proteome:
− YFCARNFGPF belongs to the Ig heavy chain variable region (sequence ID (seq.ID) AAQ05358.1).
− YFCASSFGSF belongs to the T cell receptor beta chain variable region (seq.ID ANO56516.1).
− YYCARRFGLF belongs to the immunoglobulin heavy chain variable region (seq.ID QGT38216.1).
− YYCHCPFGVF belongs to the protocadherin FAT4 (seq.ID AHN13824.1), unnamed protein product (seq.IDs BAF84150.1, BAG53346.1), protocadherin Fat 4 isoform X2 (seq.ID XP_011530539.1), Fat tumor suppressor homolog 1 (seq.ID EAX05203.1), and protocadherin Fat 4 isoforms 3, 2, and 1 (seq.IDs NP_078858.4 and Q6V0I7.2, NP_001278214.1, and NP_001278232.1, respectively).
− YYCQQYFFLF belongs to the anti-HIV immunoglobulin kappa chain variable region (seq.IDs AVQ94610.1 and AVQ94611.1).

A blastp search for the 15mer CFLGYFCTCYFGLFC is also performed by limiting the search to the individual plasmodium species of interest (P. falciparum, P. malariae, P. vivax, P. ovale, and P. knowlesi) (s35–39). There are 12 sequences similar to the YFCTCYFGLF sequence according to the pre-set criteria (see section 2.3). Among those, 3 sequences are predicted as SBs to the HLA-A*24:02 allele (Table 5, all rows except the last; s40–44). Those 3 sequences may indicate a possible immune response in individuals with the HLA-A*24:02 serotype, due to binding of the FFYTFYFELF, YFVACLFILF, and YFPTITFHLF peptides of the pathogens upon infection with P. falciparum, P. malariae, and P. ovale, respectively. Yet, the question is whether this can also lead to an autoimmune response. Therefore, we examined whether any of these peptides is similar to a human peptide that also binds to the HLA-A*24:02 allele with high affinity. Accordingly, blastp searches of those sequences are performed separately, by limiting the searches to H. sapiens (s45–47). Eventually, one such human sequence (YFYLFSLELF) is identified through the analysis of the results of the blastp search with the FFYTFYFELF sequence. The YFYLFSLELF sequence belongs to the LOC441426 protein in the human proteome (seq.IDs AAI36787.1 and AAI36791.1). It also binds strongly to the HLA-A*24:02 allele (Table 5, the last row, and s48–52).

Table 5.
HLA-A*24:02 binding predictions for the sequences of the plasmodium species and for the homologous H. sapiens sequence.

| Species | Sequence | NetMHC (SB / WB / none) | PickPocket | NetMHCpan (SB / WB / none) | NetMHCcons (SB / WB / none) | NetCTLpan (E / none) |
|---|---|---|---|---|---|---|
| P. falciparum | FFCTPFFILF | WB | <0.5 (0.446) | WB | WB | E |
| P. falciparum | YICIFYFILF | WB | <0.5 (0.345) | none | none | none |
| P. falciparum | YIITCLSGLF | WB | <0.5 (0.288) | none | none | none |
| P. falciparum | * FFYTFYFELF | SB | ≥0.5 | WB | SB | E |
| P. falciparum | YICTGMFSLF | WB | <0.5 (0.316) | none | WB | none |
| P. malariae | FFCTKYFAHF | WB | <0.5 (0.442) | WB | WB | E |
| P. malariae | YFVACLFILF | SB | ≥0.5 | WB | SB | E |
| P. vivax | YFATFYFTLY | SB | <0.5 (0.334) | none | WB | E |
| P. ovale | YFPTITFHLF | SB | ≥0.5 | SB | SB | E |
| P. ovale | YFGTFYFMLY | WB | <0.5 (0.347) | none | none | E |
| P. knowlesi | FFCACLFLLF | SB | <0.5 (0.472) | none | SB | E |
| P. knowlesi | YFATFYFTLY | SB | <0.5 (0.334) | none | WB | E |
| * H. sapiens | YFYLFSLELF | SB | <0.5 (0.455) | SB | SB | E |

SB: strong binder; WB: weak binder; E: epitope.

Sequences homologous to the 11mer GYFCTCYFGLF are also searched for in the blastp results of the CFLGYFCTCYFGLFC searches in the plasmodium species that are known to infect humans. The HYFPTITFHLF sequence in the P. ovale proteome is identified according to the criteria described in section 2.3, and it is also predicted as an SB to the HLA-A*24:02 allele (s53). Analysis of the blastp results for that sequence, with the search limited to H. sapiens (s54), does not reveal similar sequences based on the criteria explained in section 2.3.

Fig. 4 displays the sequence profiles of the human- and plasmodium-sourced 10mers that are both SBs to HLA-A*24:02 and have 6 residues identical to those of YFCTCYFGLF, in comparison with the 9mer motif of the HLA-A*24:02 allele. The three-dimensional structure model of the TCR alpha, TCR beta, YFCTCYFGLF, and MHC complex (Fig. 3) is presented earlier, in section 3.2.
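The selection of strong binders from Table 5 can be sketched as a vote across the five tools. The text does not state the exact selection rule, so the threshold below (support from at least 4 of the 5 tools) is an assumption; it is chosen only because it reproduces the peptides that the text names as strong binders. The PickPocket column is encoded as a boolean that is true for the entries marked with ≥ in Table 5, and all names in the sketch are hypothetical.

```python
# Row layout: (species, peptide, NetMHC, PickPocket marked ">=", NetMHCpan,
#              NetMHCcons, NetCTLpan epitope). Values are transcribed from Table 5.
TABLE_5 = [
    ("P. falciparum", "FFCTPFFILF", "WB", False, "WB", "WB", True),
    ("P. falciparum", "YICIFYFILF", "WB", False, "none", "none", False),
    ("P. falciparum", "YIITCLSGLF", "WB", False, "none", "none", False),
    ("P. falciparum", "FFYTFYFELF", "SB", True, "WB", "SB", True),
    ("P. falciparum", "YICTGMFSLF", "WB", False, "none", "WB", False),
    ("P. malariae", "FFCTKYFAHF", "WB", False, "WB", "WB", True),
    ("P. malariae", "YFVACLFILF", "SB", True, "WB", "SB", True),
    ("P. vivax", "YFATFYFTLY", "SB", False, "none", "WB", True),
    ("P. ovale", "YFPTITFHLF", "SB", True, "SB", "SB", True),
    ("P. ovale", "YFGTFYFMLY", "WB", False, "none", "none", True),
    ("P. knowlesi", "FFCACLFLLF", "SB", False, "none", "SB", True),
    ("P. knowlesi", "YFATFYFTLY", "SB", False, "none", "WB", True),
    ("H. sapiens", "YFYLFSLELF", "SB", False, "SB", "SB", True),
]


def supporting_tools(row):
    """Number of tools (out of 5) that support strong binding for one row."""
    _, _, netmhc, pickpocket, netmhcpan, netmhccons, netctlpan = row
    votes = [netmhc == "SB", pickpocket, netmhcpan == "SB",
             netmhccons == "SB", netctlpan]
    return sum(votes)


# Assumed rule: keep peptides supported by at least 4 of the 5 tools.
selected = [row[1] for row in TABLE_5 if supporting_tools(row) >= 4]
print(selected)
```

Under this assumed rule, the three plasmodium peptides named in the text and the human YFYLFSLELF peptide are retained, whereas a stricter rule requiring all five tools would keep only YFPTITFHLF.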
Fig. 4.
Sequence logos of the 10mers that are both SBs to HLA-A*24:02 and have 6 residues identical to those of YFCTCYFGLF, in comparison with the motif of the HLA-A*24:02 allele. (a) is the 9mer motif of the HLA-A*24:02 allele. (b) is the logo generated from the YYCARRFGLF, YYCHCPFGVF, and YYCQQYFFLF sequences. (c) is the logo generated from the FFYTFYFELF, YFVACLFILF, and YFPTITFHLF sequences. (d) is the logo generated from the sequences in (b) and (c). The dimensions of the original file of the data in (a) are adjusted to comply with the dimensions of this figure.
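The information that underlies such logos can be approximated by per-position residue frequencies. The Python sketch below is a simplified illustration under stated assumptions: it uses plain frequencies without the sequence weighting or pseudo counts that a logo tool may apply, and the function names are hypothetical.

```python
from collections import Counter

# Peptide sets named in the caption of Fig. 4.
HUMAN = ["YYCARRFGLF", "YYCHCPFGVF", "YYCQQYFFLF"]
PLASMODIUM = ["FFYTFYFELF", "YFVACLFILF", "YFPTITFHLF"]


def position_frequencies(peptides):
    """Per-position residue frequencies for peptides of equal length."""
    if len({len(p) for p in peptides}) != 1:
        raise ValueError("all peptides must have the same length")
    profile = []
    for column in zip(*peptides):
        counts = Counter(column)
        profile.append({aa: n / len(peptides) for aa, n in counts.items()})
    return profile


def conserved_positions(peptides):
    """1-based positions at which every peptide carries the same residue."""
    return [i for i, column in enumerate(zip(*peptides), start=1)
            if len(set(column)) == 1]


combined = HUMAN + PLASMODIUM
print(conserved_positions(combined))
for position, freqs in enumerate(position_frequencies(combined), start=1):
    print(position, freqs)
```

In this simplified view, the six peptides share the same residue at positions 7 and 10, and 5 of the 6 peptides carry L at position 9.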
4. Discussion
This study is performed with the reference genomes, meaning that individual variations in the indicated sequences can influence the outcomes. Other genetic, physiological, and environmental variations can also contribute. All of these could influence the possible immune responses and autoimmune reactions.

At first, the CFLGYFCTCYFGLFC sequence is identified here as the query sequence with the highest identity, through the tblastx search of SARS-CoV-2 limited to P. vivax. This 15mer sequence is prevalent within the alignment results, not only for P. vivax but also for the other plasmodium species studied here (Table 2). An earlier study [76] reported, based on peptide elution, that MHC class I molecules present transmembrane helices. The 15mer sequence investigated here is also likely a transmembrane region, which is consistent with that report. It is predicted as a strong binder only for the HLA-A*24:02 allele, partly because we required almost full agreement among the prediction results of several tools.

Alignments with gaps are eliminated in this study. Aligned subject sequences that contain gaps outside the positions that align with the region of interest are also eliminated. This may well have reduced the number of sequences obtained. This conclusion is supported by the alignment results for the same 15mer sequence that have at least 7 residue matches, irrespective of the presence of gaps in the alignments [77].

Peptides with homologous regions and strong binding affinities to the same HLA allele, such as those identified through this study, pose a risk of autoimmune reaction [78–84]. A very high number of heptapeptides shared between the bacterial and human proteomes is reported [81]. Yet, it was shown that antibody-eliciting pathogen proteins are not homologous to human proteins, and vice versa [85]. These findings indicate that mere resemblance is not the sole reason for the development of autoimmunity in most cases, and that explanations should cover more parameters, including environmental factors [46]. The distinction of self from non-self is proposed to contribute [86]. To establish that molecular mimicry leads to an autoimmune disease, the following are required in addition to the similarity of interest: the presence of cross-reacting T cells or antibodies, an epidemiological link to the development of the autoimmune disease, and reproducibility in an animal model.
Autoimmune reactions were observed in animal models accordingly (e.g., [87,88]). However, the latter two criteria are still challenging and prone to concerns [46]. This does not change the fact that such findings, including those presented here, need to be supported by further evidence regarding the risk of autoimmunity susceptibility. Keeping this in mind, the present work defines conditions for the autoimmunity susceptibility risk of certain HLA serotypes upon infection with SARS-CoV-2.

The implications of the work presented here also highlight the importance of accounting for genetic variation in the development of vaccines. There are far more genetic variations than the HLA serotypes, which are themselves extensive. Cross-reactivity between the adjuvant vaccine and the proteome of the patient can pose a risk of autoimmune disease development in genetically susceptible individuals [89]. Therefore, homologies between the adjuvant vaccines and the human proteome need to be investigated [17–22], by accounting for the genetic variations. Individuals included in clinical trials of the relevant vaccine types should be sufficiently representative of the allelic variation of the HLA types. As suggested by Nguyen and co-workers [42], implementing HLA typing in the ongoing and upcoming clinical trials and COVID-19 tests would be extremely useful. Such considerations are important for animal tests as well, wherein autoimmune responses occurring through similar routes would not always be indicative of the same situation in humans, and vice versa [21].

In the case of COVID-19, Warren and Birol [90] predicted HLA class I and II alleles from the transcriptome sequencing data of bronchoalveolar lavage fluid samples of 5 COVID-19 patients from China.
Interestingly, the HLA class I allele A*24:02 was identified in 4 of the 5 individuals, which was deemed significant [90] compared to the frequency of the respective allele in the South Han Chinese population [91]. The authors mentioned that HLA-A*24:02 (A24 allele group) was not known to be a risk factor for SARS, but that it is associated with diabetes [92–95], which is a known risk factor for patients in the current pandemic [96]. HLA-A*24 is even reported as "an independent predictor of 5-year progression to diabetes in autoantibody-positive first-degree relatives of Type I diabetic patients" [97]. It can be noted here that malaria is also associated with diabetes [98–100]. In the study by Warren and Birol [90], the HLA class II alleles DPA1*02:02 and DPB1*05:01 were also found to be frequent (in 4 of 5 patients); these alleles are common in Han Chinese and are associated with the autoimmune diseases Graves' disease [101] and narcolepsy [102]. The latter allele, DPB1*05:01, was found to be associated with chronic hepatitis B in Asians, in addition to being a risk factor affecting the ability to clear viral infection ([102,103] cited in [91]). Accordingly, prediction of binding to the class II alleles is planned as future work.

Peptide sharing, or a high number of matches between peptides, carries the potential risk of similar pathologies arising from autoimmune responses to the same peptides in individuals with genetic susceptibilities. Observing matches between peptides of different species that pose an autoimmune response risk is not unexpected. Kanduc [38] mentioned that the probability of a given hexapeptide occurring is 1 in 64 million, which is calculated by assuming that each of the 20 aa is equally likely at each of the 6 positions of a hexapeptide. Whether that probability of 1 in 64 million is realized depends on the size of the proteome of the organism to which the blastp search is limited.
An unbiased proteome of about 64 million aa would be sufficient to observe any given 6mer once, considering that such a proteome would comprise about 64 million different hexamers. (The division of the proteome into proteins is ignored for simplicity.) The human proteome has 122962 proteins with a total of 80769298 aa (data retrieved from the protein list at NCBI, genome assembly id 582967). Therefore, the size of the human proteome can potentially allow every possible 6mer to be observed at least once. Here, we looked for a minimum of 6 aa matches in a 10mer, but we observed only peptides with exactly 6 aa matches. The expected occurrence of 6 aa matches in a 10mer is 252 times that of a given 6mer, where 252 is the number of 6-membered subsets of a 10-membered set (each member of the set being the aa at one position of the 10mer of interest). However, we observed only a few cases with 6 aa matches, that is, only a few of the 6-membered subsets (Table 4). So, the number was lower than expected. This is because the calculated number is the expected number of observations under unbiased conditions, whereas the human proteome is biased, and most proteins have several isoforms. These results lend some support to the view of Kanduc [38], which suggests the involvement of evolutionary processes between humans and SARS-CoV-2 [104], although in relation to a different aspect of the concept than the one covered here. Such evolution of the viral proteome could involve a selection advantage related to infectivity or to elimination by the host.

For instance, the proteome of the P. falciparum strain 3D7 reference genome has 5387 proteins with a total of 4103467 aa (data retrieved from the protein list at NCBI, genome assembly id 895506). It is about 20 times smaller than the human proteome. Yet, that proteome also revealed 5 peptides with 6 aa matches to the 10mer query peptide (Table 5). This can be interpreted as indicating a less biased proteome than the human proteome, which is supported by the fact that P. falciparum proteins do not have several isoforms, as human proteins do. Still, the observation should be validated with a large and varied query set. At present, it is interesting that only one of the three SB peptides of the Plasmodium species listed in Table 5 is found to have a peptide with 6 aa matches in the human proteome. These concepts are part of host-pathogen evolution and of the coevolution of pathogens of the same host. They deserve further attention in follow-up studies, which may also be motivated by the fact that 32 Plasmodium sequences are present among the 17657 sequences in the descriptions table of the blastn search of SARS-CoV-2 for somewhat similar sequences (s55).

In relation, what is observed through the present study is as follows: the SARS-CoV-2 peptide that we investigated has similar peptides (with 6 aa matches) both in the human proteome and in the proteomes of the Plasmodium species. In contrast, only one of those Plasmodium peptides has a similar peptide (with 6 aa matches) in the human proteome. Relatedly, shared immunodominant regions between P. falciparum and SARS-CoV-2 were previously suggested [6] to explain the low incidence of COVID-19 within the malaria-endemic belt. In the current work, the risk of autoimmune reaction is pointed out through such a similarity. Accordingly, the indicated low incidence could be due to a higher number of homologous peptides of P. falciparum and SARS-CoV-2 that confer protection by triggering an immune response rather than an autoimmune reaction. This is supported here by the observation that only one of the three SB peptides of the Plasmodium species that are homologous to the SARS-CoV-2 peptide has a homologous peptide (with 6 aa matches) in the human proteome. So, the probability of a protective effect due to the shared peptides is higher than the risk of their causing an autoimmune response, because only a small number of those shared peptides are SBs to the same HLA allele(s) and also have a similar peptide in the human proteome that is an SB to the same HLA allele(s). This can also be interpreted as follows: host-pathogen evolution in the “malaria-endemic belt” worked in the direction of diminished autoimmune responses that involve molecular mimicry. This can eventually be protective against severe cases of COVID-19 with autoimmune reactions involving associated mechanisms.

So, distinct implications can be derived from the observations presented here. Yet, it can safely be stated within the scope of this study that similarities carry the potential to lead to inherently related autoimmune pathologies and that such similarities are expected by nature, but the physiological outcomes implied here by the results of the HLA binding affinity predictions are complex and require experimental and/or clinical support.
5. Conclusion
In this study, separate tblastx searches are performed with the reference genome of SARS-CoV-2, by limiting the searches to five plasmodium species that infect humans. Then, the presence of the aligned sequences in the respective organisms' proteomes is searched with blastp. The results led to the 15mer peptide with the CFLGYFCTCYFGLFC sequence in the SARS-CoV-2 proteome. Its binding predictions to the MHC supertype representatives revealed a 10mer and an 11mer with strong binding affinities to the HLA-A*24:02 allele. Their homologous peptides with a minimum of 60 % identity and gapless alignments are identified through the results of blastp searches in the human proteome and in the respective plasmodium species’ proteomes. Binding affinity predictions of those peptides to the HLA-A*24:02 allele are performed. Three of the peptides of the human proteome are predicted to bind strongly, which can be considered a risk of autoimmune response in COVID-19 patients with the HLA-A*24:02 serotype. On the other hand, only one of the plasmodium peptides that are predicted to bind strongly to the HLA-A*24:02 allele has an identifiable homologous peptide in the human proteome. That human peptide is also predicted as a strong binder to the same HLA allele. Based on these results, it is suggested that SARS-CoV-2 and P. falciparum infections may trigger autoimmune responses in individuals with the HLA-A*24:02 allele, through analogous means. These results imply autoimmunity susceptibility of HLA-A*24:02 serotypes through molecular mimicry in the case of SARS-CoV-2 and P. falciparum infections, which is yet to be supported by further studies. Such studies may eventually lead to the identification of certain HLA serotypes as risk groups and to the identification of novel alleles [105–107]; they highlight the importance of this consideration in vaccine studies; and they can inspire studies of commonalities between pathogens of the human host, which are driven by host-pathogen evolution. As outlined nicely by Córdoba-Aguilar and co-workers [108], what may be equally important is the possibility that such studies, and the lessons learned from the current pandemic, trigger an understanding that leads to the development of disease prevention and environmental protection strategies for the well-being of not only humanity but also nature itself, in a broader sense.
Acknowledgments:
Ecology and Evolutionary Biology Society of Turkey is acknowledged.
Supporting information:
Supplementary files available at shorturl.at/cmKP6
Conflicts of interests:
There are no conflicts of interests to declare.
Authors' contributions:
YA performed conception and design of the work; acquisition, analysis, and interpretation of the data for the work; drafting and revising the work to the final format.
References
1. Perricone C, Triggianese P, Bartoloni E, et al. The anti-viral facet of anti-rheumatic drugs: Lessons from COVID-19. J Autoimmun. 2020;111:102468.
2. Magagnoli J, Narendran S, Pereira F, Cummings TH, Hardin JW, Sutton SS, Ambati J. Outcomes of hydroxychloroquine usage in United States veterans hospitalized with Covid-19. Med. 2020;1:1–14.
3. Touret F, de Lamballerie X. Of chloroquine and COVID-19. Antiviral Res. 2020;177:104762.
4. Blanquart S, Gascuel O. Mitochondrial genes support a common origin of rodent malaria parasites and Plasmodium falciparum’s relatives infecting great apes. BMC Evol Biol. 2011;11:70.
5. Déchamps S, Maynadier M, Wein S, Gannoun-Zaki L, Maréchal E, Vial HJ. Rodent and nonrodent malaria parasites differ in their phospholipid metabolic pathways. J Lipid Res. 2010;51:81–96.
6. Iesa MAM, Osman MEM, Hassan MA, et al. SARS-CoV-2 and Plasmodium falciparum common immunodominant regions may explain low COVID-19 incidence in the malaria-endemic belt. New Microbe and New Infect. 2020;38:100817.
7. Mehta P, Parikh P, Aggarwal S, et al. Has India met this enemy before? From an eternal optimist’s perspective: SARS-CoV-2. Indian Journal of Medical Sciences. 2020;72(1):8–12.
8. Charon J, Grigg MJ, Eden J-S, et al. Novel RNA viruses associated with Plasmodium vivax in human malaria and Leucocytozoon parasites in avian disease. PLoS Pathog. 2019;15(12):e1008216.
9. De Souza W. Covid-19 and parasitology. Parasitol Res. 2020; doi: 10.1007/s00436-020-06719-y.
10. Kishore R, Dhakad S, Arif N, et al. COVID-19: Possible cause of induction of relapse of Plasmodium vivax infection. Indian J Pediatr. 2020;3:1–2.
11. Kerkar N, Vergani D. De novo autoimmune hepatitis – is this different in adults compared to children. J Autoimmun. 2018;95:26–33.
12. Metlas R, Skerl V, Veljkovic V, Colombatti A, Pongor S. Immunoglobulin-like domain of HIV-1 envelope glycoprotein gp120 encodes putative internal image of some common human proteins. Viral Immunol. 1994;7(4):215–219.
13. Veljkovic V, Johnson E, Metlaš R. Molecular basis of the inefficacy and possible harmful effects of AIDS vaccine candidates based on HIV-1 envelope glycoprotein gp120. Vaccine. 1997;15(2):473–474.
14. Metlas R, Trajkovic D, Srdic T, Veljkovic V, Colombatti A. Human Immunodeficiency Virus V3 peptide-reactive antibodies are present in normal HIV-negative sera. AIDS Res Hum Retrovir. 1999;15(7):671–677.
15. Metlas R, Trajkovic D, Srdic T, Veljkovic V, Colombatti A. Anti-V3 and anti-IgG antibodies of healthy individuals share complementarity structures. J Acquir Immune Defic Syndr. 1999;21(4):266–270.
16. Carter CJ. Extensive viral mimicry of 22 AIDS-related autoantigens by HIV-1 proteins and pathway analysis of 561 viral/human homologues suggest an initial treatable autoimmune component of AIDS. FEMS Immunol Med Microbiol. 2011;63:254–268.
17. Kanduc D, Shoenfeld Y. From HBV to HPV: Designing vaccines for extensive and intensive vaccination campaigns worldwide. Autoimmun Rev. 2016;15:1054–1061.
18. Kanduc D, Shoenfeld Y. Inter-pathogen peptide sharing and the original antigenic sin: solving a paradox. The Open Immunology Journal. 2018;8:16–27.
19. Kanduc D, Shoenfeld Y. Human Papillomavirus epitope mimicry and autoimmunity: the molecular truth of peptide sharing. Pathobiology. 2019;86:285–295.
20. Kanduc D, Shoenfeld Y. On the molecular determinants of the SARS-CoV-2 attack. Clin Immunol. 2020;215:108426.
21. Kanduc D, Shoenfeld Y. Medical, genomic, and evolutionary aspects of the peptide sharing between pathogens, primates, and humans. Global Med Genet. 2020;7:64–67.
22. Kanduc D, Shoenfeld Y. Molecular mimicry between SARS-CoV-2 spike glycoprotein and mammalian proteomes: implications for the vaccine. Immunol Res. 2020;68:310–313.
23. Atyeo C, Fischinger S, Zohar T, et al. Distinct early serological signatures track with SARS-CoV-2 survival. Immunity. 2020;53:524–532.
24. Kaneko N, Kuo H-H, Boucau J, et al. Loss of Bcl-6-expressing T follicular helper cells and germinal centers in COVID-19. Cell. 2020;183:1–15.
25. Kuri-Cervantes L, Pampena MB, Meng V, et al. Comprehensive mapping of immune perturbations associated with severe COVID-19. Sci Immunol. 2020;5:eabd7114.
26. Laing AG, Lorenc A, del Molino del Barrio I, et al. A dynamic COVID-19 immune signature includes associations with poor prognosis. Nat Med. 2020;26:1623–1635.
27. Lucas C, Wong P, Klein J, et al. Longitudinal analyses reveal immunological misfiring in severe COVID-19. Nature. 2020;584:463–469.
28. Mathew D, Giles JR, Baxter AE, et al. Deep immune profiling of COVID-19 patients reveals distinct immunotypes with therapeutic implications. Science. 2020;369:eabc8511.
29. Woodruff MC, Ramonell RP, Cashman KS, et al. Dominant extrafollicular B cell responses in severe COVID-19 disease correlate with robust viral-specific antibody production but poor clinical outcomes. MedRxiv. 2020;2020.04.29.20083717.
30. Rodríguez Y, Novelli L, Rojas M, et al. Autoinflammatory and autoimmune conditions at the crossroad of COVID-19. J Autoimmun. 2020;114:102506.
31. Cappello F, Gammazza AM, Dieli F, de Macario EC, Macario AJL. Does SARS-CoV-2 Trigger stress-induced autoimmunity by molecular mimicry? A hypothesis. J Clin Med. 2020;9:2038.
32. Lucchese G, Flöel A. Molecular mimicry between SARS-CoV-2 and respiratory pacemaker neurons. Autoimmun Rev. 2020;19:102556.
33. Lucchese G, Flöel A. SARS-CoV-2 and Guillain-Barré syndrome: molecular mimicry with human heat shock proteins as potential pathogenic mechanism. Cell Stress Chaperones. 2020;25:731–735.
34. Angileri F, Legare S, Gammazza AM, de Macario EC, Macario AJL, Cappello F. Molecular mimicry may explain multi-organ damage in COVID-19. Autoimmun Rev. 2020;19:102591.
35. Vojdani A, Kharrazian D. Potential antigenic cross-reactivity between SARS-CoV-2 and human tissue with a possible link to an increase in autoimmune diseases. Clin Immunol. 2020;217:108480.
36. Lucchese G. Cerebrospinal fluid findings in COVID-19 indicate autoimmunity. Lancet Microbe. 2020;1:e242.
37. Lyons-Weiler J. Pathogenic priming likely contributes to serious and critical illness and mortality in COVID-19 via autoimmunity. Journal of Translational Autoimmunity. 2020;3:100051.
38. Kanduc D. From anti-SARS-CoV-2 immune responses to COVID-19 via molecular mimicry. Antibodies. 2020;9:33.
39. Lin M, Tseng H-K, Trejaut JA, et al. Association of HLA class I with severe acute respiratory syndrome coronavirus infection. BMC Med Genet. 2003;4:9.
40. Alicia S-M. HLA studies in the context of coronavirus outbreaks. Swiss Med Wkly. 2020;150:w20248.
41. Ng MHL, Lau K-M, Li L, et al. Association of human-leukocyte-antigen class I (B*0703) and class II (DRB1*0301) genotypes with susceptibility and resistance to the development of severe acute respiratory syndrome. J Infect Dis. 2004;190(3):515–518.
42. Nguyen A, David JK, Maden SK, et al. Human leukocyte antigen susceptibility map for Severe Acute Respiratory Syndrome Coronavirus 2. J Virol. 2020;94(13):e00510-20.
43. Cappello F. Is COVID-19 a proteiform disease inducing also molecular mimicry phenomena? Cell Stress Chaperones. 2020;25(3):381–382.
44. Sedaghat Z, Karimi N. Guillain Barre syndrome associated with COVID-19 infection: A case report. J Clin Neurosci. 2020;76:233–235.
45. Cappello F. COVID-19 and molecular mimicry: The Columbus’ egg? J Clin Neurosci. 2020;77:246.
46. Rojas M, Restrepo-Jiménez P, Monsalve DM, et al. Molecular mimicry and autoimmunity. J Autoimmun. 2018;95:100–123.
47. James JA, Harley JB. Linear epitope mapping of an Sm B/B′ polypeptide. J Immunol. 1992;148:2074–2079.
48. Lunardi C, Bason C, Navone R, et al. Systemic sclerosis immunoglobulin G autoantibodies bind the human cytomegalovirus late protein UL94 and induce apoptosis in human endothelial cells. Nat Med. 2000;6:1183–1186.
49. Fussey SP, Ali ST, Guest JR, James OF, Bassendine MF, Yeaman SJ. Reactivity of primary biliary cirrhosis sera with Escherichia coli dihydrolipoamide acetyltransferase (E2p): characterization of the main immunogenic region. Proc Natl Acad Sci Unit States Am. 1990;87(10):3987–3991.
50. Rivera-Correa J, Rodriguez A. Autoimmune anemia in malaria. Trends in Parasitology. 2020;36(2):91–97.
51. Junqueira C, Barbosa CRR, Costa PAC, et al. Cytotoxic CD8(+) T cells recognize and kill Plasmodium vivax-infected reticulocytes. Nat Med. 2018;24:1330–1336.
52. Mourão LC, Cardoso-Oliveira GP, Braga ÉM. Autoantibodies and malaria: where we stand? Insights into pathogenesis and protection. Front Cell Infect Microbiol. 2020;10:262.
53. MacDonald SM, Bhisutthibhan J, Shapiro TA, et al. Immune mimicry in malaria: Plasmodium falciparum secretes a functional histamine-releasing factor homolog in vitro and in vivo. Proc Natl Acad Sci Unit States Am. 2001;98(19):10829–10832.
54. Ludin P, Nilsson D, Mäser P. Genome-wide identification of molecular mimicry candidates in parasites. PLoS One. 2011;6(3):e17546.
55. Mourão LC, Baptista RP, de Almeida ZB, et al. Anti-band 3 and anti-spectrin antibodies are increased in Plasmodium vivax infection and are associated with anemia. Sci Rep. 2018;8(1):8762.
56. MacDonald KS, Fowke KR, Kimani J, et al. Influence of HLA supertypes on susceptibility and resistance to human immunodeficiency virus type 1 infection. J Infect Dis. 2000;181:1581–1589.
57. Altschul SF, Madden TL, Schäffer AA, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
58. NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2017;46:D8–D13.
59. Andreatta M, Nielsen M. Gapped sequence alignment using artificial neural networks: application to the MHC class I system. Bioinformatics. 2016;32(4):511–517.
60. Nielsen M, Lundegaard C, Worning P, et al. Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci. 2003;12:1007–1017.
61. Reynisson B, Alvarez B, Paul S, Peters B, Nielsen M. NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res. 2020;48(W1):W449–W454.
62. Zhang H, Lund O, Nielsen M. The PickPocket method for predicting binding specificities for receptors based on receptor pocket similarities: application to MHC-peptide binding. Bioinformatics. 2009;25(10):1293–1299.
63. Karosiene E, Lundegaard C, Lund O, Nielsen M. NetMHCcons: a consensus method for the major histocompatibility complex class I predictions. Immunogenetics. 2012;64(3):177–186.
64. Stranzl T, Larsen MV, Lundegaard C, Nielsen M. NetCTLpan. Pan-specific MHC class I pathway epitope predictions. Immunogenetics. 2010;62(6):357–368.
65. Larsen MV, Lundegaard C, Lamberth K, Buus S, Lund O, Nielsen M. Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction. BMC Bioinformatics. 2007;8:424.
66. Jensen KK, Rantos V, Jappe EC, et al. TCRpMHCmodels: Structural modelling of TCR-pMHC class I complexes. Sci Rep. 2019;9:14530.
67. Berman HM, Westbrook J, Feng Z, et al. The Protein Data Bank. Nucleic Acids Res. 2000;28(1):235–242.
68. Shimizu A, Kawana-Tachikawa A, et al. Structure of TCR and antigen complexes at an immunodominant CTL epitope in HIV-1 infection. Sci Rep. 2013;3:3097.
69. Shi Y, Kawana-Tachikawa A, Gao F, et al. Conserved V delta 1 binding geometry in a setting of locus-disparate pHLA recognition by delta / alpha beta T cell receptors (TCRs): Insight into Recognition of HIV Peptides by TCRs. J Virol. 2017;91(17):e00725-17.
70. Thomsen MFC, Nielsen M. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion. Nucleic Acids Res. 2012;40(Web Server issue):W281–W287.
71. Rapin N, Hoof I, Lund O, Nielsen M. MHCmotif viewer. Immunogenetics. 2008;60(12):759–765.
72. Wheeler D, Bhagwat M. Chapter 9 BLAST QuickStart. In: Bergman NH, ed. Comparative Genomics. Volumes 1 and 2. Totowa NJ: Humana Press; 2007.
73. Böhme U, Otto TD, Sanders M, Newbold CI, Berriman M. Progression of the canonical reference malaria parasite genome from 2002–2019. Wellcome Open Res. 2019;4:58.
74. The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019;47(D1):D506–D515.
75. Lundegaard C, Lamberth K, Harndahl M, Buus S, Lund O, Nielsen M. NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8–11. Nucleic Acids Res. 2008;36(suppl 2):W509–W512.
76. Bianchi F, Textor J, van den Bogaart G. Transmembrane helices are an overlooked source of Major Histocompatibility Complex Class I epitopes. Front Immunol. 2017;8:1118.
77. Adiguzel Y. Molecular mimicry between SARS-CoV-2 and human proteins. Autoimmun Rev. 2021; doi: 10.1016/j.autrev.2021.102791.
78. Kohm AK, Fuller KG, Miller SD. Mimicking the way to autoimmunity: an evolving theory of sequence and structural homology. Trends Microbiol. 2003;11:101–105.
79. Lule S, Colpak AI, Balci-Peynircioglu B, et al. Behcet Disease serum is immunoreactive to neurofilament medium which share common epitopes to bacterial HSP-65, a putative trigger. J Autoimmun. 2017;84:87–96.
80. Negi S, Singh H, Mukhopadhyay A. Gut bacterial peptides with autoimmunity potential as environmental trigger for late onset complex diseases: in-silico study. PLoS One. 2017;12:e0180518.
81. Trost B, Lucchese G, Stufano A, Bickis M, Kusalik A, Kanduc D. No human protein is exempt from bacterial motifs, not even one. Self Nonself. 2010;1:328–334.
82. Vellozzi C, Iqbal S, Broder K. Guillain-Barre syndrome, influenza, and influenza vaccination: the epidemiologic evidence. Clin Infect Dis. 2014;58:1149–1155.
83. Yuki N. Ganglioside mimicry and peripheral nerve disease. Muscle Nerve. 2007;35:691–711.
84. Zabriskie JB, Freimer EH. An immunological relationship between the group A streptococcus and mammalian muscle. J Exp Med. 1966;124:661–678.
85. Amela I, Cedano J, Querol E. Pathogen proteins eliciting antibodies do not share epitopes with host proteins: a bioinformatics approach. PLoS One. 2007;2(6):e512.
86. Matzinger P. The danger model: a renewed sense of self. Science. 2002;296:301–305.
87. Fujinami RS, Oldstone MB, Wroblewska Z, Frankel ME, Koprowski H. Molecular mimicry in virus infection: crossreaction of measles virus phosphoprotein or of herpes simplex virus protein with human intermediate filaments. Proc Natl Acad Sci Unit States Am. 1983;80:2346–2350.
fi c CD8 T cells restricted by the susceptibility molecule HLA-A24 are expanded at onset of type 1 diabetes and kill β-cells. Diabetes. 2012;61:1752–1759.
94. Nakanishi K, Inoko H. Combination of HLA-A24, -DQA1*03, and -DR9 contributes to acute-onset and early complete beta-cell destruction in type 1 diabetes: longitudinal study of residual beta-cell function. Diabetes. 2006;55:1862–1868.
95. Noble JA, Valdes AM, Bugawan TL, Apple RJ, Thomson G, Erlich HA. The HLA class I A locus affects susceptibility to type 1 diabetes. Human Immunol. 2002;63(8):657–664.
96. Guan W-J, Liang W-H, Zhao Y, et al. Comorbidity and its impact on 1590 patients with Covid-19 in China: A Nationwide Analysis. Eur Respir J. 2020;55(5):2000547.
97. Mbunwe E, Van der Auwera BJ, Vermeulen I, et al. HLA-A*24 is an independent predictor of 5-year progression to diabetes in autoantibody-positive first-degree relatives of Type 1 diabetic patients. Diabetes. 2013;62(4):1345–1350.