Prediction of potential commercially inhibitors against SARS-CoV-2 by multi-task deep model
PPrediction of potential commercially inhibitors against SARS-CoV-2 by multi-task deep model
Fan Hu , Jiaxin Jiang , Peng Yin Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China *To whom correspondence should be addressed
Abstract
The outbreak of novel coronavirus pneumonia (COVID-19) caused thousands of deaths worldwide, and the number of total infections is still rising. However, the development of effective vaccine for this novel virus would take a few months. Thus it is urgent to identify some potentially effective old drugs that can be used immediately. Fortunately, some compounds that can inhibit coronavirus in vitro have been reported. In this study, the coronavirus-specific dataset was used to fine-tune our pre-trained multi-task deep model. Next we used the re-trained model to select available commercial drugs against targeted proteins of SARS-CoV-2. The results show that abacavir, a powerful nucleoside analog reverse transcriptase inhibitor used to treat HIV, is predicted to have high binding affinity with several proteins of SARS-CoV-2. Almitrine mesylate and roflumilast which are used for respiratory diseases such as chronic obstructive pulmonary disease are also predicted to have inhibitory effect. Overall, ten drugs are listed as potential inhibitors and the important sites for these binding by our model are exhibited. We hope these results would be useful in the fight against SARS-CoV-2.
Introduction
Since December 2019, cases with pneumonia of unknown cause have been reported in Wuhan, China[1]. A novel coronavirus named SARS-CoV-2 was isolated from the infections, which is the seventh member of the family of coronaviruses that infect humans[1, 2]. Similar to MERS-CoV and SARS-CoV, SARS-CoV-2 causes severe respiratory diseases and is capable of spreading from person to person. As of March 2 2020, there have been more than 87,000 confirmed SARS-CoV-2 infections worldwide, including more than 2900 deaths. Unfortunately, the development of effective vaccine against such novel virus would take a few months. Thus it is urgent to find some effective approved drugs that can be used quickly, which is “drug-repurposing”. As a positive-sense, single-stranded RNA beta-coronavirus, SARS-CoV-2 encodes structural, on-structural and accessory proteins. Among these, RNA-dependent RNA polymerase (RdRp), 3-chymotrypsin-like (3CL) protease, papain-like protease, helicase and spike glycoprotein are supposed to be the main targets. Several compounds that targeted these viral proteins and inhibited coronavirus in vitro have been reported and moved into clinical trials[3]. For example, remdesivir is an approved HIV reverse transcriptase inhibitor, which has broad-spectrum activities against RNA viruses such as MERS-CoV and SARS-CoV, but showed less effective in a Ebola clinical trial[4–6]. Recent reports showed that remdesivir inhibited SARS-CoV-2 in vitro with EC50=0.77μM[7], and was used to treat a SARS-CoV-2 infected patient in the United States[8]. The clinical trial of remdesivir for SARS-CoV-2 is currently underway in China. Obviously, more potential inhibitors against SARS-CoV-2 are still needed. Computational methods for screening potential positive compounds to target protein actually improve the success rates of drug discovery. Recently, methods based on deep learning have gained impressive performance on protein-ligand binding prediction[9–11]. The main advantage of this algorithm is that it can extract useful features automatically from the raw data during the process of training. But the generalization of deep learning model is also limited by the lack of specific data. Importantly, the reported virus inhibitors can be used to readjusting the model so it would have higher accuracy. In this study, our pre-trained multi-task deep model is used to predict possible drugs against SARS-CoV-2. First, we fine-tune our previous model by a coronavirus specific dataset. Then a dataset contained 4895 commercially drugs is used to screen against eight potential protein targets of SARS-CoV-2 using the re-trained model. Drugs with high predicted affinities are listed as the potential inhibitors. Besides, the important sites for binding by our model also exhibited. We hope these predictions would be useful in the fight against SARS-CoV-2.
Methods
Data
Total eight viral proteins of SARS-CoV-2 are used as potential targets, including RNA-dependent RNA polymerase (RdRp), 3-chymotrypsin-like (3CL) protease, papain-like protease, helicase, spike glycoprotein, exonuclease, endoRNAse, 2'-O-ribose methyltransferase and envelope protein. The amino acid sequences of these proteins are extracted from NCBI (NC_045512.2). The virus-specific dataset is achieved from GHDDI with filtration (https://ghddi-ailab.github.io/Targeting2019-nCoV). Only samples with exact binding target and affinity are saved. The commercially drug dataset contained 4895 samples is used to screen potential inhibitors for SARS-CoV-2. All data that present in the training and fine-tune datasets are excluded. odel and fine-tune
Basically, our model consists of two parts: shared layers and task-specific layers. The shared layers are designed to learn a joint representation for all tasks. Task-specific layers use the joint representation to learn the weights of specific blocks based on specific tasks. In this study, two related tasks are defined: binary classification and regression. More details would be added in the next version. This model was previously trained by large amounts of data from various heterogeneous protein-ligand datasets (unpublished). In this study the model is fine-tuned by virus-specific dataset to acquire robust results for coronavirus. After being re-trained, the model predicts the possible binding between commercially drugs and protein targets. The output of the model is the score to estimate binding affinity (pKa) between drug and target, the higher score the stronger binding. Drugs with high predicted affinities are listed as the potential inhibitors.
Binding sites prediction
A non-parametric method “occlusion” used in our previous study (unpublished) is applied to explore which parts of the input sequences are critical to the task. Briefly, s i from test samples (i = 0, 1, 2, ..., n-1, here n is sample size of test set) is expressed as tuple (protein input_i, compound input_i). While maintaining compound input_i unchanged, we systematically mask the protein input_i in s i to track the changes of the output. Then the importance of each sub-sequence in the sequence to the prediction can be calculated. Results and discussion
Potential inhibitors
Recently, several SARS-CoV-2 inhibitors have been reported[3, 7]. There have been more than 16 registered clinical trials with drugs against COVID-19, as of Feb.18 2020. Among these, lopinavir, ritonavir, methylprednisolone, favipiravir and umifenovir (arbidol) have been used in multiple trials. Chloroquine phosphate, azvudine and remdesivir are used in the most recent trials. Favipiravir, ribavirin and remdesivir are supposed to target to viral RdRp. Disulfiram inhibits papain-like protease whereas lopinavir and ritonavir inhibit 3CL protease. Obviously, RdRp, 3CL protease and papain-like protease are three main targets of coronavirus. In this study, after model prediction, top 10 potential drugs of each target are selected, then the drugs that may have strong side effect or are not available are excluded. After manual selection, total 10 drugs with high probability of inhibition are listed in table 1. Among these, abacavir sulfate), a powerful nucleoside analog reverse transcriptase inhibitor used to treat HIV, is predicted to have high binding affinity with multiple proteins of SARS-CoV-2 including RdRp, 3CL protease, papain-like protease and helicase. Darunavir, a protease inhibitor used to treat HIV, was used in a clinical trial against COVID-19 (ChiCTR2000029541). It should be noted that, both darunavir and darunavir (ethanolate) are not present in training set but only in test set. Our model predicted that darunavir can target to 3CL protease, RdRp and papain-like protease with affinity K d = 57.30, 6.09 and 46.16 nM respectively. While darunavir (ethanolate) binds to 3CL protease, RdRp and papain-like protease with affinity K d = 44.51, 4.73 and 35.86 nM respectively. These results partially prove the accuracy of our model. In our predictions, almitrine mesylate, which is a respiratory stimulant that enhances respiration, is used in the treatment of chronic obstructive pulmonary disease. While roflumilast has anti-inflammatory effects and is used as an orally administered drug for the treatment of inflammatory conditions of the lungs such as chronic obstructive pulmonary disease. These two predicted drugs are associated with respiratory symptoms that are the main clinical symptoms of COVID-19. Interestingly, kesuting syrup and keqing capsule were used in a trial for treatment of mild and moderate COVID-19 (ChiCTR2000029991). It is uncertain whether these drugs only help to alleviate clinical symptoms or have direct effect on virus. Daclatasvir is used against Hepatitis C Virus, which stops HCV viral RNA replication and protein translation by directly inhibiting HCV protein NS5A. In this study, the predicted binding affinity between daclatasvir and RdRp is 15.03nM. Fiboflapon sodium, a high affinity 5-lipoxygenase-activating protein inhibitor used for the treatment of asthma, is predicted to have potential affinity to Papain-like protease with K d = 197.63nM. Table 1. The potential inhibitors for
SARS-CoV-2
Drug CAS Target Affinity(nM)
Abacavir (sulfate) 188062-50-2
Darunavir 206361-99-1
Darunavir (Ethanolate) 635728-49-3
Itraconazole 84625-61-6
Papain-like protease RdRp 127.98 16.90
Almitrine mesylate 29608-49-9
Daclatasvir 1009119-64-5
RdRp 15.03
Daclatasvir (dihydrochloride) 1009119-65-6
RdRp 19.87
Metoprolol tartrate 56392-17-7
Papain-like protease 153.23 iboflapon sodium 1196070-26-4
Papain-like protease 197.63
Roflumilast 162401-32-3
Biological interpretation
To explore how the model processes the biological data, we conduct a method to identify the key amino acids that are important for binding. The listed potential inhibitors are regarded as positive samples with high prediction score. Then we mask sub-sequences of sample to get a “masked” prediction score, and thus calculate the importance of the masked sub-sequences. As shown in Fig.1, the critical parts for binding in protein sequences are visualized through heat-map. For 3C-like proteinase, the important amino acids for binding present at three main locations. It should be noted that different drugs result in different weights of these three regions. For example, roflumilast has higher weights in the first two regions, which indicates the binding sites for roflumilast are close to the middle pocket of 3C-like proteinase. As for abacavir (sulfate), the three regions more or less affect the binding, especially the second region at 180-200th amino acids. For papain-like protease, the predicted binding sites locate at 100-120th amino acid positions. And regions at 500-584th amino acids are predicted to be important binding sites for RdRp.
Fig.1. The predicted binding sites of protein sequences. The corresponding protein: (a) 3C-like proteinase; (b) Papain-like protease; (c) RdRp. (The abscissa axis is the length of protein sequences.)
Moreover, we visualize the predicted binding sites in 3D structures to show possible pockets. The 3C-like proteinase of SARS-CoV-2 (PDB: 6lu7) released on 5 February 2020 is used. The 3D structures of papain-like protease and RdRp of SARS-CoV-2 are unavailable, thus the homology models of these two proteins are used in this study[12, 13]. As mentioned above, three regions ontribute to the binding. As shown in Fig.2 (a), the region in the middle part (180-200th amino acids) is the main pocket because of high weight in most predictions by our model. The papain-like protease of SARS-CoV-2 is highly homologous to papain-like protease of SARS. Stoermer used a SARS crystal structure template (PDB:5tl6) to prepare homology model of SARS-CoV-2 papain-like protease, with a catalytic triad composed of Cys114-His275-Asp289 and a conventional zinc-binding domain of 4 cysteine residues Cys192, Cys194, Cys227, Cys229[12]. In his study, the central “thumb” domain contributes the catalytic Cys112 to the active site. Similarly, our model predicts that 100-120th amino acids contribute mainly to the final binding of small molecules. As for RdRp, two possible pockets are also shown in Fig.2 (c).
Fig.2. Visualization of predicted important sites (a) 3C-like proteinase (6lu7); (b) homology model of SARS-CoV-2 papain-like protease from SARS (5tl6)[12]; (c) homology model of SARS-CoV-2 RdRp from SARS (6nur)[13]. Red indicates the predicted important sites and black indicates possible pocket.
Conclusion
In this work, a multi-task model is used to predict potential commercially drugs against SARS-CoV-2. We fine-tune the pre-trained model using some reported samples for coronavirus. After manual selection, 10 drugs are finally listed as potential SARS-CoV-2 inhibitors. Nowadays several drugs have been reported to inhibit SARS-CoV-2 in vitro and moved into clinical trials against COVID-19. But more available drugs are still needed, taking into account with factors including therapeutic effect, side effect, drug cost and drug production rate. One common problem for deep model is the lacking of interpretability, especially for biological problem. So we exhibit the predicted binding sites, which are thought to be important for binding by our model. Unfortunately, the number of global infections is still rising, the COVID-19 crisis grow into a worldwide emergency. We hope the prediction by our model would be useful in this fight.
Acknowledgement
This work was supported by the National Natural Science Foundation of China (NO. 11801542), the Shenzhen Fundamental Research Projects (JCYJ20170818164014753, JCYJ20170818163445670 and JCYJ20180703145002040) and Major Projects from General Logistics Department of People's Liberation Army (AWS13C008).
Reference [1] N. Zhu et al. , “A Novel Coronavirus from Patients with Pneumonia in China, 2019,”
N. Engl. J. Med. , p. NEJMoa2001017, Jan. 2020. [2] A. E. Gorbalenya, “Severe acute respiratory syndrome-related coronavirus – The species and its viruses, a statement of the Coronavirus Study Group,” bioRxiv , p. 2020.02.07.937862, 2020. [3] G. Li and E. De Clercq, “Therapeutic options for the 2019 novel coronavirus (2019-nCoV),”
Nat. Rev. Drug Discov. , Feb. 2020. [4] A. J. Brown et al. , “Broad spectrum antiviral remdesivir inhibits human endemic and zoonotic deltacoronaviruses with a highly divergent RNA dependent RNA polymerase,”
Antiviral Res. , vol. 169, p. 104541, Sep. 2019. [5] T. P. Sheahan et al. , “Comparative therapeutic efficacy of remdesivir and combination lopinavir, ritonavir, and interferon beta against MERS-CoV,”
Nat. Commun. , vol. 11, no. 1, p. 222, Dec. 2020. [6] E. Tchesnokov, J. Feng, D. Porter, and M. Götte, “Mechanism of Inhibition of Ebola Virus RNA-Dependent RNA Polymerase by Remdesivir,”
Viruses , vol. 11, no. 4, p. 326, Apr. 2019. [7] M. Wang et al. , “Remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus (2019-nCoV) in vitro,”
Cell Res. , Feb. 2020. [8] M. L. Holshue et al. , “First Case of 2019 Novel Coronavirus in the United States,”
N. Engl. J. Med. , p. NEJMoa2001191, Jan. 2020. [9] I. Wallach, M. Dzamba, and A. Heifets, “AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery,”
Data Min. Knowl. Discov. , vol. 22, o. 1–2, pp. 31–72, Oct. 2015. [10] M. M. Stepniewska-Dziubinska, P. Zielenkiewicz, and P. Siedlecki, “Development and evaluation of a deep learning model for protein–ligand binding affinity prediction,”
Bioinformatics , vol. 34, no. 21, pp. 3666–3674, Nov. 2018. [11] M. Tsubaki, K. Tomii, and J. Sese, “Compound-protein Interaction Prediction with End-to-end Learning of Neural Networks for Graphs and Sequences,”
Bioinformatics , no. July, pp. 1–10, 2018. [12] M. J. Stoermer, “Homology Models of the Papain-Like Protease PLpro from Coronavirus 2019-nCoV,”
ChemRxiv , 2020. [13] Y. Chang et al. , “Potential therapeutic agents for COVID-19 based on the analysis of protease and RNA polymerase docking,” , 2020., 2020.