Predicting potential drug targets and repurposable drugs for COVID-19 via a deep generative model for graphs
Sumanta Ray, Snehalika Lall, Anirban Mukhopadhyay, Sanghamitra Bandyopadhyay, Alexander Schönhuth
Centrum Wiskunde & Informatica, Science Park 123, 1098 XG Amsterdam, The Netherlands
Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
Department of Computer Science and Engineering, University of Kalyani, Kalyani, India
Genome Data Science, Bielefeld University, Bielefeld, Germany
* [email protected]
+ these authors contributed equally to this work
ABSTRACT
Coronavirus Disease 2019 (COVID-19) has been creating a worldwide pandemic situation. Repurposing drugs, already shown to be free of harmful side effects, for the treatment of COVID-19 patients is an important option in launching novel therapeutic strategies. Therefore, reliable molecule interaction data are a crucial basis, where drug-/protein-protein interaction networks establish invaluable, year-long carefully curated data resources. However, these resources have not yet been systematically exploited using high-performance artificial intelligence approaches. Here, we combine three networks, two of which are year-long curated, and one of which, on SARS-CoV-2-human host-virus protein interactions, was published only most recently (30th of April 2020), raising a novel network that puts drugs, human and virus proteins into mutual context. We apply Variational Graph AutoEncoders (VGAEs), representing most advanced deep learning based methodology for the analysis of data that are subject to network constraints. Reliable simulations confirm that we operate at utmost accuracy in terms of predicting missing links. We then predict hitherto unknown links between drugs and human proteins against which virus proteins preferably bind. The corresponding therapeutic agents present splendid starting points for exploring novel host-directed therapy (HDT) options.
Introduction
The pandemic of COVID-19 (Coronavirus Disease-2019) has affected more than 6 million people. So far, it has caused about 0.4 million deaths in over 200 countries worldwide (https://coronavirus.jhu.edu/map.html), with numbers still increasing rapidly. COVID-19 is an acute respiratory disease caused by a highly virulent and contagious novel coronavirus strain, SARS-CoV-2, which is an enveloped, single-stranded RNA virus. Sensing the urgency, researchers have been relentlessly searching for possible therapeutic strategies in the last few weeks, so as to control the rapid spread.
In their quest, drug repurposing establishes one of the most relevant options, where drugs that have been approved (at least preclinically) for fighting other diseases are screened for their possible alternative use against the disease of interest, which is COVID-19 here. Because they were shown to lack severe side effects before, risks in the immediate application of repurposed drugs are limited. In comparison with de novo drug design, repurposing drugs offers various advantages. Most importantly, the reduced time frame in development suits the urgency of the situation in general. Furthermore, the most recent and most advanced artificial intelligence (AI) approaches have boosted drug repurposing enormously in terms of throughput and accuracy. Finally, it is important to understand that the 3D structures of the majority of viral proteins have remained largely unknown, which raises the obstacles for direct approaches even higher.
The foundation of AI based drug repurposing is molecule interaction data, optimally reflecting how drugs, viral and host proteins get into contact with each other. During the life cycle of a virus, the viral proteins interact with various human proteins in the infected cells. Through these interactions, the virus hijacks the host cell machinery for replication, thereby affecting the normal function of the proteins it interacts with.
To develop suitable therapeutic strategies and design antiviral drugs, a comprehensive understanding of the interactions between viral and human proteins is essential.
When watching out for drugs that can be repurposed to fight the virus, one has to realize that targeting single virus proteins easily leads to the viruses escaping the (rather simpleminded) attack by raising resistance-inducing mutations. Therefore, host-directed therapies (HDT), which target human proteins that represent important carriers for the virus to enter and manipulate the human cells, offer an important supplementary strategy. Unlike strategies that directly target the proteins of the virus, HDTs are thought to be less prone to developing resistance, because human proteins are less affected by mutations and therefore represent more sustainable drug targets. For establishing HDT, one has to identify proteins that are crucial for maintenance and perseverance of the disease-causing virus in the human cells. Once such proteins are targeted, the replication machinery of the virus falls apart.
For all these reasons, repurposing drugs for HDT against COVID-19 has great potential. Moreover, it provides hope for rapid practical implementation because of the lack of side effects.
Because the basis for drug repurposing screens is molecule interaction data, biological interaction networks offer invaluable resources, as they have been carefully curated and steadily refined for many years. This immediately points out that network based repurposing screens offer unprecedented opportunities in revealing targets for HDTs, which explains the recent popularity of such approaches in general.
Evidence for the opportunities is further provided by successful approaches based on viral-host networks in particular, having led to therapy options for treating Dengue, HIV, Hepatitis C and Ebola. Beyond fighting viruses, treatments for various other diseases have been developed.
Since the outbreak of COVID-19, a handful of research groups have been trying to exploit network resources, developing network algorithms for the discovery of drugs that can be repurposed to act against SARS-CoV-2. The first attempt was made by Zhou et al. through an integrative network analysis, followed by Li et al., who combined network data with a comparative analysis on the gene sequences of different viruses to obtain drugs that can potentially be repurposed to act against SARS-CoV-2.
Shortly thereafter, Gordon et al. conducted pioneering work by generating a map that juxtaposes SARS-CoV-2 proteins with human proteins that were found to interact in affinity-purification mass spectrometry (AP-MS) screens. Furthermore, in independent work, Dick et al. identified high confidence interactions between human proteins and SARS-CoV-2 proteins using sequence-based PPI predictors (namely PIPE4 and SPRINT).
Both of the studies on SARS-CoV-2 provide data that had so far been urgently missing in the fight against COVID-19. Only now are we able to link existing (long term curated and highly reliable) drug-protein and human protein-protein interaction data with proteins of SARS-CoV-2. In other words, only now can we draw links between the molecular agents of the virus and existing drugs on a scale that is sufficiently large to allow for systematic high throughput repurposing screens. Still, of course, it remains to design, develop and implement the corresponding strategies in order to exploit the now decisively augmented resources.
As mentioned above, in order to exploit resources to a maximum, one optimally makes use of sufficiently advanced AI techniques.
In this work, to the best of our knowledge for the first time, we combine all arguments raised above. (1) We link existing high-quality, long-term curated and refined, large scale drug/protein - protein interaction data with (2) molecular interaction data on SARS-CoV-2 itself, raised only a handful of weeks ago, (3) exploit the resulting overarching network using most advanced, AI boosted techniques (4) for repurposing drugs in the fight against SARS-CoV-2 (5) in the frame of HDT based strategies.
As for (3)-(5), we will highlight interactions between SARS-CoV-2-host proteins and human proteins important for the virus to persist using most advanced deep learning techniques that cater to exploiting network data. We are convinced that many of the fairly broad spectrum of drugs we raise will be amenable to developing successful HDTs against COVID-19.
Results
In the following, we will first describe the workflow of our analysis pipeline and the basic ideas that support it.
We proceed by carrying out a simulation study that proves that our pipeline accurately predicts missing links in the encompassing drug - human protein - SARS-CoV-2-protein network that we raise and analyze. Namely, we demonstrate that our (high-performance, AI supported) prediction pipeline accurately re-establishes links that had been explicitly removed before. This provides sound evidence that the interactions that we predict in the full network most likely reflect true interactions between molecular interfaces.
Subsequently, we continue with the core experiments. We predict links to be missing in the full (without artificially having removed links), encompassing drug - human protein - SARS-CoV-2-protein network, raised by combining links from year-long curated resources on the one hand and most recently published COVID-19 resources on the other hand. As per our simulation study, a large fraction, if not the vast majority, of the predictions establish true, hence actionable interactions between drugs on the one hand and SARS-CoV-2 associated human proteins (hence of use in HDT) on the other hand.
Figure 1.
Overall workflow of the proposed method: The three networks SARS-CoV-2-host PPI, human PPI, and drug-target network (Panel-A) are mapped by their common interactors to form an integrated representation (Panel-B). The neighborhood sampling strategy Node2Vec converts the network into fixed-size low dimensional representations that preserve the properties of the nodes belonging to the three major components of the integrated network (Panel-C). The resulting feature matrix (F) from the node embeddings and adjacency matrix (A) from the integrated network are used to train a VGAE model, which is then used for prediction (Panel-D).
For the purposes of high-confidence validation, we carry out a literature study on the overall 92 drugs we put forward. For this, we inspect the postulated mechanism-of-action of the drugs in the frame of several diseases, including SARS-CoV and MERS-CoV driven diseases in particular.
Workflow
See Figure 1 for the workflow of our analysis pipeline and the basic ideas that support it. We will describe all important steps in the paragraphs of this subsection.
Raising a Comprehensive Interaction Network.
See A & B in Figure 1. We have combined three interaction networks, two of which represent year-long curated and much refined publicly available resources, namely drug-gene interaction and the human interactome, together compiled from eight different, reliable and publicly accessible sources, and one of which, the SARS-CoV-2–human protein-protein-interaction (PPI) network, was published only a few weeks ago. The integrated network has four types of nodes: (1) SARS-CoV-2 proteins, (2) SARS-CoV-2-associated host proteins (CoV-host), (3) human proteins other than (2), and (4) drugs.
This means that we put drugs, human proteins and SARS-CoV-2 proteins into mutual context via the links provided by this network. However, because the encompassing network is built from three subnetworks, links between nodes from different individual subnetworks are presumably missing. It remains to predict them using an AI approach, preferably of utmost performance. This AI approach needs to identify links across the individual subnetworks. Because new cross-subnetwork links may imply links within the established networks as a consequence, the AI approach should also predict new links within the individual parts of our network, if this is called for.
AI Model First Stage: Node2Vec
See C in Figure 1. To bring the link prediction machinery into effect, we raise a model that operates in two stages. First, we employ a network embedding strategy (here: Node2Vec), which extracts node features from the integrated network. In a bit more formal detail, Node2Vec converts the adjacency matrix that represents the network into a fixed-size, low-dimensional latent space, the elements of which are the feature vectors of the nodes. Thereby, Node2Vec aims at preserving the properties of the nodes relative to their surroundings in the network. For efficiency reasons, Node2Vec makes use of a sampling strategy. The result of this step is a feature matrix (F) where rows refer to nodes and columns refer to the inferred network features.
AI Model Second Stage: Variational Graph Autoencoders (VGAE).
See B, C & D in Figure 1. In the next step, we employ variational graph autoencoders (VGAE), a most recent, graph neural network based technique shown to be of utmost accuracy, to predict links in networks that, although missing, are highly likely to exist. VGAEs require the original graph, provided by its adjacency matrix A, and, optionally, a feature matrix that annotates the nodes of the network with helpful additional information. Often, F does not necessarily refer to the topology of the network itself. Here, however, we do make use of the feature matrix F that was inferred from A itself in the first step. We found that, despite just being an alternative representation of A, using F aided in raising prediction accuracy substantially. This may not be surprising, however, because F consists of knowledge obtained using Node2Vec, which, being complementary to VGAEs from a methodical point of view, cannot necessarily be revealed by VGAEs themselves.
Predicting Missing Links.
See D in Figure 1. After training the VGAE, we finally predict links in the encompassing drug-human-virus interaction network that had remained missing. For this, we make use of the decoding part of the VGAE, which reconstructs the network based on the latent representation of the network provided by the encoder. Reconstructing the network results in edges between nodes that, although not having been explicit before, are imperative to exist relative to the encoded version of the network. Thereby, one predicts links between drugs and SARS-CoV-2-associated human proteins in particular. Although these links had not been explicit elements of the drug-human interaction subnetwork before, their existence is implied by the topological constraints the comprehensive network imposes. Thus, our model predicts both drugs and proteins: repurposing these drugs leads to them targeting the matching proteins. See Figure 1 for the total workflow we just described.
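To make the encode-then-decode step more concrete, the following is a minimal sketch of a single, untrained VGAE forward pass in plain Python. It is not the implementation used in this work: the function names (`vgae_forward`, `normalize_adjacency`), the layer sizes, the random weights and the four-node toy network are all hypothetical, and identity features merely stand in for the Node2Vec feature matrix F. The sketch assumes the common formulation with a two-layer GCN encoder, a Gaussian latent variable Z, and an inner-product decoder that assigns every node pair a probability sigmoid(z_i · z_j).

```python
import math
import random

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def normalize_adjacency(A):
    """Return D^-1/2 (A + I) D^-1/2, the usual GCN propagation matrix."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d = [sum(row) for row in A_hat]
    return [[A_hat[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]

def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def vgae_forward(A, F, hidden=4, latent=2, seed=0):
    """One untrained pass: GCN encoder -> sample Z -> inner-product decoder."""
    rng = random.Random(seed)
    A_norm = normalize_adjacency(A)
    W0 = random_matrix(len(F[0]), hidden, rng)
    W_mu = random_matrix(hidden, latent, rng)
    W_logsig = random_matrix(hidden, latent, rng)
    # First GCN layer with ReLU: H = relu(A_norm F W0)
    H = [[max(0.0, v) for v in row] for row in matmul(matmul(A_norm, F), W0)]
    AH = matmul(A_norm, H)
    mu = matmul(AH, W_mu)
    log_sigma = matmul(AH, W_logsig)
    # Reparameterisation: Z = mu + sigma * eps with eps ~ N(0, 1)
    Z = [[m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mr, sr)]
         for mr, sr in zip(mu, log_sigma)]
    # Decoder: A'[i][j] = sigmoid(z_i . z_j) for every node pair
    Zt = [list(col) for col in zip(*Z)]
    return [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in matmul(Z, Zt)]

# Toy network: 0 = drug, 1 = human protein, 2 = CoV-host protein, 3 = SARS-CoV-2 protein
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
# Identity features as a stand-in for the Node2Vec feature matrix F (assumption)
F = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
A_rec = vgae_forward(A, F)
print("score for the unobserved drug / CoV-host pair:", round(A_rec[0][2], 3))
```

With untrained weights the scores carry no meaning; the point is only that the decoder yields a value for every pair of nodes, including drug and CoV-host pairs that share no edge in A, which is where new links are read off after training.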
Addressing Computation Time.
To reduce the computation time, we used a fast version of VGAEs as proposed by Salha et al. This fast version relies on a strategy by which to sample nodes. Using several non-overlapping test sets for different numbers of sampled nodes, we found sampling 5000 nodes to be most suitable for our prediction task.
Validation of the Learned Model
Let G = (V, E) be the entire drug-human-virus interaction network in the following, where nodes V represent drugs or proteins and edges E represent interactions between them. We run Node2Vec on G to obtain a feature matrix F where rows can be identified with elements from V and columns represent features extracted from G.
After having computed F, we encode the network G into the embedding space using variational graph autoencoder (VGAE) techniques. In detail, we use the FastGAE model, whose major advantage is that it decisively speeds up the decoding phase of the learning process. For encoding, FastGAE utilizes a (popular) Graph Convolutional Network (GCN) encoder. This leads to encoding all nodes into the latent variable based embedding space. For this, FastGAE makes use of the original graph adjacency matrix A, which codes the original topology of the graph, and F. The choice of FastGAE is explained by the fact that in the decoding phase, a new version of the adjacency matrix has to be established, where A is huge (N × N, where N is the number of all drugs, human proteins and SARS-CoV-2 proteins together), which decisively slows down less rapid implementations of VGAEs. To sort out this issue, FastGAE randomly samples subgraphs G_S, referring to smaller sets of nodes S ⊂ V of size N_S, and reconstructs the corresponding submatrices A_S in several iterations, each of which refers to a different S. Finally, the submatrices of all samples are combined into an overarching matrix Ã, as an approximation of the matrix that gets reconstructed as a whole in the decoding phase of slow approaches. Note that the approximation was shown to be highly accurate. This reduces the training time compared to the general graph autoencoder model.
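The cost argument behind decoding on sampled subgraphs can be illustrated with a short sketch in plain Python. It is a simplified illustration, not the FastGAE code: `sample_nodes` and `subgraph_loss` are hypothetical helper names, the embeddings Z are random stand-ins for the encoder output, the reconstruction loss is assumed to be a plain binary cross-entropy, and nodes are drawn uniformly for brevity (the published method may weight nodes differently).

```python
import math
import random

def sample_nodes(num_nodes, n_s, rng):
    """Draw a node subset S of size n_s uniformly without replacement (assumption)."""
    return sorted(rng.sample(range(num_nodes), n_s))

def subgraph_loss(A, Z, S):
    """Binary cross-entropy between A_S and sigmoid(Z_S Z_S^T), over node pairs in S only."""
    eps = 1e-12
    total = 0.0
    for i in S:
        for j in S:
            logit = sum(a * b for a, b in zip(Z[i], Z[j]))
            p = 1.0 / (1.0 + math.exp(-logit))
            total -= A[i][j] * math.log(p + eps) + (1 - A[i][j]) * math.log(1 - p + eps)
    return total / (len(S) ** 2)

rng = random.Random(0)
N, N_S, dim = 50, 10, 4  # toy sizes; the text uses N_S = 5000 on the full network

# Toy symmetric 0/1 adjacency matrix standing in for A
A = [[0] * N for _ in range(N)]
for i in range(N):
    for j in range(i + 1, N):
        if rng.random() < 0.1:
            A[i][j] = A[j][i] = 1

# Random embeddings standing in for the GCN encoder output
Z = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(N)]

# Each iteration decodes only N_S x N_S entries instead of N x N
losses = []
for _ in range(5):
    S = sample_nodes(N, N_S, rng)
    losses.append(subgraph_loss(A, Z, S))
print(losses)
```

Each iteration touches N_S² node pairs rather than N², which is the source of the speed-up; a different S is drawn in every iteration so that, over training, many parts of A contribute to the loss.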
We tested the model performance for different numbers of sampled nodes, keeping track of the area under the ROC curve (AUC), the average precision (AP) score, and the model training time in the frame of a train-validation-test split at proportions 8:1:1. Table 1 shows the performance of the model for sampled subgraph sizes N_S = 7000, 5000, 3000, 2500 and 1000. For 5000 sampled nodes, the model's performance is sufficiently good with respect to its training time and its validation AUC and AP scores. The average test ROC-AUC and AP score of the model for N_S = 5000 are 88 . ± .
03 and 84 . ± . N s =5000) on an incomplete version of the graph where the links between CoV-host and drugs have been removed. We further compute the feature matrix F based on the incomplete graph, and use it as model input. The test set consists of all the previously removed edges. The model performance is clearly better for discovering those edges between CoV-host and drug nodes (ROC-AUC: 93 . ± .
01 AP: 90 . ± .
02 for 100 runs).
The FastGAE model is learned with the feature matrix (F) and the adjacency matrix (A). The node feature matrix (F) is obtained from A using the Node2Vec neighborhood sampling strategy. The model performance is evaluated with and without using F as feature matrix. Figure 2 shows the average performance of the model on validation sets with and without F as input for the different numbers of sampled nodes. We calculate average AUC and AP scores for 50 complete runs of the model. From Figure 2, it is evident that including F as feature matrix enhances the model's performance markedly.
Table 1.
Performance of the FastGAE model for different numbers of sampled nodes (N_S): mean validation AUC and AP scores are computed over the last 10 epochs in each run with a train-validation-test split of 8:1:1. Performance is reported over 100 runs for N_S = 7000, 5000, 3000, 2500 and 1000.
Average Performance on Validation Set
N_S   AUC (%)   AP (%)   Training Time (in sec)
7000 89.21 ± ± ± ± ± ± ± ± ± ±
Figure 2.
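Performance of the model on the validation set with and without using the feature matrix (F). Panel A: AUC against the number of sampled nodes (snode). Panel B: AP against the number of sampled nodes (snode). Methods compared: model_withA and model_withA&F.
Graph Embedding of the Compiled Network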
We use the Node2Vec framework to learn low dimensional embeddings of each node in the compiled network. It uses the Skip-gram algorithm of the word2vec model to learn the embeddings, which eventually groups nodes with a similar 'role' or a similar 'connection pattern' within the graph. A similar 'role' means that nodes within a set/group are structurally more similar (or equivalent) to each other than to nodes outside the group. Two nodes are said to be structurally equivalent if they have identical connection patterns to the rest of the network. To explore this, we have analyzed the embedding results in two steps. First, we explore structurally equivalent nodes to identify 'roles' and similar connection patterns to the rest of the network, and later use Louvain clustering to examine the same within the groups/clusters. The 'most_similar' function of Node2Vec inspects the structurally equivalent nodes within the network. We find all the CoV-host nodes which are most similar to the drug nodes. While it is expected to observe nodes of the same types within the neighborhood of a particular node, in some cases, we found some drugs are neighbors of CoV-host proteins with high probability ( pobs > pobs : 0.71, 0.69, 0.67, 0.68 and 0.68, respectively, Figure 3, panel-C) have a well-known effect on these diseases. Betulinic acid has antiretroviral, antimalarial, and anti-inflammatory properties and acts as an inhibitor of SARS-CoV-2 3CL protease. Some other drugs such as 'Clenbuterol' and 'Fenbendazole', the probable neighbors of ppp1cb and EEF1A respectively, are used as bronchodilators in asthma.
Figure 3.
Results of applying Node2Vec on the whole network. Panel-A shows a 2-dimensional t-SNE plot of all nodes in the network. Panel-B shows the same for only drug and CoV-host nodes. Panel-C represents a heatmap of 21 'most similar' (structurally equivalent or having a similar 'role') drugs of CoV-hosts. The drugs are colored based on their clinical phase (red: launched, blue: preclinical, green: phase 2/phase 3, black: phase 1/phase 2). Panel-D represents a visualization of 17 Louvain clusters identified in the low dimensional embedding space. Panel-E shows a network consisting of 6 drugs and their most probable neighbours within the network. These drugs share the same Louvain cluster with some CoV-host proteins.
To explore the closely connected groups, we have constructed a neighborhood graph from the node embeddings using the k-nearest neighbor algorithm and applied Louvain clustering (Figure 3, panel-D). Although there is a clear separation between the host protein (including CoV-host) clusters and the drug clusters, some of the Louvain clusters contain both types of nodes. For example, Louvain clusters 16 and 17 contain four and two drugs, respectively, along with CoV-host proteins. Figure 3, panel-E represents a network consisting of these six drugs and their most similar CoV-host nodes.
Drug–CoV-host Interaction Prediction
For drug–CoV-host interaction prediction, we exploit the Variational Graph Autoencoder (VGAE), an unsupervised graph neural network model, first introduced to leverage the concept of variational autoencoders in graph-structured data. To make learning faster, we utilized the FastGAE model to take advantage of its fast decoding phase. We have used two data matrices in the FastGAE model for learning: one is the adjacency matrix, which represents the interaction information over all the nodes, and the other one is the feature matrix representing the low-dimensional embeddings of all the nodes in the network.
We create a test set of 'non-edges' by removing all existing links between drugs and CoV-host proteins from all possible combinations (332 CoV-host × A and feature matrix F . The trained model is then applied to the test 'non-edges' to find the most probable links. We identified a total of 692 most probable links with 92 drugs and 78 CoV-host proteins with a probability threshold of 0.8. The predicted CoV-host proteins are involved in different crucial pathways of viral infection (Table 4). The p-values for pathway and GO enrichment are calculated by using the hypergeometric test with 0.05 FDR corrections. Figure 4, Panel-A shows the heatmap of probability scores between predicted drugs and CoV-host proteins.
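The two post-processing steps mentioned here, keeping candidate links whose predicted probability reaches a threshold and testing the resulting protein set for enrichment, can be sketched as follows. This is an illustration under stated assumptions, not the analysis code of this study: the function names, the toy scores and the drug and protein labels are hypothetical, and the Benjamini-Hochberg procedure is assumed as the FDR correction because the text does not name the method.

```python
from math import comb

def links_above(prob, drugs, hosts, threshold=0.8):
    """Return (drug, host, p) triples whose predicted probability reaches the threshold."""
    return [(d, h, prob[d][h]) for d in drugs for h in hosts if prob[d][h] >= threshold]

def hypergeom_pvalue(k, M, n, N):
    """P(X >= k) when N items are drawn from M, of which n carry the annotation."""
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / comb(M, N)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical decoder scores for drug / CoV-host pairs
prob = {
    "drugA": {"host1": 0.93, "host2": 0.41},
    "drugB": {"host1": 0.85, "host2": 0.79},
}
hits = links_above(prob, ["drugA", "drugB"], ["host1", "host2"], threshold=0.8)

# Enrichment of one pathway: 5 of 5 selected proteins annotated, 5 annotated among 10 overall
p_single = hypergeom_pvalue(k=5, M=10, n=5, N=5)

# FDR adjustment over several pathways (toy p-values)
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])

print(hits)
print(p_single)
print(q_values)
```

Raising the threshold (for example to 0.9, as done for the high confidence set) simply shrinks the list returned by `links_above`; a pathway would then be called enriched when its adjusted p-value stays below 0.05.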
To get more details of the predicted bipartite graph, we use a weighted bipartite clustering algorithm proposed by J. Beckett. This results in 4 bipartite modules (Figure 4, Panel-A): B1 (11 drugs, 28 CoV-host), B2 (4 drugs, 41 CoV-host), B3 (71 drugs, 4 CoV-host), and B4 (6 drugs, 5 CoV-host). The other panels of the figure show the network diagrams of the four bipartite modules. B1 contains 11 drugs, including some antibiotics (Anisomycin, Midecamycin) and anti-cancer drugs (Doxorubicin, Camptothecin). B3 also has some antibiotics such as Puromycin, Demeclocycline, Dirithromycin, Geldanamycin, and Chlortetracycline; among them, the first three are widely used for bronchitis, pneumonia, and respiratory tract infections. Some other drugs such as Lobeline and Ambroxol included in the B3 module have a variety of therapeutic uses, including respiratory disorders and bronchitis. The high confidence predicted interactions (with threshold 0.9) are shown in Figure 5, Panel-A. To highlight some repurposable drug combinations and their predicted CoV-host targets, we perform a weighted clustering (ClusterONE) on this network and find some quasi-bicliques (shown in Panels B-E).
Figure 4.
Drug–CoV-host predicted interaction: Panel-A shows the heatmap of probability scores between 92 drugs and 78 CoV-host proteins. The four predicted bipartite modules are annotated as B1, B2, B3 and B4 within the heatmap. The drugs are colored based on their clinical phase (red: launched, blue: preclinical, green: phase 2/phase 3, black: phase 1/phase 2). Panels B, C, D and E represent the networks corresponding to the B1, B2, B3 and B4 modules. The drugs are annotated using the disease area found in the CMAP database.
Figure 5.
Predicted interactions for probability threshold 0.9. Panel-A shows the interaction graph between drugs and CoV-host proteins. Drugs are annotated with their usage. Panels B, C, D and E represent quasi-bicliques for one, two, three and more than three drug molecules, respectively.
We matched our predicted drugs with the drug list recently published by Zhou et al. and found six common drugs: Mesalazine, Vinblastine, Menadione, Medrysone, Fulvestrant, and Apigenin. Among them, Apigenin has a known antiviral effect together with quercetin, rutin, and other flavonoids. Mesalazine is also proven to be extremely effective in the treatment of other viral diseases like influenza A/H5N1 virus.
Repurposable drugs for SARS-CoV-2
Baclofen and Fisetin
Baclofen, a benzodiazepine receptor (GABAA-receptor) agonist, has a potential role in antiviral-associated treatment. The anti-inflammatory agent Fisetin has also been tested for antiviral activity, for example for the inhibition of Dengue virus (DENV) infection. It down-regulates the production of proinflammatory cytokines induced by a DENV infection. Both drugs are listed in the high-confidence interaction set with three CoV-host proteins: TAPT1 (interacting with SARS-CoV-2 protein orf9c), SLC30A6 (interacting with SARS-CoV-2 protein orf9c), and TRIM59 (interacting with SARS-CoV-2 protein orf3a) (Figure 5, panel C).
Topoisomerase Inhibitors
Topoisomerase inhibitors play an active role as antiviral agents by inhibiting viral DNA replication. Some topoisomerase inhibitors, such as Camptothecin, Daunorubicin, Doxorubicin, Irinotecan and Mitoxantrone, are predicted to interact with several CoV-host proteins. It has been demonstrated that the anticancer drug Camptothecin (CPT) and its derivative Irinotecan have a potential role in antiviral activity. CPT inhibits the host cell enzyme topoisomerase I, which is required for the initiation as well as the completion of viral functions in the host cell. Daunorubicin (DNR) has also been demonstrated to inhibit HIV-1 replication in human host cells. The conventional anticancer antibiotic Doxorubicin was identified as a selective inhibitor of in vitro Dengue and Yellow Fever virus replication. It is also reported that coupling Doxorubicin with a monoclonal antibody can create an immunoconjugate that can eliminate HIV-1 infection in mouse cells. Mitoxantrone shows antiviral activity against the human herpes simplex virus (HSV1) by reducing, in many human cells, the transcription of viral genes that are essential for DNA synthesis.
Histone Deacetylase Inhibitors (HDACi)
Histone deacetylase inhibitors (HDACi) are generally used as latency-reversing agents for purging HIV-1 from latent reservoirs such as CD4 memory cells. Our predicted drug list (Table 3) contains two HDACi: Scriptaid and Vorinostat. Vorinostat can be used to achieve latency reversal in HIV-1 safely and repeatedly. Asymptomatic patients infected with SARS-CoV-2 are of significant concern, as they are more likely than symptomatic patients to infect a large number of people. Moreover, in most cases (99th percentile), patients develop symptoms after an average of 5-14 days, which is longer than the incubation period of SARS, MERS, or other viruses. To this end, HDACi may serve as good candidates for recognizing and clearing the cells in which SARS-CoV-2 latency has been reversed.
HSP inhibitor
Heat shock protein 90 (HSP90) is described as a crucial host factor in the life cycle of several viruses, which includes entry into the cell, nuclear import, transcription, and replication. HSP90 has also been shown to be an essential factor for the SARS-CoV-2 envelope (E) protein, and it has been described as a promising target for antiviral drugs. The list of predicted drugs contains three HSP inhibitors: Tanespimycin, Geldanamycin, and its derivative Alvespimycin. The first two have a substantial effect in inhibiting the replication of Herpes Simplex Virus and Human enterovirus 71 (EV71), respectively. Recently, Geldanamycin and its derivatives have been proposed as effective drugs in the treatment of COVID-19.
Antimalarial agent|DNA inhibitor, DNA methyltransferase inhibitor, DNA synthesis inhibitor
Inhibiting DNA synthesis during viral replication is one of the critical steps in disrupting the viral infection. The list of predicted drugs contains seven such small molecules/drugs, viz., Niclosamide, Azacitidine, Anisomycin, Novobiocin, Primaquine, Menadione, and Metronidazole. The DNA synthesis inhibitor Niclosamide has great potential to treat a variety of viral infections, including SARS-CoV, MERS-CoV, and HCV, and has recently been described as a potential candidate to fight the SARS-CoV-2 virus. Novobiocin, an aminocoumarin antibiotic, is also used in the treatment of Zika virus (ZIKV) infections due to its protease inhibitory activity. In 2005, Chloroquine (CQ) was demonstrated to be an effective drug against the spread of severe acute respiratory syndrome (SARS) coronavirus (SARS-CoV). Recently, Hydroxychloroquine (HCQ) sulfate, a derivative of CQ, has been evaluated to efficiently inhibit SARS-CoV-2 infection in vitro. Therefore, another antimalarial aminoquinoline drug, Primaquine, may also contribute to the attenuation of the inflammatory response of COVID-19 patients. Primaquine is also established to be effective in the treatment of Pneumocystis pneumonia (PCP).
Table 2. Dataset Description
Index; Dataset Category; Dataset
(Only fragments of the table body survive: 332, 27; 261, 6; ChEMBL, Therapeutic Target Database (TTD), PharmGKB database; 1788407, 1307.)
Cardiac Glycosides ATPase Inhibitor
Cardiac glycosides have been shown to play a crucial role as antiviral drugs. These drugs target host cell proteins, which helps reduce resistance to antiviral treatments. The antiviral effects of cardiac glycosides have been attributed to the inhibition of the pump function of Na,K-ATPase. This makes them essential drugs against human viral infections. The predicted list of drugs contains three cardiac glycoside ATPase inhibitors: Digoxin, Digitoxigenin, and Ouabain. These drugs have been reported to be effective against different viruses, such as herpes simplex, influenza, chikungunya, coronavirus, and respiratory syncytial virus.
MG132, Resveratrol and Captopril
MG132, a proteasomal inhibitor, is established to be a strong inhibitor of SARS-CoV replication in early steps of the viral life cycle. MG132 inhibits the cysteine protease m-calpain, which results in a pronounced inhibition of SARS-CoV replication in the host cell. Resveratrol has been demonstrated to be a significant inhibitor of MERS-CoV infection. Resveratrol treatment decreases the expression of the nucleocapsid (N) protein of MERS-CoV, which is essential for viral replication. As MG132 and Resveratrol play a vital role in inhibiting the replication of the other coronaviruses SARS-CoV and MERS-CoV, they may be potential candidates for the prevention and treatment of SARS-CoV-2. Another drug, Captopril, is an angiotensin-converting enzyme (ACE) inhibitor (Table 3), which directly inhibits the production of angiotensin II. Angiotensin-converting enzyme 2 (ACE2) has been demonstrated to be the binding site for SARS-CoV-2. So Angiotensin II receptor blockers (ARB) may be good candidates to use in the tentative treatment of SARS-CoV-2 infections.
In summary, our proposed method predicts several drug targets and multiple repurposable drugs that have prominent literature evidence of use as antiviral drugs, especially against the two other coronavirus species SARS-CoV and MERS-CoV. Some drugs are also directly associated with the treatment of SARS-CoV-2 in recent literature. However, further clinical trials and several preclinical experiments are required to validate the clinical benefits of these potential drugs and drug targets.
Table 3.
Table of the 92 predicted repurposable drugs
Sl.No.; pubchemid; Drug; Clinical phase; Uses
1; 15281; Alexidine; Preclinical; phosphatidylglycerophosphatase inhibitor
2; 17150; Apigenin; Preclinical; casein kinase inhibitor|cell proliferation inhibitor
3; 18998; Baclofen; Launched; benzodiazepine receptor agonist
4; 33216; Clopamide; Launched; sodium/chloride cotransporter inhibitor
5; 37446; Digoxin; Launched; ATPase inhibitor
6; 107704; Digitoxigenin; Preclinical; ATPase inhibitor
7; 117069; Ouabain; Launched; ATPase inhibitor
8; 59105; Metformin; Launched; insulin sensitizer
9; 71063; Piribedil; Launched; dopamine receptor agonist
10; 220964; Pergolide; Withdrawn; dopamine receptor agonist
11; 76299; Pyrvinium; Launched; androgen receptor antagonist
12; 80626; Sirolimus; Launched; mTOR inhibitor
13; 86324; Tanespimycin; Phase 3; HSP inhibitor
14; 180858; Geldanamycin; Preclinical; HSP inhibitor
15; 245370; Alvespimycin; Phase 2; HSP inhibitor
16; 93086; Tretinoin; Launched; retinoid receptor agonist|retinoid receptor ligand
17; 102361; Anisomycin; Preclinical; DNA synthesis inhibitor
18; 103692; Atovaquone; Launched; mitochondrial electron transport inhibitor
19; 104090; Azacitidine; Launched; DNA methyltransferase inhibitor
20; 104487; Benperidol; Launched; dopamine receptor antagonist
21; 121586; Sulpiride; Launched; dopamine receptor antagonist
22; 104924; Bisacodyl; Launched; laxative
23; 105316; Camptothecin; Phase 3; topoisomerase inhibitor
24; 108903; Doxorubicin; Launched; topoisomerase inhibitor
25; 114021; Irinotecan; Launched; topoisomerase inhibitor
26; 115466; Mitoxantrone; Launched; topoisomerase inhibitor
27; 107326; Daunorubicin; Launched; RNA synthesis inhibitor|topoisomerase inhibitor
28; 108095; Digoxigenin; Preclinical; steroid
29; 108477; Diltiazem; Launched; calcium channel blocker
30; 109280; Emetine; Phase 2; protein synthesis inhibitor
31; 118729; Puromycin; Preclinical; protein synthesis inhibitor
32; 178082; Chlortetracycline; Launched; protein synthesis inhibitor
33; 182249; Midecamycin; Launched; protein synthesis inhibitor
34; 199663; Clindamycin; Launched; protein synthesis inhibitor
35; 109651; Estropipate; Launched; estrogen receptor agonist
36; 110088; Etamsylate; Launched; haemostatic agent
37; 111183; Fursultiamine; Launched; vitamin B
38; 114689; Medrysone; Launched; glucocorticoid receptor agonist
39; 115901; Nadolol; Launched; adrenergic receptor antagonist
40; 116238; Naftopidil; Launched; adrenergic receptor antagonist
41; 119913; Sotalol; Launched; adrenergic receptor antagonist
42; 215415; Metoprolol; Launched; adrenergic receptor antagonist
43; 520283; Doxazosin; Launched; adrenergic receptor antagonist
44; 116649; Nocodazole; Preclinical; tubulin polymerization inhibitor
45; 220050; Paclitaxel; Launched; tubulin polymerization inhibitor
46; 117480; Parthenolide; Phase 1; NFkB pathway inhibitor
47; 119578; Scriptaid; Preclinical; HDAC inhibitor
48; 124043; Vorinostat; Launched; HDAC inhibitor
49; 120719; Sulconazole; Launched; sterol demethylase inhibitor
50; 123302; Tracazolate; Phase 2; GABA receptor modulator
51; 125857; Novobiocin; Launched; bacterial DNA gyrase inhibitor
52; 129195; Azacyclonol; Preclinical; histamine receptor antagonist
53; 133584; Galantamine; Launched; acetylcholinesterase inhibitor
54; 139192; Niclosamide; Launched; DNA replication inhibitor|STAT inhibitor
55; 145407; Verteporfin; Launched; photosensitizing agent
56; 150553; Alprostadil; Launched; prostanoid receptor agonist
57; 159244; Fulvestrant; Launched; estrogen receptor antagonist
58; 176723; Cantharidin; Launched; protein phosphatase inhibitor
59; 179449; Danazol; Launched; estrogen receptor antagonist|progesterone receptor agonist
60; 186039; MG-132; Preclinical; proteasome inhibitor
61; 191423; Ajmaline; Launched; sodium channel blocker
62; 245766; Ambroxol; Launched; sodium channel blocker
63; 352301; Cinchocaine; Launched; sodium channel blocker
64; 193762; Ampyrone; Preclinical; cyclooxygenase inhibitor
65; 203449; Dirithromycin; Launched; bacterial 50S ribosomal subunit inhibitor
66; 212182; Levonorgestrel; Launched; estrogen receptor agonist|glucocorticoid receptor antagonist|progesterone receptor agonist|progesterone receptor antagonist
67; 214498; Mesalazine; Launched; cyclooxygenase inhibitor|lipoxygenase inhibitor|prostanoid receptor antagonist
68; 217727; Naringin; Preclinical; cytochrome P450 inhibitor
69; 223747; Primaquine; Launched; antimalarial agent|DNA inhibitor
70; 255979; Captopril; Launched; angiotensin converting enzyme inhibitor
71; 260887; Chlorzoxazone; Launched; bacterial 30S ribosomal subunit inhibitor
72; 268568; Demeclocycline; Launched; bacterial 30S ribosomal subunit inhibitor
73; 270479; Dihydroergotamine; Launched; serotonin receptor agonist
74; 279244; Fenoprofen; Launched; prostaglandin inhibitor
75; 291604; Lobeline; Launched; acetylcholine receptor antagonist
76; 293538; Luteolin; Phase 2; glucosidase inhibitor
77; 295381; Meclofenoxate; Launched; nootropic agent
78; 295680; Menadione; Launched; mitochondrial DNA polymerase inhibitor|phosphatase inhibitor
79; 321706; Sulfaphenazole; Launched; dihydropteroate synthetase inhibitor
80; 333068; Zardaverine; Phase 2; phosphodiesterase inhibitor
81; 372869; Metronidazole; Launched; DNA inhibitor
82; 416788; Resveratrol; Launched; cytochrome P450 inhibitor|SIRT activator
83; 423536; Fisetin; Preclinical; Aurora kinase inhibitor
84; 441851; Morantel; Launched; acetylcholine receptor agonist
85; 451754; Kinetin; Launched; cell division inducer
86; 461005; Chrysin; Phase 1; breast cancer resistance protein inhibitor
87; 471412; Vinblastine; Launched; microtubule inhibitor|tubulin polymerization inhibitor
88; 472841; Meticrane; Launched; diuretic
89; 483385; Ethosuximide; Launched; succinimide antiepileptic
90; 494980; Tetraethylenepentamine; Phase 2/Phase 3; superoxide dismutase inhibitor
91; 517908; Ticlopidine; Launched; purinergic receptor antagonist
92; 543633; Todralazine; Launched; antihypertensive agent
Discussion
In this work, we have generated a list of high-confidence candidate drugs that can be repurposed to counteract SARS-CoV-2 infections. The two major novelties are the integration of the most recently published SARS-CoV-2 protein interaction data on the one hand, and the use of recent, advanced AI (deep learning) based high-performance prediction machinery on the other hand. In experiments, we have validated that our prediction pipeline operates at high accuracy, supporting the quality of the predictions we have raised.
The recent publication (April 30, 2020) of two novel SARS-CoV-2-human protein interaction resources has unlocked enormous possibilities in studying the virulence and pathogenicity of SARS-CoV-2, and the driving mechanisms behind it. Only now have various experimental and computational approaches to the design of drugs against COVID-19 become conceivable, and only now can such approaches be exploited truly systematically, at both sufficiently high throughput and accuracy.
Here, to the best of our knowledge, we have done this for the first time. We have integrated the new SARS-CoV-2 protein interaction data with well-established, long-term curated human protein and drug interaction data. These data capture hundreds of thousands of approved interfaces between encompassing sets of molecules, either reflecting drugs or human proteins. As a result, we have obtained a comprehensive drug-human-virus interaction network that reflects the latest state of the art in terms of our knowledge about how SARS-CoV-2 interacts with human proteins and repurposable drugs.
Table 4. Gene Ontology (biological process) terms and KEGG pathways for the 78 CoV-host proteins
Term (GO/KEGG); p-value; Genes
Herpes simplex infection; 0.002142; TRAF2, PPP1CA, CASP8, HLA-A, HCFC1, HLA-G
Viral carcinogenesis; 0.003507; TRAF2, YWHAG, CASP8, HLA-A, VDAC3, HLA-G
Endocytosis; 0.006957; AP2A2, HLA-A, HSPA6, SMURF1, RAB10, HLA-G
Epstein-Barr virus infection; 0.023579; TRAF2, HLA-A, TNFAIP3, HLA-G
Legionellosis; 0.030506; EEF1A1, CASP8, HSPA6
Viral myocarditis; 0.033704; CASP8, HLA-A, HLA-G
antigen processing and presentation (GO:0019882); 8.04E-05; HLA-H, HLA-A, MR1, RAB10, HLA-G
antigen processing and presentation of peptide antigen via MHC class I (GO:0002474); 2.60E-04; HLA-H, HLA-A, MR1, HLA-G
negative regulation of extrinsic apoptotic signaling pathway via death domain receptors (GO:1902042); 3.46E-04; TRAF2, HMOX1, CASP8, TNFAIP3
antigen processing and presentation of exogenous peptide antigen via MHC class I, TAP-independent (GO:0002480); 6.05E-04; HLA-H, HLA-A, HLA-G
negative regulation of I-kappaB kinase/NF-kappaB signaling (GO:0043124); 6.14E-04; TRIM59, CASP8, TLE1, TNFAIP3
cellular response to heat (GO:0034605); 0.010384; HMOX1, HSPA6, MYOF
nuclear polyadenylation-dependent tRNA catabolic process (GO:0071038); 0.020673; EXOSC8, EXOSC2
nuclear polyadenylation-dependent rRNA catabolic process (GO:0071035); 0.024756; EXOSC8, EXOSC2
antigen processing and presentation of exogenous peptide antigen via MHC class I, TAP-dependent (GO:0002479); 0.028411; HLA-H, HLA-A, HLA-G
type I interferon signaling pathway (GO:0060337); 0.02925; HLA-H, HLA-A, HLA-G
death-inducing signaling complex assembly (GO:0071550); 0.032873; TRAF2, CASP8
low-density lipoprotein particle clearance (GO:0034383); 0.032873; HMOX1, SCARB1
exonucleolytic trimming to generate mature 3'-end of 5.8S rRNA from tricistronic rRNA transcript (GO:0000467); 0.032873; EXOSC8, EXOSC2
U4 snRNA 3'-end processing (GO:0034475); 0.032873; EXOSC8, EXOSC2
interferon-gamma-mediated signaling pathway (GO:0060333); 0.035392; HLA-H, HLA-A, HLA-G
protein processing (GO:0016485); 0.036307; PCSK1, TYSND1, PCSK6
nuclear-transcribed mRNA catabolic process, exonucleolytic, 3'-5' (GO:0034427); 0.036907; EXOSC8, EXOSC2
regulation of sequestering of zinc ion (GO:0061088); 0.036907; SLC30A6, SLC30A7
regulation of immune response (GO:0050776); 0.038229; PVR, HLA-H, HLA-A, HLA-G
For exploiting the new network, which already establishes a new resource in its own right, we have opted for the most recent and advanced deep learning based technology. A generic reason for this choice is the surge in advances and the resulting boost in operative prediction performance of related methods over the last 3-4 years. A particular reason is to make use of the most advanced graph neural network based techniques, namely variational graph autoencoders as a deep generative model of high accuracy, the practical implementation of which was presented only a few months ago (just like the relevant network data). Note that only this recent implementation makes it possible to process networks of sizes in the range of common molecular interaction data. In essence, graph neural networks “learn” the structure of links in networks, and infer rules that underlie the interplay of links. Based on the knowledge gained, they make it possible to predict links and to output the corresponding links together with the probabilities for them to indeed be missing.
Simulation experiments, reflecting scenarios where links known to exist in our network were re-established by prediction upon their removal, pointed out that our pipeline does indeed predict missing links at high accuracy.
Encouraged by these simulations, we proceeded by performing the core experiments, and predicted links to be missing without prior removal of links in our encompassing network. These core experiments revealed 692 high-confidence interactions relating to 92 drugs. In our experiments, we focused on predicting links between drugs and human proteins that in turn are known to interact with SARS-CoV-2 proteins (SARS-CoV-2 associated host proteins).
We have deliberately not put the focus on drug–SARS-CoV-2-protein interactions, which would have reflected more direct therapy strategies against the virus. Instead, we have focused on predicting drugs that serve the purposes of host-directed therapy (HDT) options, because HDT strategies have proven to be more sustainable with respect to mutations by which the virus escapes a response to the therapy applied. Note that HDT strategies particularly cater to drug repurposing attempts, because repurposed drugs have already proven to lack severe side effects, since they are either already in use or have successfully passed the preclinical trial stages.
We further systematically categorized the 92 repurposable drugs into 70 categories based on their domains of application and molecular mechanism. According to this, we identified and highlighted several drugs that target host proteins that the virus needs to enter (and subsequently hijack) human cells. One such example is Captopril, which directly inhibits the production of angiotensin II; the related enzyme Angiotensin-Converting Enzyme-2 (ACE-2) is already known to be a crucial host factor for SARS-CoV-2. Further, we identified Primaquine, an antimalarial drug used to prevent relapses of malaria and also of Pneumocystis pneumonia (PCP), because it interacts with the TIM complex TIMM29 and ALG11. Moreover, we have highlighted drugs that act as DNA replication inhibitors (Niclosamide, Anisomycin), glucocorticoid receptor agonists (Medrysone), ATPase inhibitors (Digitoxigenin, Digoxin), topoisomerase inhibitors (Camptothecin, Irinotecan), and proteasomal inhibitors (MG-132).
Note that some drugs are known to have rather severe side effects from their original use (Doxorubicin, Vinblastine), but the disruptive effects of their short-term use in severe COVID-19 infections may sufficiently compensate for this.
In summary, we have compiled a list of drugs which, when repurposed, are of great potential in the fight against the COVID-19 pandemic, where therapy options are urgently needed. Our list of predicted drugs suggests both options that had been identified and thoroughly discussed before and new opportunities that had not been pointed out earlier. The latter class of drugs may offer valuable chances for pursuing new therapy strategies against COVID-19.
Materials and Methods
Dataset Preparation
We have utilized three categories of interaction datasets: human protein-protein interactome data, SARS-CoV-2-host protein interaction data, and drug-host interaction data.
SARS-CoV-2-host Interaction Data
We have taken SARS-CoV-2–host interaction information from two recent studies by Gordon et al. and Dick et al. In the former, 332 high-confidence interactions between SARS-CoV-2 and human proteins are identified using affinity-purification mass spectrometry (AP-MS). In the latter, 261 high-confidence interactions are identified using sequence-based PPI predictors (PIPE4 and SPRINT).
Drug-Host Interactome Data
The drug–target interaction information has been collected from five databases, viz., the DrugBank database (v4.3), the ChEMBL database, the Therapeutic Target Database (TTD), the PharmGKB database, and the IUPHAR/BPS Guide to PHARMACOLOGY. The total numbers of drugs and drug-host interactions used in this study are 1309 and 1788407, respectively.
The Human Protein–Protein Interactome
We have built a comprehensive list of human PPIs from two datasets: (1) the CCSB human Interactome database, consisting of 7,000 genes and 13944 high-quality binary interactions, and (2) the Human Protein Reference Database, which consists of 8920 proteins and 53184 PPIs.
The summary of all the datasets is provided in Table 2. The CMAP database is used to annotate the drugs with their usage in different disease areas.
Sampling Strategy and Feature Matrix Generation
We have utilized Node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. It maps the nodes to a low-dimensional feature space that maximizes the likelihood of preserving network neighborhoods.
The principle of the feature learning framework in a graph can be described as follows. Let $G = (V, E)$ be a given graph, where $V$ represents the set of nodes and $E$ represents the set of edges. The feature representation of the $|V|$ nodes is given by a mapping function $f : V \to \mathbb{R}^d$, where $d$ specifies the feature dimension. The mapping $f$ may also be represented as a node feature matrix of dimension $|V| \times d$. For each node $v \in V$, $NN_S(v) \subset V$ defines a network neighborhood of node $v$, which is generated using a neighborhood sampling strategy $S$. The sampling strategy can be described as an interpolation between the breadth-first search and depth-first search techniques. The objective function can be described as:
\[ \max_f \sum_{v \in V} \log P(NN_S(v) \mid f(v)) \tag{1} \]
This maximizes the likelihood of observing a network neighborhood $NN_S(v)$ for a node $v$ given its feature representation $f(v)$. The probability of observing the neighborhood $NN_S(v)$ given the feature representation of the source node $v$ factorizes over the neighborhood nodes:
\[ P(NN_S(v) \mid f(v)) = \prod_{n_i \in NN_S(v)} P(n_i \mid f(v)) \tag{2} \]
where $n_i$ is the $i$-th neighbor of node $v$ in the neighborhood set $NN_S(v)$. The conditional likelihood of each pair of source node $v$ and neighborhood node $n_i \in NN_S(v)$ is represented as the softmax of the dot product of their features $f(v)$ and $f(n_i)$:
\[ P(n_i \mid f(v)) = \frac{\exp(f(v) \cdot f(n_i))}{\sum_{u \in V} \exp(f(u) \cdot f(v))} \tag{3} \]
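As an illustration of Eqs. (2) and (3), the following minimal Python sketch evaluates the softmax conditional likelihood on a toy feature matrix. The NumPy dependency, the function names `conditional_probs` and `neighborhood_log_likelihood`, and the random toy features are assumptions made for this example only; they are not part of the pipeline described here. Note also that Node2vec approximates the normalizing sum by negative sampling instead of computing it exactly as done below.

```python
import numpy as np

def conditional_probs(features, v):
    """P(n | f(v)) for every node n, following the softmax of Eq. (3)."""
    scores = features @ features[v]      # dot products f(u) . f(v) for all u in V
    scores = scores - scores.max()       # shift for numerical stability; softmax is unchanged
    weights = np.exp(scores)
    return weights / weights.sum()

def neighborhood_log_likelihood(features, v, neighborhood):
    """log P(NN_S(v) | f(v)) under the factorization of Eq. (2)."""
    probs = conditional_probs(features, v)
    return float(np.log(probs[list(neighborhood)]).sum())

# Toy example: 5 nodes with 3-dimensional features (hypothetical values)
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 3))
probs = conditional_probs(features, v=0)
log_lik = neighborhood_log_likelihood(features, v=0, neighborhood=[1, 3])
```

Summing `neighborhood_log_likelihood` over all nodes gives the quantity maximized in Eq. (1) for a fixed set of sampled neighborhoods.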
Variational Graph Auto Encoder
Variational Graph Autoencoder (VGAE) is a framework for unsupervised learning on graph-structured data. This model uses latent variables and is effective in learning interpretable latent representations for undirected graphs. The graph autoencoder consists of two stacked models: (1) an encoder and (2) a decoder. First, an encoder based on graph convolution networks (GCN) maps the nodes into a low-dimensional embedding space. Subsequently, a decoder attempts to reconstruct the original graph structure from the encoder representations. Both models are jointly trained to optimize the quality of the reconstruction from the embedding space, in an unsupervised way. The functions of these two models can be described as follows:
Encoder: It uses a Graph Convolution Network (GCN) on the adjacency matrix $A$ and the feature representation matrix $F$. The encoder generates a $d'$-dimensional latent variable $z_i$ for each node $i \in V$, with $|V| = n$, that corresponds to each embedding node, with $d' \le n$. The inference model of the encoder is given below:
\[ r(Z \mid A, F) = \prod_{i=1}^{|V|} r(z_i \mid A, F) \tag{4} \]
where $r(z_i \mid A, F)$ corresponds to a normal distribution $\mathcal{N}(z_i \mid \mu_i, \sigma_i^2)$, and $\mu_i$ and $\sigma_i^2$ are the Gaussian mean and variance parameters. The actual embedding vectors $z_i$ are samples drawn from these distributions.
Decoder: It is a generative model that decodes the latent variables $z_i$ to reconstruct the matrix $A$, using inner products of the embedding vectors in $Z$ with sigmoid activation:
\[ \hat{A}_{i,j} = p(A_{i,j} = 1 \mid z_i, z_j) = \mathrm{Sigmoid}(z_i^T z_j) \tag{5} \]
where $\hat{A}$ is the decoded adjacency matrix. The objective function of the variational graph autoencoder (VGAE) can be written as:
\[ C_{VGAE} = \mathbb{E}_{r(Z \mid A, F)}[\log p(A \mid Z)] - D_{KL}(r(Z \mid A, F) \,\|\, p(Z)) \tag{6} \]
The objective function $C_{VGAE}$ is maximized with respect to the graph autoencoder weights using stochastic gradient descent, which maximizes the likelihood of decoding the adjacency matrix. Here, $D_{KL}(\cdot \,\|\, \cdot)$ represents the Kullback-Leibler divergence and $p(Z)$ is the prior distribution of the latent variables.
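To make the decoder of Eq. (5) and the objective of Eq. (6) concrete, the sketch below computes a single-sample estimate of $C_{VGAE}$ with NumPy. It is an illustration under stated assumptions and not the implementation used in this work: the prior $p(Z)$ is assumed to be a standard normal distribution, the arrays `mu` and `log_var` are random placeholders standing in for the outputs of a GCN encoder, and the function name `vgae_objective` is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def vgae_objective(A, mu, log_var, rng):
    """Single-sample estimate of Eq. (6), assuming a standard normal prior p(Z)."""
    # Reparameterized draw z_i = mu_i + sigma_i * noise
    Z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
    A_hat = sigmoid(Z @ Z.T)                      # inner-product decoder of Eq. (5)
    eps = 1e-10                                   # guards against log(0)
    log_lik = np.sum(A * np.log(A_hat + eps) + (1 - A) * np.log(1 - A_hat + eps))
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, I)
    kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))
    return log_lik - kl, A_hat

# Toy path graph on 4 nodes; mu and log_var stand in for GCN encoder outputs
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
mu = rng.normal(scale=0.1, size=(4, 2))
log_var = np.zeros((4, 2))
objective, A_hat = vgae_objective(A, mu, log_var, rng)
```

In an actual training loop, the encoder weights that produce `mu` and `log_var` would be updated by gradient ascent on this objective, which requires an automatic differentiation framework rather than plain NumPy.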
Drug–SARS-CoV-2 Link Prediction
1. Adjacency Matrix Preparation: In this work, we consider an undirected graph $G = (V, E)$ with $|V| = n$ nodes and $|E| = m$ edges. We denote by $A$ the binary adjacency matrix of $G$. Here $V$ consists of SARS-CoV-2 proteins, CoV-host proteins, drug-target proteins and drugs. The matrix $A$ contains a total of $n$ nodes, given by
\[ n = |N_{Nc}| + |N_{DT}| + |N_{NT}| + |N_D| \tag{7} \]
where $N_{Nc}$ is the number of SARS-CoV-2 proteins, $N_{DT}$ is the number of drug targets, and $N_{NT}$ and $N_D$ represent the numbers of CoV-host and drug nodes, respectively. The total number of edges is given by
\[ m = |E_1| + |E_2| + |E_3| \tag{8} \]
where $E_1$ represents the interactions between SARS-CoV-2 and human host proteins, $E_2$ the interactions among human proteins, and $E_3$ the interactions between drugs and human host proteins.
2. Feature Matrix Preparation: The neighborhood sampling strategy is used here to prepare a feature representation of all nodes. A flexible biased random walk procedure is employed to explore the neighborhood of each node. A random walk in a graph $G$ can be described by the probability
\[ P(a_i = x \mid a_{i-1} = v) = \pi(v, x) \tag{9} \]
where $\pi(v, x)$ is the transition probability between nodes $v$ and $x$, with $(v, x) \in E$, and $a_i$ is the $i$-th node in a walk of length $l$. The transition probability is given by $\pi(v, x) = c_{pq}(t, x) \cdot w_{vx}$, where $t$ is the previous node of $v$ in the walk, $w_{vx}$ is the static edge weight, and $p$ and $q$ are the two parameters that guide the walk. The coefficient $c_{pq}(t, x)$ is given by
\[ c_{pq}(t, x) = \begin{cases} 1/p, & \text{distance}(t, x) = 0 \\ 1, & \text{distance}(t, x) = 1 \\ 1/q, & \text{distance}(t, x) = 2 \end{cases} \tag{10} \]
where $\text{distance}(t, x)$ represents the shortest path distance between node $t$ and node $x$. The process of generating the feature matrix $F_{n \times d}$ is governed by the Node2vec algorithm. It starts from every node and simulates $r$ random walks of fixed length $l$. In every step of a walk, the transition probability $\pi(v, x)$ governs the sampling. The walk generated in each iteration is added to a walk list. Finally, stochastic gradient descent is applied to optimize over the list of walks, and the result is returned.
3. Link Prediction: The scalable and fast variational graph autoencoder (FastVGAE) is utilized in our proposed work to reduce the computational time of VGAE on large networks. The adjacency matrix $A$ and the feature matrix $F$ are given to the encoder of FastVGAE. The encoder uses a graph convolution neural network (GCN) on the entire graph to create the latent representation $Z$:
\[ Z = GCN(A, F) \tag{11} \]
The encoder works on the full adjacency matrix $A$. After encoding, sampling is done and the decoder works on the sampled subgraph. The mechanism of the decoder of FastVGAE is slightly different from that of the traditional VGAE. It regenerates the adjacency matrix $\hat{A}$ based on a subsample of graph nodes, $V_s$. It uses a graph node sampling technique to randomly sample the reconstructed nodes at each iteration. Each node is assigned a probability $p_i$, and the selection of nodes favors high scores of $p_i$. The probability $p_i$ is given by the following equation:
\[ p(i) = \frac{f(i)^{\alpha}}{\sum_{j \in V} f(j)^{\alpha}} \tag{12} \]
where $f(i)$ is the degree of node $i$, and $\alpha$ is the sharpening parameter. We take $\alpha =$ [value missing in source] and $|V_s| = n_s$, where $n_s$ is the number of sampled nodes. The decoder reconstructs the smaller matrix $\hat{A}_s$ of dimension $n_s \times n_s$ instead of decoding the main adjacency matrix $A$. The decoder function follows the equation
\[ \hat{A}_s(i, j) = \mathrm{Sigmoid}(z_i^T z_j), \quad \forall (i, j) \in V_s \times V_s \tag{13} \]
At each training iteration, a different subgraph $G_s$ is drawn using the sampling method.
After the model is trained, the drug–CoV-host links are predicted using the following equation:
\[ p(A_{ij} = 1 \mid z_i, z_j) = \mathrm{Sigmoid}(z_i^T z_j) \tag{14} \]
where $A_{ij}$ represents the possible links between all combinations of CoV-host nodes and drug nodes. For each combination of nodes, the model gives a probability based on the logistic sigmoid function.
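Two of the sampling rules in steps 2 and 3 above can be sketched in a few lines of Python: the walk bias $c_{pq}(t, x)$ of Eq. (10), and the degree-based node sampling of Eq. (12) followed by the subgraph decoding of Eq. (13). This is a hedged illustration rather than the FastVGAE implementation: the function names are hypothetical, the embeddings `Z` are random placeholders instead of encoder outputs, nodes are assumed to be drawn without replacement, and `alpha=1.0` is an arbitrary placeholder because the value used in this work is not recoverable from the text.

```python
import numpy as np

def walk_bias(dist_tx, p, q):
    """Search bias c_pq(t, x) of Eq. (10) for a shortest-path distance in {0, 1, 2}."""
    return {0: 1.0 / p, 1: 1.0, 2: 1.0 / q}[dist_tx]

def sampling_probs(degrees, alpha):
    """Node sampling distribution of Eq. (12): p(i) proportional to degree(i)^alpha."""
    weights = np.asarray(degrees, dtype=float) ** alpha
    return weights / weights.sum()

def decode_subgraph(Z, degrees, n_s, alpha, rng):
    """Draw n_s distinct nodes with the probabilities of Eq. (12) and decode
    only their n_s x n_s block of the adjacency matrix, as in Eq. (13)."""
    probs = sampling_probs(degrees, alpha)
    # Assumption: nodes are sampled without replacement
    nodes = rng.choice(len(degrees), size=n_s, replace=False, p=probs)
    Z_s = Z[nodes]
    A_hat_s = 1.0 / (1.0 + np.exp(-(Z_s @ Z_s.T)))
    return nodes, A_hat_s

# Toy example: 6 nodes with 2-dimensional placeholder embeddings
rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 2))
degrees = [1, 2, 3, 4, 5, 6]
nodes, A_hat_s = decode_subgraph(Z, degrees, n_s=3, alpha=1.0, rng=rng)
```

With `p > 1` and `q < 1`, `walk_bias` down-weights returning to the previous node and up-weights moving away from it, which pushes the walk toward depth-first exploration. The final prediction of Eq. (14) applies the same sigmoid of inner products to the drug and CoV-host rows of the full embedding matrix rather than to a sampled block.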
A new coronavirus associated with human respiratory disease in china.
Nature , 265–269 (2020). Forst, C. V. Host–pathogen systems biology. In
Infectious Disease Informatics , 123–147 (Springer, 2010). Kaufmann, S. H., Dorhoi, A., Hotchkiss, R. S. & Bartenschlager, R. Host-directed therapies for bacterial and viralinfections.
Nat. Rev. Drug Discov. , 35 (2018). Alaimo, S. & Pulvirenti, A. Network-based drug repositioning: Approaches, resources, and research directions. In
Computational Methods for Drug Repurposing , 97–113 (Springer, 2019). de Chassey, B., Meyniel-Schicklin, L., Aublin-Gex, A., André, P. & Lotteau, V. New horizons for antiviral drug discoveryfrom virus–host protein interaction networks. Curr. opinion virology , 606–613 (2012). Emig, D. et al.
Drug target prediction and repositioning using an integrated network-based approach.
PLoS One (2013). Doolittle, J. M. & Gomez, S. M. Mapping protein interactions between dengue virus and its human and insect hosts.
PLoSneglected tropical diseases (2011). Bandyopadhyay, S., Ray, S., Mukhopadhyay, A. & Maulik, U. A review of in silico approaches for analysis and predictionof hiv-1-human protein–protein interactions.
Brief. Bioinforma. , 830–851 (2015). Mukhopadhyay, A. & Maulik, U. Network-based study reveals potential infection pathways of hepatitis-c leading tovarious diseases.
PloS one (2014). Cao, H. et al.
Prediction of the ebola virus infection related human genes using protein-protein interaction network.
Comb.chemistry & high throughput screening , 638–646 (2017). Cheng, F. et al.
A genome-wide positioning systems network algorithm for in silico drug repurposing.
Nat. communications , 1–14 (2019). Zeng, X. et al. deepdr: a network-based deep learning approach to in silico drug repositioning.
Bioinformatics ,5191–5198 (2019). Zhou, Y. et al.
Network-based drug repurposing for novel coronavirus 2019-ncov/sars-cov-2.
Cell discovery , 1–18(2020). Li, X. et al.
Network bioinformatics analysis provides insight into drug repurposing for covid-2019. (2020).
Gordon, D. E. et al. A sars-cov-2 protein interaction map reveals targets for drug repurposing. Nature.
Dick, K., Biggar, K. K. & Green, J. R. Comprehensive prediction of the sars-cov-2 vs. human interactome using pipe4, sprint, and pipe-sites, DOI: 10.5683/SP2/JZ77XA (2020).
Grover, A. & Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, 855–864 (2016).
Kipf, T. N. & Welling, M. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308 (2016).
Salha, G., Hennequin, R., Remy, J.-B., Moussallam, M. & Vazirgiannis, M. Fastgae: Fast, scalable and effective graph autoencoders with stochastic subgraph decoding. arXiv preprint (2020).
Rossi, R. A. et al. From community to role-based graph embeddings. arXiv preprint arXiv:1908.08572 (2019).
Wen, C.-C. et al. Specific plant terpenoids and lignoids possess potent antiviral activities against severe acute respiratory syndrome coronavirus. J. medicinal chemistry, 4087–4095 (2007).
Subramanian, A. et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell, 1437–1452 (2017).
Beckett, S. J. Improved community detection in weighted bipartite networks. Royal Soc. open science, 140536 (2016).
Wishart, D. S. et al. Drugbank: a comprehensive resource for in silico drug discovery and exploration. Nucleic acids research, D668–D672 (2006).
Nepusz, T., Yu, H. & Paccanaro, A. Detecting overlapping protein complexes in protein-protein interaction networks. Nat. methods, 471 (2012).
Salehi, B. et al. The therapeutic potential of apigenin. Int. journal molecular sciences, 1305 (2019).
Zheng, B.-J. et al. Delayed antiviral plus immunomodulator treatment still reduces mortality in mice infected by high inoculum of influenza a/h5n1 virus. Proc. Natl. Acad. Sci., 8091–8096 (2008).
Leggio, L. et al. Baclofen promotes alcohol abstinence in alcohol dependent cirrhotic patients with hepatitis c virus (hcv) infection. Addict. behaviors, 561–564 (2012).
Jasso-Miranda, C. et al. Antiviral and immunomodulatory effects of polyphenols on macrophages infected with dengue virus serotypes 2 and 3 enhanced or not with antibodies. Infect. drug resistance, 1833 (2019).
Maschera, B., Ferrazzi, E., Rassu, M., Toni, M. & Palu, G. Evaluation of topoisomerase inhibitors as potential antiviral agents. Antivir. Chem. Chemother., 85–91 (1993).
González-Molleda, L., Wang, Y. & Yuan, Y. Potent antiviral activity of topoisomerase i and ii inhibitors against kaposi’s sarcoma-associated herpesvirus. Antimicrob. agents chemotherapy, 893–902 (2012).
Horwitz, S. B., Chang, C.-K. & Grollman, A. P. Antiviral action of camptothecin. Antimicrob. agents chemotherapy, 395–401 (1972).
Bennett, R. P. et al. An analog of camptothecin inactive against topoisomerase i is broadly neutralizing of hiv-1 through inhibition of vif-dependent apobec3g degradation. Antivir. research, 51–59 (2016).
Pantazis, P., Han, Z., Chatterjee, D. & Wyche, J. Water-insoluble camptothecin analogues as potential antiviral drugs. J. biomedical science, 1–7 (1999).
Filion, L., Logan, D., Gaudreault, R. & Izaguirre, C. Inhibition of hiv-1 replication by daunorubicin. Clin. investigative medicine. Med. clinique et experimentale, 339–347 (1993).
Kaptein, S. J. et al. A derivate of the antibiotic doxorubicin is a selective inhibitor of dengue and yellow fever virus replication in vitro. Antimicrob. agents chemotherapy, 5269–5280 (2010).
Johansson, S., Goldenberg, D. M., Griffiths, G. L., Wahren, B. & Hinkula, J. Elimination of hiv-1 infection by treatment with a doxorubicin-conjugated anti-envelope antibody. Aids, 1911–1915 (2006).
Huang, Q. et al. Antiviral activity of mitoxantrone dihydrochloride against human herpes simplex virus mediated by suppression of the viral immediate early genes. BMC microbiology, 274 (2019).
Matalon, S., Rasmussen, T. A. & Dinarello, C. A. Histone deacetylase inhibitors for purging hiv-1 from the latent reservoir. Mol. medicine, 466–472 (2011).
Archin, N. M. et al. Interval dosing with the hdac inhibitor vorinostat effectively reverses hiv latency. The J. clinical investigation, 3126–3135 (2017).
Lauer, S. A. et al. The incubation period of coronavirus disease 2019 (covid-19) from publicly reported confirmed cases: estimation and application. Annals internal medicine (2020).
Ju, H.-Q. et al. Synthesis and in vitro anti-hsv-1 activity of a novel hsp90 inhibitor bj-b11. Bioorganic medicinal chemistry letters, 1675–1677 (2011).
Shim, H. Y., Quan, X., Yi, Y.-S. & Jung, G. Heat shock protein 90 facilitates formation of the hbv capsid via interacting with the hbv core protein dimers. Virology, 161–169 (2011).
DeDiego, M. L. et al. Severe acute respiratory syndrome coronavirus envelope protein regulates cell stress response and apoptosis. PLoS pathogens (2011).
Wang, Y. et al. Hsp90: a promising broad-spectrum antiviral drug target. Arch. virology, 3269–3282 (2017).
Sultan, I., Howard, S. & Tbakhi, A. Drug repositioning suggests a role for the heat shock protein 90 inhibitor geldanamycin in treating covid-19 infection. arXiv (2020).
Xu, J., Shi, P.-Y., Li, H. & Zhou, J. Broad spectrum antiviral agent niclosamide and its therapeutic potential. ACS infectious diseases (2020).
Liu, J. et al. Hydroxychloroquine, a less toxic derivative of chloroquine, is effective in inhibiting sars-cov-2 infection in vitro. Cell discovery, 1–4 (2020).
Vöhringer, H.-F. & Arastéh, K. Pharmacokinetic optimisation in the treatment of pneumocystis carinii pneumonia. Clin. pharmacokinetics, 388–412 (1993).
Amarelle, L. & Lecuona, E. The antiviral effects of na, k-atpase inhibition: A minireview. Int. journal molecular sciences, 2154 (2018).
Schneider, M. et al. Severe acute respiratory syndrome coronavirus replication is severely impaired by mg132 due to proteasome-independent inhibition of m-calpain. J. virology, 10112–10122 (2012).
Lin, S.-C. et al. Effective inhibition of mers-cov infection by resveratrol. BMC infectious diseases, 144 (2017).
Shang, J. et al. Structural basis of receptor recognition by sars-cov-2. Nature.
Gurwitz, D. Angiotensin receptor blockers as tentative sars-cov-2 therapeutics. Drug development research (2020).
Yu, H. et al. Next-generation sequencing to generate interactome datasets. Nat. methods, 478 (2011).
Peri, S. et al. Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome research, 2363–2371 (2003).
Law, V. et al. Drugbank 4.0: shedding new light on drug metabolism. Nucleic acids research, D1091–D1097 (2014).
Gaulton, A. et al. Chembl: a large-scale bioactivity database for drug discovery. Nucleic acids research, D1100–D1107 (2012).
Yang, H. et al. Therapeutic target database update 2016: enriched resource for bench to clinical drug target and targeted pathway information. Nucleic acids research, D1069–D1074 (2016).
Pawson, A. J. et al. The iuphar/bps guide to pharmacology: an expert-driven knowledgebase of drug targets and their ligands. Nucleic acids research, D1098–D1106 (2014).
Rual, J.-F. et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature, 1173–1178 (2005).
Rolland, T. et al. A proteome-scale map of the human interactome network. Cell, 1212–1226 (2014).
Luck, K. et al. A reference map of the human binary protein interactome. Nature, 402–408 (2020).
Rezende, D. J., Mohamed, S. & Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082 (2014).
Kullback, S. & Leibler, R. A. On information and sufficiency. The annals mathematical statistics, 79–86 (1951).
Acknowledgements
S. Ray acknowledges support from ERCIM Alain Bensoussan Fellowship programme grant. S. Bandyopadhyay acknowledges support from J.C. Bose Fellowship [SB/S1/JCB-033/2016 to S.B.] by the DST, Govt. of India; SyMeC Project grant [BT/Med-II/NIBMG/SyMeC/2014/Vol. II] was given to the Indian Statistical Institute by the Department of Biotechnology (DBT), Govt. of India. A. Mukhopadhyay acknowledges the support received from the research project grant (Memo No: 355(Sanc.)/ST/P/S&T/6G-10/2018 dt. 08/03/2019) of Dept. of Science & Technology and Biotechnology, Govt. of West Bengal, India.