Network Medicine Framework for Identifying Drug Repurposing Opportunities for COVID-19
Deisy Morselli Gysi, Ítalo Do Valle, Marinka Zitnik, Asher Ameli, Xiao Gan, Onur Varol, Susan Dina Ghiassian, JJ Patten, Robert Davey, Joseph Loscalzo, Albert-László Barabási
Deisy Morselli Gysi, Ítalo Do Valle, Marinka Zitnik, Asher Ameli, Xiao Gan, Onur Varol, Helia Sanchez, Rebecca Marlene Baron, Dina Ghiassian, Joseph Loscalzo, and Albert-László Barabási
Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
Department of Biomedical Informatics, Harvard University, Boston, MA 02115, USA
Scipher Medicine, 260 Charles St, Suite 301, Waltham, MA 02453, USA
Department of Physics, Northeastern University, Boston, MA 02115, USA
Division of Pulmonary and Critical Care Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
Department of Network and Data Science, Central European University, Budapest 1051, Hungary
* These authors contributed equally
April 2020
Abstract
The COVID-19 pandemic demands the rapid identification of drug-repurposing candidates. In the past decade, network medicine has developed a framework consisting of a series of quantitative approaches and predictive tools to study host-pathogen interactions, unveil the molecular mechanisms of the infection, identify comorbidities, and rapidly detect drug-repurposing candidates. Here, we adapt the network-based toolset to COVID-19, recovering the primary pulmonary manifestations of the virus in the lung as well as observed comorbidities associated with cardiovascular diseases. We predict that the virus can manifest itself in other tissues, such as the reproductive system and brain regions; moreover, we predict neurological comorbidities. We build on these findings to deploy three network-based drug repurposing strategies, relying on network proximity, diffusion, and AI-based metrics, allowing us to rank all approved drugs based on their likely efficacy for COVID-19 patients, aggregate all predictions, and thereby arrive at 81 promising repurposing candidates. We validate the accuracy of our predictions using drugs currently in clinical trials, and an expression-based validation of selected candidates suggests that these drugs, with known toxicities and side effects, could be moved to clinical trials rapidly.

The speed and the disruptive nature of the COVID-19 pandemic have taken both public health and biomedical research by surprise, demanding the rapid deployment of new interventions and the development and testing of an effective cure and vaccine. Given the compressed timescales, the traditional methodologies relying on iterative development, experimental testing, clinical validation, and approval of new compounds are not feasible.
A more realistic strategy relies on drug repurposing, requiring us to identify clinically approved drugs, with known toxicities and side effects, that may have a therapeutic effect in COVID-19 patients. In the past decade, network medicine has developed and validated a series of computational tools that help us identify drug repurposing opportunities. Here we deploy these tools to analyze the molecular perturbations induced by the virus SARS-CoV2, causing a pathophenotype (disease) known as COVID-19 (Coronavirus Disease 2019), and to identify potential drug repurposing candidates. We start by characterizing the COVID-19 disease module (Fig. 1A), representing the network neighborhood of the human interactome perturbed by SARS-CoV2, and its integrity in 56 tissues, to identify the tissues and organs the virus could invade. We then explore multiple network-based strategies to prioritize existing drugs based on their ability to interact with their protein targets and, thereby, perturb the disease module: network proximity-based methods that use a graph theoretic repurposing strategy; diffusion-based methods to capture node similarity; and approaches relying on an artificial intelligence network (AI-Net) that embed all available data to detect efficacy. These three predictive approaches offer us twelve ranked lists, normally applied independently and validated on different datasets. Here, we combine them using a rank aggregation algorithm, allowing us to exploit their relative advantages and to obtain a final prioritized ranking of drug repurposing candidates that offers higher accuracy than any of the pipelines alone.
After eliminating drugs based on toxicity, delivery, and appropriateness of their use in COVID-19 patients, we selected 81 approved drugs as candidates for drug repurposing. Finally, we integrate experimental data from in vitro models to help identify the network-based mechanism of action for selected compounds and offer further validation using existing gene expression data (Fig. 1B).

2 Results
SARS-CoV2 infects human cells by hijacking the host’s translation mechanisms to generate 29 viral proteins, which bind to multiple human proteins to initiate the molecular processes required for viral replication and additional host infection. Gordon et al. expressed 26 of the 29 SARS-CoV2 proteins and used affinity purification followed by mass spectrometry to identify 332 human proteins to which the viral proteins bind (Table S1). We mapped these 332 proteins to the human interactome, consisting of 18,508 proteins and 332,[…] […] ± […].93 proteins, and the comparative Z-Score = 1.65 indicates that the SARS-CoV2 target proteins aggregate in the same network vicinity, defining the location of the COVID-19 disease module within the human interactome. Potential drug repurposing candidates must either target proteins within or in the network vicinity of this disease module.
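The significance test behind an LCC (largest connected component) Z-Score can be illustrated with a minimal sketch. It assumes a toy path graph in place of the interactome and draws the reference sets uniformly at random, whereas the analysis described here relies on a degree-preserving randomization. The names `lcc_size`, `lcc_zscore`, and `toy_adj`, and all parameter values, are hypothetical.

```python
import random
from collections import deque
from statistics import mean, stdev


def lcc_size(adj, nodes):
    """Size of the largest connected component of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w in nodes and w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


def lcc_zscore(adj, targets, n_random=1000, seed=0):
    """Compare the observed LCC with LCCs of random node sets of the same size.

    Assumption: uniform sampling; a degree-preserving scheme would replace
    `rng.sample` with degree-matched sampling.
    """
    rng = random.Random(seed)
    universe = sorted(adj)
    observed = lcc_size(adj, targets)
    null = [lcc_size(adj, rng.sample(universe, len(targets))) for _ in range(n_random)]
    mu, sigma = mean(null), stdev(null)
    return observed, mu, sigma, (observed - mu) / sigma


# Toy "interactome" (hypothetical data): a path 0-1-2-...-29
toy_adj = {i: set() for i in range(30)}
for i in range(29):
    toy_adj[i].add(i + 1)
    toy_adj[i + 1].add(i)

observed, mu, sigma, z = lcc_zscore(toy_adj, list(range(6)), n_random=200)
print(f"LCC = {observed}, random expectation = {mu:.2f} ± {sigma:.2f}, Z = {z:.2f}")
```

In this toy case the six targets are contiguous, so the observed LCC is as large as it can be and the Z-Score is positive; on a real interactome the same comparison indicates whether the targets cluster more than expected by chance.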
Previous work indicates that the expression of a gene associated with a disease in a particular tissue is insufficient for a disease to be manifest in that tissue; rather, a statistically significant disease LCC must be expressed. We, therefore, measured the statistical significance of the COVID-19 LCC in 56 tissues, using data from GTEx. Treating genes with a GTEx median count below 5 as not expressed, only 10,823 (58%) of the 18,406 proteins in the interactome are expressed in lung, while 214 (64%) of the 332 viral targets are expressed (Fig. 2C). We find that 182 viral targets form a tissue-specific LCC, and given the random expectation of 155.[…] ± […].82 for this LCC, we obtain a Z-Score = 1.78 for the lung, larger than the Z-Score = 1.65 of the LCC in the full network. Overall, in 30 tissues the LCC exceeds the Z-Score of the full network, helping us to identify tissues where the virus-induced disease could be manifested (Table 1). The list contains pulmonary and cardiovascular tissues, supporting the clinical observations that COVID-19 manifests itself in the respiratory system, but infected patients often present significant cardiovascular involvement, and patients with underlying cardiovascular diseases show increased risk of death. Interestingly, Table 1 indicates that the LCC is also expressed in multiple brain regions, likely explaining the recently reported neurological manifestations of the disease. We also observe multiple tissues related to the digestive system (colon, esophagus, pancreas) in this analysis, again consistent with clinical observations. Finally, equally unexpected is the fact that Table 1 indicates expression in multiple reproductive system tissues (vagina, uterus, testis, cervix, ovary), as well as spleen, potentially related to disruptions in the regulation of the immune system (Table 1).

Pre-existing conditions worsen prognosis and recovery of COVID-19 patients. Previous work has shown that the disease relevance of the human proteins targeted by the virus can predict the symptoms/signs and diseases caused by a pathogen, prompting us to identify diseases whose molecular mechanisms overlap with cellular processes targeted by SARS-CoV2, allowing us to predict potential comorbidity patterns. We retrieved 3,[…] […], finding that 110 of the 320 proteins targeted by SARS-CoV2 are implicated in disease; however, the overlap between SARS-CoV2 targets and the pool of the disease genes is not statistically significant (Fisher’s exact test; FDR-BH $p_{adj}$-value > […]). […] $S_{vb}$ metric, where $S_{vb} <$ […] $v$ and the gene pool associated with disease $b$. We find that $S_{vb} >$ […] $S_{vb}$), include several cardiovascular diseases and cancer, whose comorbidity in COVID-19 patients is well documented (Fig. 3).
The same metric predicts comorbidity with neurological diseases, in line with our observation that the viral targets are expressed in the brain (Table 1).

In summary, we find that the SARS-CoV2 targets do not overlap with disease genes associated with any major diseases, indicating that a potential COVID-19 treatment cannot be derived from the arsenal of therapies approved for specific diseases. These findings argue for a strategy that maps drug targets without regard to their localization within a particular disease module. However, the disease modules closest to the SARS-CoV2 viral targets are those with noted comorbidity for COVID-19 infection, such as pulmonary and cardiovascular diseases, and cancer. We also find multiple lines of network-based evidence linking the virus to the nervous system, a less explored comorbidity, consistent with the observations that many infected patients initially lose olfactory function and taste, and that 36% of patients with severe infection requiring hospitalization have neurological manifestations.

2.2 Identifying Drug Repurposing Candidates for COVID-19

Traditional repurposing strategies focus on drugs that target the human proteins to which viral proteins bind, or on drugs previously approved for other pathogens. The network medicine approach described here is driven by the recognition that most approved drugs do not directly target disease proteins, but bind to proteins in their network vicinity. Hence our goal is to identify drug candidates that may or may not target the proteins to which the virus binds, but nevertheless have the potential to perturb the network vicinity of the virus disease module. To achieve this end, we utilized several network repurposing strategies: a network proximity strategy, identifying drugs whose targets are in the immediate network vicinity of the viral targets; a diffusion-based strategy; and an AI-Net based strategy that uses machine learning to combine multiple sources of evidence (Fig. 1B).
We test the predictive power of each method independently using a list of drugs under clinical trial for COVID-19 and combine the evidence provided by each method, arriving at a ranked list of drug repurposing candidates derived from the complete list of drugs in DrugBank (see Methods).

Proximity-based methods allow us to measure the distance between two sets of nodes in a network, also determining the statistical significance of the observed proximity. Here we use proximity to explore the distance between the viral protein targets (approximating the COVID-19 disease module), and (i) the targets of approved drugs; and (ii) the differentially expressed genes induced by each drug, arriving at three drug ranking lists.

• Pipeline P1: For each drug, we measured the network distance to the closest protein targeted by COVID-19, and applied a degree-preserving randomization procedure to assess its statistical significance, expecting a Z-Score < −[…] […].82, indicating its proximity to SARS-CoV2 targets. In contrast, etanercept, another anti-inflammatory drug with no supported COVID-19 relevance, has a Z-Score = 1.29, indicating that the drug’s protein targets are far from the SARS-CoV2 viral targets (Fig. 4A). We tested the proximity of 6,116 drugs with at least one target in DrugBank, identifying 385 drugs with Z-Scores < −2, and 1,[…] […] < −1, representing potential repurposing candidates (Fig. 4B).
• Pipeline P2: We computed the proximity Z-Score after disregarding, for each drug, the targets that are enzymes, carriers, or transporters. These proteins are targeted by multiple drugs and are often unrelated to the known pharmacological effects of the profiled drugs. Of the 5,550 drugs obtained after the filtering, the metric identified 165 drugs with Z-Scores < −[…] […] < −1. Using this measure, chloroquine and hydroxychloroquine are less proximal to COVID-19 targets, while ribavirin, an antiviral drug in clinical trial, gains more proximity (Fig. 4C).
• Pipeline P3: The effect of a drug is rarely limited directly to the target proteins; the drug can also activate or repress biological cascades and biochemical pathways that change the expression patterns of multiple proteins in the network neighborhood of the drug’s targets. DrugBank compiles 17,222 differentially expressed genes (DEGs), linked to 793 drugs in multiple cell lines. We measured the proximity between DEGs and COVID-19 targets for 793 drugs, finding 18 drugs with Z-Scores < −2, and 82 drugs with Z-Scores < −[…].

Diffusion State Distance (DSD) methods rank drugs based on the network similarity of their targets to COVID-19 protein targets. The similarity of two nodes captures the overlap of two global (network-wide) states following the independent perturbation of the two nodes. We implemented three statistical measures that resulted in five ranking pipelines (see Methods).

• Pipeline D1: The L1 norm (Manhattan distance) calculates similarity through the sum over the absolute value of differences between the elements of the two vectors, providing a symmetric measure whose lower values reflect higher similarity.
• Pipeline D2: As the L1 norm may result in loss of information, we also implemented the Kullback-Leibler (KL) divergence, which calculates the relative entropy of the vector representation of the two nodes, reporting the average asymmetric similarity value over the minimum pairwise similarity values (KL-min), and resulting in values between 0 and 1.
• Pipeline D3: We deployed the KL divergence measure, discussed above, but reporting the average similarity value over the median pairwise similarity (KL-median).
• Pipeline D4: We implemented the Jensen-Shannon (JS) divergence, a modified (symmetrized and smoothed) version of the KL divergence, reporting the average over the minimum value of pairwise similarities (JS-min).
• Pipeline D5: Similar to D4, but we report the average over the median value of pairwise similarities (JS-median).

We used these five metrics to rank 3,225 drugs as potential treatments for COVID-19. Baricitinib, for example, is a rheumatological drug currently in trial for COVID-19, and all diffusion-based pipelines rank it higher than tocilizumab, a drug also indicated for rheumatological and severe inflammatory diseases with no proven COVID-19 relevance.
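The three measures behind the diffusion pipelines (L1 norm, KL divergence, JS divergence) and the min/median aggregation over viral targets can be sketched as follows. This is an illustration only: it assumes that each node’s diffusion vector has been normalized to sum to 1, uses base-2 logarithms, and adds a small `eps` to avoid division by zero. The helper names (`normalize`, `drug_score`) and the toy vectors are hypothetical and not taken from the study.

```python
import math
from statistics import median


def normalize(vec):
    """Rescale a non-negative vector so that it sums to 1 (assumption of this sketch)."""
    total = sum(vec)
    return [x / total for x in vec]


def l1_distance(p, q):
    """Manhattan distance: symmetric, lower values mean more similar vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def kl_divergence(p, q, eps=1e-12):
    """Relative entropy KL(p || q) in bits; asymmetric in its arguments."""
    return sum(a * math.log2((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


def js_divergence(p, q):
    """Symmetrized, smoothed KL; with base-2 logs it lies between 0 and 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def drug_score(drug_vecs, virus_vecs, dist, reduce=min):
    """Sum over drug targets of the min (or median) distance to the viral targets."""
    return sum(reduce([dist(t, v) for v in virus_vecs]) for t in drug_vecs)


# Toy diffusion vectors (hypothetical data)
p = normalize([2, 2, 0])
q = normalize([1, 1, 2])

print("L1      :", l1_distance(p, q))
print("KL(p||q):", kl_divergence(p, q))
print("KL(q||p):", kl_divergence(q, p))
print("JS      :", js_divergence(p, q))
print("JS-min  :", drug_score([p], [p, q], js_divergence, reduce=min))
print("JS-med  :", drug_score([p], [p, q], js_divergence, reduce=median))
```

The two KL values differ, which illustrates the asymmetry that the JS-based pipelines are meant to remove.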
We adopted machine learning tools previously developed for drug repurposing using the protein-protein interaction network as input, resulting in the AI-Net pipeline that exploits the power of AI in a network context (see Methods).

The method learns how to represent (i.e., embed) the multimodal graph into a compact, low-dimensional vector space such that the algebraic operations in the learned embedding space reflect the topology of the input network (Fig. S4A), and specifies a deep transformation function that maps drugs and diseases to points in the learned space, termed ‘drug and disease embeddings’. As diseases are not independent of each other and genes are often shared between distinct diseases, the method embeds diseases associated with similar genes close together in the embedding space. Similarly, the effects of drugs are not limited to proteins to which they directly bind, but spread throughout the protein-protein interaction network. To capture these effects, the method embeds closely together drugs whose target proteins have similar local neighborhoods in the underlying protein-protein interaction network.

We use the learned embeddings to generate four lists of candidate drugs for COVID-19, each ranked list containing 1,607 treatment recommendations. To obtain the four rankings, we use four distinct decoders, which decode the structure of small network neighborhoods around a drug or a disease node from the learned embeddings.

• Pipeline A1: We search for drugs that are in the vicinity of the COVID-19 disease module by calculating the cosine distance between COVID-19 and all drugs in the decoded embedding space. The decoding is based on the N = 10 nearest neighboring nodes in the embedding space, with a minimum distance between nodes of D = 0.[…].
• Pipeline A2: To prevent nodes in the decoded embedding space from packing together too closely, we choose D = 0.[…] […] N unchanged, pushing the structures apart into softer, more general features, offering a better overarching view of the embedding space at the loss of the more detailed structure.
• Pipeline A3: Alternatively, to force the decoding to concentrate on the very local structure (to the detriment of the overall goal of the exercise), we choose N = 5 to explore a smaller neighborhood while setting the minimum distance at a midrange point, D = 0.5.
• Pipeline A4: Instead of focusing on the finer local structure, we specify the decoder such that it preserves the broad structure (N = 10, D = 1), offering a broader view of the embedding space at the loss of detailed structure.

By inspecting the 20 highest ranked drug candidates offered by the AI-Net pipeline (Table S4), we observe several drugs that are in COVID-19 clinical studies (e.g., chloroquine, ritonavir). Other top-ranked drugs include anti-malarial medications and drugs used to treat autoimmune, pulmonary, and cardiovascular diseases.

The predictive pipelines discussed above offered altogether twelve rankings, each reflecting a different network-based criterion to estimate a drug’s likelihood to show efficacy in treating COVID-19 patients. As they all start from the same list of drugs and drug targets and operate on the same PPI network, the rankings provided by them are not expected to be fully independent. To quantify the similarity between them, we measure the Kendall τ rank correlation of the rankings provided by each pipeline. We find that two of the target proximity-based pipelines, P1 and P2, show high correlation with each other, as do the four AI-Net pipelines (A1-A4) and the five diffusion-based pipelines (D1-D5). Yet, the correlations across the three basic methods are much lower, and P3, relying on gene expression patterns, is also somewhat uncorrelated with other pipelines, indicating that the different methods offer complementary ranking information (Fig. 5A).

To evaluate the predictive power of the pipelines, we test their ability to recover the drugs currently in clinical trials as COVID-19 treatment. For this purpose, we obtained a list of 67 drugs currently undergoing clinical trials from ClinicalTrials.gov (Table S5).
We use the resulting list and the ranking predicted by each pipeline to compute the ROC (receiver operating characteristic) curves and the AUC (area under the curve) scores for model selection and performance analysis, measuring the quality of separation between positive and negative instances. As Fig. 5B shows, the best individual AUC scores, of 0.[…]−0.87, are obtained by the four AI-Net based methods. Note that the performance of the four AI-Net pipelines is largely indistinguishable, in line with the finding that the ranking lists provided by them are highly correlated (Fig. 5A). The second-best performance, of 0.70, is provided by the proximity method P3. Close behind is P1 with AUC = 0.68, and we find that eliminating some drug targets in P2 decreases the AUC to 0.58. As a group, the diffusion methods offer AUC scores between 0.[…] and 0.56. Their lower performance is somewhat unexpected, as diffusion-based methods should capture higher order correlations compared to the proximity methods; thus one would expect a performance between the proximity-based and the AI-Net methods, which successfully integrate high order correlations.

Each method extracts its own network-based signal for prioritizing drugs. However, the scores of each method are biased differently, offering different rankings. We used a rank aggregation algorithm to combine the 12 ranking lists, aiming to maximize the number of pairwise agreements between the final ranking and each input ranking. This objective, known as the Kemeny consensus, is NP-hard to compute; hence, we used an algorithm to approximate it (see Methods). We first tested whether combining the rankings within each method class could improve the predictive power of the list provided by the individual pipelines (Fig. 5C). The joint performance of the AI-Net group is 0.87, the same as A3. We do observe, however, an improvement for the proximity pipelines in the joint ranking, increasing performance from 0.70 to 0.72. Interestingly, the combined diffusion pipelines have lower performance (0.54) than the best diffusion performance of 0.56 observed for D1, D2, and D4. What is particularly encouraging, however, is that when we combine all 12 pipelines, we obtain an AUC of 0.89, the highest of any individual or combination-based pipeline, confirming that the individual pipelines offer complementary information that can be harnessed by the combined ranking. It is this combined list, therefore, that defines our final ranked list of predicted drugs for repurposing.

Finally, we manually inspected the joint ranking list, removing drugs with significant toxicities, eliminating those not appropriate, and removing lower-ranked members of the same drug class (with some exceptions). Through this process, we arrived at a list of 81 drugs selected from the top 10% of the total combined rank list, representing our final repurposing candidates for COVID-19 (Table 2). The selection contains drugs that are used for disorders of the respiratory (e.g., theophylline, montelukast) and cardiovascular (e.g., verapamil, atorvastatin) systems; antimicrobials used to treat viral (e.g., ribavirin, lopinavir), parasitic (e.g., hydroxychloroquine, ivermectin, praziquantel), bacterial (e.g., rifaximin, sulfanilamide), mycotic (e.g., fluconazole), and mycobacterial (e.g., isoniazid) infections; immunomodulating/anti-inflammatory drugs (e.g., interferon-β, auranofin, montelukast, colchicine); anti-proteasomal drugs (e.g., bortezomib, carfilzomib); and a range of other less obvious drugs that warrant exploration (e.g., aminoglutethimide, melatonin, levothyroxine, calcitriol, selegiline, deferoxamine, mitoxantrone, metformin, nintedanib, cinacalcet, and sildenafil, among others) (Table 2). Our final list includes 11 previously proposed potential drug-repurposing candidates for COVID-19, and 21 drugs that are currently being tested in clinical trials (Table 2).

The drug repurposing list provided in Table 2 ranks drugs based on their network-based relationship to the viral targets.
However, for a drug to be effective, it may not be sufficient to be proximal; it also needs to induce the right perturbation in the cell, suppressing, for example, the expression of proteins the virus needs, and activating the expression of proteins essential for the cell function and survival that are suppressed by the virus. In this section we use expression data to understand how the drug affects the activity of proteins within the COVID-19 disease module, offering insights about the mechanism of action of selected drugs.
Connectivity Map:
We retrieved gene expression perturbation profiles for 59 of the 81 repurposing candidates from the Connectivity Map (CMap) database, altogether including 5,291 experimental instances (combinations of different drugs, cell lines, doses, and times of treatment). To evaluate the degree to which each of these drugs modulates the activity of COVID-19 targets, we measured the overlap between the perturbed genes and COVID-19 targets. For example, for mitoxantrone, an antineoplastic drug (Table 2), we find that 75 (22%) of the COVID-19 targets overlap significantly with the 2,440 genes highly perturbed by the drug (3.[…] µM) in the lung cell line HCC515 (Fisher’s exact test, FDR-BH $p_{adj}$-value < 0.05) (Fig. 6A). When evaluated across all experimental instances, we find that for 43 of the 59 drugs, there was a statistically significant overlap of the perturbed genes with the COVID-19 targets (Fig. 6B). For random selections of 59 drugs from the pool of all drugs, only 13 ± […] […] $p_{adj}$-value = 0.004, HA1E, 10.[…] µM), flutamide (162, $p_{adj}$-value = 0.[…] […].[…] µM), and bortezomib (162, $p_{adj}$-value = 0.02, HA1E, 20.[…] µM). For cell lines derived from lung tissues (A549 and HCC515), the drugs with the highest overlap with COVID-19 targets are mitoxantrone and ponatinib. These results can help us extract direct experimental evidence that the drug repurposing candidates selected by our methods modulate processes targeted by the virus, and offer mechanistic insights into the biological processes affected by these drugs. For example, we find that mitoxantrone (HUVEC, 10 µM, 24 h) perturbs COVID-19 targets related to cell cycle, viral life cycle, protein transport, and organelle organization.

Suppressing COVID-19 Induced Expression:
We next asked whether the selected drugs can counteract the gene expression perturbations caused by the virus, i.e., whether they down-regulate genes up-regulated by the virus or vice versa. For this analysis, we begin with the 120 differentially expressed genes (DEGs) in the SARS-CoV2 infected A549 cell line and compare the list with the drug perturbation profiles. For example, bortezomib treatment of the cell line YAPC (20 µM) counteracts the effects of the SARS-CoV2 infection for 65 genes, resulting in an inverted expression profile (Spearman correlation ρ = −0.58) (Fig. 6C). We measured the Spearman correlation ρ between the perturbations caused by the drug and perturbations caused by the virus in the A549 cell line, where negative correlation values indicate that the drug could counteract the effects of the infection. We find that 22 of the 59 drugs profiled in the Connectivity Map have negative correlation coefficients (Spearman ρ < 0, FDR-BH $p_{adj}$-value < […]) […] ± […] in vitro experimental support for the selected repurposing candidates as possible modulators of the biological processes targeted by the virus. It also indicates how network-based tools can utilize gene expression profiles to explore the potential efficacy of drugs.

In this study, we took advantage of recent advances in network medicine to define a list of 81 drug repurposing candidates for the treatment of COVID-19, and, using in vitro data, we show that these drugs do affect biological processes targeted by the virus. The accuracy of our predictions will further improve as the input or validation data improve. For example, we relied on the results of Gordon et al. (2020) for the map of interactions between the virus and human proteins. There are, however, additional interactions not detected in the study. For example, the ACE2 protein has been recently linked to initial viral association on airway epithelial cells, but in the current data set no viral proteins target it.

Note that the utilized predictive pipelines select drugs that, by virtue of the network-based relationship between their targets and the SARS-CoV2 viral targets, are positioned to effectively perturb the COVID-19 disease module. Some of the perturbations may block the virus’ ability to invade the host cells, or limit the molecular level disruption caused by the infection, potentially alleviating the disease symptoms and shortening the timeline of the disease. Others, however, may cause perturbations that aggravate the symptoms and the seriousness of the phenotype. Therefore, in ordinary circumstances, we would need molecular experiments to test the efficacy of these drugs for COVID-19 infected cell lines (Table 2). Yet, as many of these drugs have well-known side effects and toxicities, given the imminent need for a cure, it may be possible to move those drugs directly into clinical trials.
While we are currently pursuing this possibility, releasing the list could offer opportunities for other groups, with appropriate resources and toolsets, to move some of these drugs into screening or directly to rapid clinical trials. We are, of course, cognizant of the remote, yet real, possibility that these approved drugs with known side effects may exert unique toxicities in the setting of this novel infection, an outcome that can only be identified in clinical trial.

Our study focused on ranking the existing drugs based on their expected efficacy for COVID-19 patients. This does not mean that drugs that did not make our final list could not have efficacy, or that they must be excluded from further consideration. As the input data improves, drugs currently ranked lower could move to a higher ranking, developing a case for experimental testing and clinical trial, and vice versa. The proposed methodology is general, allowing us to profile the potential efficacy of any drug or a family of drugs, whether or not they are included in our current reference list.

Normally, bioinformatic validation would be followed by experimental screening and potentially clinical validation before publication. We are currently pursuing these avenues, from screening in human cell lines to clinical trials. We feel, however, that given the strength of the bioinformatics validation and the obtained AUC, generating confidence in our methodologies, and the urgency of the COVID-19 crisis, there is an imminent need for disclosure to offer rationale and guidance for upcoming clinical trials.

Methods
The human interactome was assembled from 21 public databases that compile experimentally derived protein-protein interaction (PPI) data: 1) binary PPIs, derived from high-throughput yeast two-hybrid (Y2H) experiments (HI-Union), three-dimensional (3D) protein structures (Interactome3D, Instruct, Insider) or literature curation (PINA, MINT, LitBM17, Interactome3D, Instruct, Insider, BioGrid, HINT, HIPPIE, APID, InWeb); 2) PPIs identified by affinity purification followed by mass spectrometry present in BioPlex2, QUBIC, CoFrac, HINT, HIPPIE, APID, LitBM17, InWeb; 3) kinase-substrate interactions from KinomeNetworkX and PhosphoSitePlus; 4) signaling interactions from SignaLink and InnateDB; and 5) regulatory interactions derived by the ENCODE consortium. We used the curated list of PSI-MI IDs provided by Alonso-López et al. (2019) for differentiating binary interactions among the several experimental methods present in the literature-curated databases. Specifically for InWeb, interactions with curation scores < 0.175 (75th percentile) were not considered. All proteins were mapped to their corresponding Entrez ID (NCBI) and the proteins that could not be mapped were removed. The final interactome used in our study contains 18,505 proteins and 327,[…] […] and drug-target information from the DrugBank database, containing 26,167 interactions between 7,591 drugs and their 4,187 targets.
We used the GTEx database, which contains the median gene expression from RNA-seq for 56 different tissues, assuming that genes with a median count lower than 5 are not expressed in that particular tissue. The LCC was calculated using a degree-preserving approach, preventing the repeated selection of the same high-degree nodes by choosing 100 degree bins in 1,000 simulations.
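Degree-binned sampling can be sketched as follows. The exact binning rule is not spelled out here, so this sketch assumes one common variant: nodes are sorted by degree and grouped until each bin holds at least `min_bin_size` nodes, and every node of the original set is replaced by a random node from its own bin. The function names, the parameter `min_bin_size`, and the toy degree table are hypothetical.

```python
import random
from collections import defaultdict


def build_degree_bins(degree, min_bin_size):
    """Group nodes of similar degree so that each bin holds at least `min_bin_size` nodes."""
    by_degree = defaultdict(list)
    for node, k in degree.items():
        by_degree[k].append(node)
    bins, current = [], []
    for k in sorted(by_degree):
        current.extend(by_degree[k])
        if len(current) >= min_bin_size:
            bins.append(current)
            current = []
    if current:  # leftover high-degree tail: merge into the last bin
        if bins:
            bins[-1].extend(current)
        else:
            bins.append(current)
    return bins


def sample_degree_matched(nodes, bins, rng):
    """Replace each (distinct) node by a random, not yet chosen node from its degree bin."""
    bin_of = {n: i for i, members in enumerate(bins) for n in members}
    chosen = set()
    for n in nodes:
        candidates = [c for c in bins[bin_of[n]] if c not in chosen]
        chosen.add(rng.choice(candidates))
    return chosen


# Toy degree table (hypothetical data): many low-degree nodes and one hub
toy_degree = {n: 1 for n in range(10)}
toy_degree.update({n: 2 for n in range(10, 16)})
toy_degree.update({n: 3 for n in range(16, 19)})
toy_degree[19] = 10

bins = build_degree_bins(toy_degree, min_bin_size=5)
rng = random.Random(0)
random_set = sample_degree_matched([0, 19], bins, rng)
print(len(bins), sorted(random_set))
```

Because the hub (node 19) shares a bin with other higher-degree nodes, its random replacement is drawn from that bin rather than from the whole network, which is what keeps the reference sets comparable in degree to the original set.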
Given $V$, the set of COVID-19 virus targets, $T$, the set of drug targets, and $d(v,t)$, the shortest path length between nodes $v \in V$ and $t \in T$ in the network, we define

$$d_c(V,T) = \frac{1}{\|T\|} \sum_{t \in T} \min_{v \in V} d(v,t). \tag{1}$$

We also determined the expected distances between two randomly selected groups of proteins, matching the size and degrees of the original $V$ and $T$ sets. To avoid repeatedly selecting the same high-degree nodes, we use degree-binning (see above). The mean $\mu_{d_c(V,T)}$ and standard deviation $\sigma_{d_c(V,T)}$ of the reference distribution allow us to convert the absolute distance $d_c$ to a relative distance $Z_{d_c}$, defined as

$$Z_{d_c} = \frac{d_c - \mu_{d_c(V,T)}}{\sigma_{d_c(V,T)}}. \tag{2}$$

The diffusion state distance (DSD) algorithm uses a graph diffusion property to derive a similarity metric for pairs of nodes that takes into account how similarly they impact the rest of the network. We calculate the expected number of times $He(A,B)$ that a random walk starting at node $A$ visits node $B$, representing each node by the vector

$$He(V_i) = [He(V_i,V_1), He(V_i,V_2), He(V_i,V_3), \ldots, He(V_i,V_n)], \tag{3}$$

which describes how a perturbation initiated from that node impacts other nodes in the interactome. The similarity between nodes $A$ and $B$ is provided by the L1 norm of their corresponding vector representations,

$$DSD(A,B) = \|He(A) - He(B)\|_1. \tag{4}$$

Inspired by the DSD, we developed five new metrics to calculate the impact of drug targets $t$ on the SARS-CoV2 targets $v$. The first (Pipeline D1) is defined as

$$I^{min}_{DSD} = \frac{1}{|V|} \sum_{t \in T} \min_{v \in V} DSD(t,v) \tag{5}$$

where $DSD(t,v)$ represents the diffusion state distance between nodes $t$ and $v$.
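Equations (1) and (2) can be sketched in a few lines. The example assumes a connected toy path graph and, for brevity, builds the reference distribution from uniformly sampled node sets rather than the degree-matched sets used in the study. The names `closest_distance`, `proximity_z`, and `path_adj` are hypothetical.

```python
import random
from collections import deque
from statistics import mean, stdev


def closest_distance(adj, virus_targets, drug_targets):
    """Eq. (1): mean over drug targets of the distance to the nearest virus target.

    A multi-source breadth-first search from all virus targets yields
    min_v d(v, t) for every node t. Assumes a connected graph.
    """
    dist = {v: 0 for v in virus_targets}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return mean(dist[t] for t in drug_targets)


def proximity_z(adj, virus_targets, drug_targets, n_random=1000, seed=0):
    """Eq. (2): Z-score of d_c against random node sets of the same sizes.

    Assumption: uniform sampling instead of degree-matched sampling.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    d_c = closest_distance(adj, virus_targets, drug_targets)
    null = [
        closest_distance(
            adj,
            rng.sample(nodes, len(virus_targets)),
            rng.sample(nodes, len(drug_targets)),
        )
        for _ in range(n_random)
    ]
    return d_c, (d_c - mean(null)) / stdev(null)


# Toy network (hypothetical data): a path 0-1-2-...-29
path_adj = {i: set() for i in range(30)}
for i in range(29):
    path_adj[i].add(i + 1)
    path_adj[i + 1].add(i)

virus = [0, 1, 2]
d_near, z_near = proximity_z(path_adj, virus, [1, 3], n_random=300)
d_far, z_far = proximity_z(path_adj, virus, [27, 28], n_random=300)
print(f"near drug: d_c = {d_near}, Z = {z_near:.2f}")
print(f"far drug:  d_c = {d_far}, Z = {z_far:.2f}")
```

A drug whose targets sit next to the viral targets receives a negative Z-score, while a drug with distant targets receives a positive one, mirroring the interpretation of the proximity pipelines.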
Since the L1 norm of two large vectors may result in loss of information, we also used the metric (Pipeline D2)

$$I^{\min}_{KL} = \sum_{t \in T} \min_{v \in V} KL(t, v) \tag{6}$$

and (Pipeline D3)

$$I^{\mathrm{med}}_{KL} = \sum_{t \in T} \operatorname*{median}_{v \in V} KL(t, v), \tag{7}$$

where $KL$ is the Kullback-Leibler (KL) divergence between the vector representations of the nodes $t$ and $v$. Finally, to provide symmetric measures, we tested the measures (Pipeline D4)

$$I^{\min}_{JS} = \sum_{t \in T} \min_{v \in V} JS(t, v) \tag{8}$$

and (Pipeline D5)

$$I^{\mathrm{med}}_{JS} = \sum_{t \in T} \operatorname*{median}_{v \in V} JS(t, v), \tag{9}$$

where $JS$ is the Jensen-Shannon (JS) divergence between the vector representations of nodes $t$ and $v$. All five measures consider $t \neq v$.

We designed a graph neural network for COVID-19 treatment recommendations based on a previously developed graph convolutional architecture. The multimodal graph is a heterogeneous graph $G = (\mathcal{V}, \mathcal{R})$ with $N$ nodes $v_i \in \mathcal{V}$ representing three distinct types of biomedical entities (i.e., drugs, proteins, diseases), and labeled edges $(v_i, r, v_j) \in \mathcal{R}$ representing four semantically distinct types of edges $r$ between the entities (i.e., protein-protein interactions, drug-target associations, disease-protein associations, and drug-disease treatments).

COVID-19 treatment recommendation task.
We cast COVID-19 treatment recommendation as a link prediction problem on the multimodal graph. The task is to predict new edges between drug and disease nodes, so that a predicted link between a drug node $v_i$ and a disease node $v_j$ indicates that drug $v_i$ is a promising treatment for disease $v_j$ (e.g., COVID-19). Our graph neural network is an end-to-end trainable model for link prediction on the multimodal graph and has two main components: (1) an encoder: a graph convolutional network operating on $G$ and producing embeddings for nodes in $G$, and (2) a decoder: a model optimizing embeddings such that they are predictive of successful drug treatments.

Overview of graph neural architecture.
The neural message passing encoder takes as input a graph $G$ and produces a $d$-dimensional embedding $z_i \in \mathbb{R}^d$ for every drug and disease node in the graph. We use an encoder that learns a message passing algorithm and aggregation procedure to compute a function of the entire graph that transforms and propagates information across graph $G$. The graph convolutional operator takes into account the first-order neighborhood of a node and applies the same transformation across all locations in the graph. Successive application of these operations then effectively convolves information across the $K$-th order neighborhood (i.e., the embedding of a node depends on all the nodes that are at most $K$ steps away), where $K$ is the number of successive convolutional layers in the neural network model. The graph convolutional operator takes the form

$$h_i^{(k+1)} = \phi\Big( \sum_r \sum_{j \in \mathcal{N}_r^i} \alpha_r^{ij} W_r^{(k)} h_j^{(k)} + \alpha_r^{i} h_i^{(k)} \Big), \tag{10}$$

where $h_i^{(k)} \in \mathbb{R}^{d^{(k)}}$ is the hidden state of node $v_i$ in the $k$-th layer of the neural network with $d^{(k)}$ being the dimensionality of this layer's representation, $r$ is an edge type, matrix $W_r^{(k)}$ is an edge-type specific parameter matrix, $\phi$ denotes a non-linear element-wise activation function (i.e., a rectified linear unit), and $\alpha_r$ denote attention coefficients. To arrive at the final embedding $z_i \in \mathbb{R}^d$ of node $v_i$, we compute its representation as $z_i = h_i^{(K)}$. Next, the decoder takes node embeddings and combines them to reconstruct labeled edges in $G$. In particular, the decoder scores a $(v_i, r, v_j)$ triplet through a function $g$ whose goal is to assign a score $g(v_i, r, v_j)$ representing how likely it is that drug $v_i$ will treat disease $v_j$ (i.e., $r$ denotes a 'treatment' relationship).

Training the graph neural network.
During model training, we optimize model parameters using the max-margin loss function to encourage the model to assign higher probabilities to successful drug indications $(v_i, r, v_j)$ than to random drug-disease pairs. We take an end-to-end optimization approach that jointly optimizes over all trainable parameters and propagates loss function gradients through both the encoder and the decoder. To optimize the model, we train it for a maximum of 100 epochs (training iterations) using the Adam optimizer with a learning rate of 0.[...]. To make the model comparable to other drug repurposing methodologies in this study, we do not integrate additional side information into node feature vectors; instead, we use one-hot indicator vectors as node features. In order for the model to generalize well to unobserved edges, we apply a regular dropout to hidden layer units (Eq. (10)). In practice, we use efficient sparse matrix multiplications, with complexity linear in the number of edges in $G$, to implement the model. We use a 2-layer neural architecture with $d = 32$, $d = 32$, $d_i = 128$ hidden units in the input, output, and intermediate layer, respectively, a dropout rate of 0.1, and a max-margin of 0.1. We use mini-batching by sampling triples from the multimodal graph. That is, we process multiple training mini-batches (mini-batches are of size 512), each obtained by sampling only a fixed number of triplets, resulting in dynamic batches that change during training.

Expression perturbation profiles

We retrieved drug perturbation profiles from the Connectivity Map (CMap) database using the Python package CMapPy. For each perturbation profile, we calculated the significance of the overlap of perturbed genes ($|Z\text{-score}| > 2$) and SARS-CoV2 targets derived from Gordon et al., using Fisher's exact test. We also retrieved gene expression data of the cell line A549 after infection with SARS-CoV2. The correlation between the perturbation scores provided in CMap and the gene expression fold change caused by SARS-CoV2 infection was evaluated using the Spearman correlation coefficient. In both cases, we applied the Benjamini-Hochberg method for multiple testing correction (FDR < [...]).

We used the CRank algorithm to combine rankings returned by different methodologies into a single rank for each drug, which then determined the drug's repurposing priority. The rank aggregation algorithm starts with ranked lists of drugs, $R_r$, each one arising from a different methodology $r$. Each ranked list is partitioned into equally sized groups, called bags. Each bag $i$ in ranked list $R_r$ has an attached importance weight $K_r^i$ whose initial values are all equal. CRank uses a two-stage iterative procedure to aggregate the individual rankings by taking into account uncertainty that is present across ranked lists. After initializing the aggregate ranking $R$ as a weighted average of ranked lists $R_r$, CRank alternates between the following two stages until no changes are observed in the aggregated ranking $R$. (1) First, it uses the current aggregated ranking $R$ to update the importance weights $K_r^i$ for each ranked list. For that purpose, the top-ranked drugs in $R$ serve as a temporary gold standard. Given bag $i$ and ranked list $R_r$, CRank updates importance weight $K_r^i$ based on how many drugs from the temporary gold standard appear in bag $i$ using the Bayes factors. (2) Second, the ranked lists are re-aggregated based on the importance weights calculated in the previous stage.
The updated importance weights are used to revise $R$, in which the new rank $R(C)$ of drug $C$ is expressed as $R(C) = \sum_r \log K_r^{i_r(C)} R_r(C)$, where $K_r^{i_r(C)}$ indicates the importance weight of bag $i_r(C)$ of drug $C$ for ranking $r$, and $R_r(C)$ is the rank of $C$ according to $r$. By using an iterative approach, CRank allows for the importance of a ranking not to be predetermined and to vary across drugs.

The final output is a global ranked list $R$ of drugs that represents the collective opinion of the different repurposing methodologies. The Python source code implementation of CRank is available at https://github.com/mims-harvard/crank. In all experiments, we set the number of bags to 1,000, the size of the temporary gold standard to 0.5% of the total number of drugs in $R$, and the maximum number of iterations to 50. In all cases, the algorithm converged in fewer than 20 iterations.

ROC curves

We employed different methodologies to rank drug candidates. Since we lack ground-truth labels for drugs being effective against the disease, we rely on clinical trials to gather the names of drugs currently in trial. We assumed that all the drugs tested in clinical trials are relevant and based on prior in vitro or in vivo observations. We used this information and the ranking of each method to compute ROC (receiver operating characteristic) curves and AUC (area under the curve) scores for model selection and performance analysis. The AUC score measures the quality of the separation between positive and negative instances. For each ranked list, we applied different thresholds to compute false-positive and true-positive rates to plot the ROC curve. AUC scores range between 0 and 1, where 1 corresponds to perfect performance and 0.5 indicates the performance of a random classifier. Some methods fail to provide a ranking for every drug; to provide a fair comparison between methods, we assumed that all missing ranks should be listed at the bottom of the ranking.
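The treatment of missing ranks described above can be made concrete with a small sketch. It is a dependency-free stand-in that computes the AUC as the fraction of (positive, negative) drug pairs ordered correctly, with ties counted as one half, and is not the scikit-learn call used in the study. The function name `auc_from_ranks` and its arguments are hypothetical.

```python
def auc_from_ranks(ranks, positives, all_drugs):
    """AUC of one ranking against a set of positive (in-trial) drugs.

    ranks:     dict drug -> rank (1 = best); drugs the method did not rank are absent
    positives: set of drugs treated as the gold standard
    all_drugs: every drug considered in the comparison
    """
    # Unranked drugs all share a rank just below the worst assigned rank.
    worst = max(ranks.values(), default=0) + 1
    score = {d: -ranks.get(d, worst) for d in all_drugs}  # higher score = better
    pos = [score[d] for d in all_drugs if d in positives]
    neg = [score[d] for d in all_drugs if d not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative drug")
    # Probability that a random positive outranks a random negative (ties count 0.5).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with `ranks = {"a": 1, "b": 2, "c": 3}` and an unranked drug `"d"`, treating `"d"` as the only positive gives an AUC of 0, because it is placed below every ranked drug.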
We use the Python package Scikit-learn for computing AUC scores and plotting ROC curves.

For the ground-truth list, we consider the ClinicalTrials.gov website the primary source of ongoing trials of drugs for COVID-19. We are cognizant of its limitations, primarily the time lag between the implementation of a trial and its appearance on the site. We also quantified the performance of models under different constraints: considering only drugs that have at least N trials and considering only the evidence provided up to a certain date (Fig. S7).

Authors Contribution
A.L.B designed the study. A.A, D.M.G, M.Z, and X.G performed drug predictions. I.D.V analyzed disease comorbidities and drug validation. A.A, D.M.G, I.D.V, M.Z, O.V, and X.G analyzed the data. O.V. carried out ClinicalTrials.gov data analysis for model selection and performance analysis. J.L manually curated the drug candidates. A.L.B, D.M.G, and I.D.V wrote the paper with input from all authors. All authors read and approved the manuscript. D.G guided A.A in designing diffusion-based similarity implementations and H.S curated the list of promising drugs for COVID-19.
This work was supported, in part, by NIH grants HG007690, HL108630, and HL119145, and by AHA grant D700382 to J.L; A.L.B is supported by NIH grant 1P01HL132825, American Heart Association grant 151708, and ERC grant 810115-DYNASET.

We wish to thank Nicolette Lee and Grecia for providing support, and Marc Santolini for suggestions on the diffusion-based methods.
J.L. and A.L.B are co-scientific founders of Scipher Medicine, Inc., which applies network medicine strategies to biomarker development and personalized drug selection. A.L.B is the founder of Nomix Inc. and Foodome, Inc., which apply data science to health; O.V and D.M.G are scientific consultants for Nomix Inc. I.D.V is a scientific consultant for Foodome Inc.

[Figure 1 panels. (A) Labels: Viral Disease Module, Viral Interactome, Human Interactome, Viral-Human Protein-Protein Interaction, Human-Human Protein-Protein Interaction, Drug-Human Protein-Protein Interaction, Drug, Disease Module. (B) Input data: Human Interactome (N = 18,508 proteins, L = 332,749 PPIs); SARS-CoV2 targets (320 human proteins); Drug Targets (7,591 drugs, 4,187 drug targets). Methods: Network Proximity (3 pipelines); Network Diffusion (5 pipelines); AI Prioritization (4 pipelines). Outcomes: Infected Tissues/Organs; Comorbidity; Drug Repurposing & Validation.]
Figure 1:
Network Medicine Approaches to Drug Repurposing. (A)
The physical interactions that we use as input in the network medicine framework: virus-human protein interactions, capturing the human proteins to which the viral proteins can bind; human protein-protein interactions, defining the human interactome of 18,508 proteins linked by 332,749 pairwise physical interactions; and the drug-human protein interactions, capturing the human protein targets of each drug in DrugBank. (B) A schematic representation of the input data we use for the predictions, the three prediction methods and the resulting pipelines, and the outcomes provided by the analysis.
[Figure 2 panels. (A) COVID-19 LCC; legend: expressed in lung, not expressed in lung. (B) Full Interactome: density versus LCC size. (C) Lung Interactome: density versus LCC size.]
Figure 2:
The COVID-19 Disease Module. (A) Proteins targeted by SARS-CoV2 are not distributed randomly in the human interactome, but form a large connected component (LCC) consisting of 208 proteins, as well as multiple small subgraphs. We do not show the 93 viral targets that do not interact with other viral targets. Proteins not expressed in the lung are shown in orange, indicating that almost all proteins in the SARS-CoV2 LCC are expressed in the lung, explaining the effectiveness of the virus in causing pulmonary infections. (B) The random expectation of the LCC size, indicating that the observed COVID-19 LCC, whose size is indicated by the red arrow, is larger than expected by chance. (C) Similarly, the lung-based LCC is also greater than expected by chance.

[Figure 3 labels. Legend: number of disease genes; S_vb. Marked diseases: neoplasms by histologic site, neurodegenerative diseases, brain diseases, central nervous system diseases, heredodegenerative diseases, skin and connective tissue, endocrine system diseases, heart diseases, neoplasms, musculoskeletal diseases, neurologic manifestations, congenital abnormalities. Disease categories: infections, others, metabolic, nervous system, cancer, immune system, endocrine, digestive, congenital, cardiovascular.]
Figure 3:
Disease Comorbidity. We measured the network proximity between COVID-19 targets and 299 diseases. The figure represents each disease as a circle whose radius reflects the number of disease genes associated with it. The diseases closest to the center, whose names are marked, are expected to have higher comorbidity with the COVID-19 outcome. The farther a disease is from the center, the more distant are its disease proteins from the COVID-19 viral targets.

[Figure 4 panels. (A) Legend: chloroquine targets, etanercept targets, shared targets, background genes, COVID-19 binding targets. Chloroquine (proximal): d = 1.22, z = -1.82; etanercept (distant): d = 1.57, z = 1.29. (B) Number of drugs versus Z-score; marked drugs: oritavancin, ritonavir, lopinavir, chloroquine, hydroxychloroquine, ribavirin.]
Figure 4:
Using Proximity to Predict Repurposing Drugs. (A) The local neighborhood of the human interactome showing the targets of the drug chloroquine and the reference drug, dextrothyroxine, and the proteins closest to them targeted by COVID-19 viral proteins. (B) Distribution of proximity scores for 6,116 drugs, capturing their distance to SARS-CoV2 targets. The six lighter bars indicate the proximity of drugs currently tested in clinical trials for COVID-19.

[Figure 5 panels. (A) Kendall τ heatmap over pipelines P1-P3, D1-D5, and A1-A4. (B) Individual ROC: true positive rate versus false positive rate. (C) Combined ROC: true positive rate versus false positive rate.]
Figure 5:
Comparison of the Predictive Pipelines. (A) Heatmap of the Kendall τ capturing the correlation between the rankings predicted by the 12 drug repurposing pipelines. Methods using different approaches are not correlated, potentially prioritizing different drugs. (B) ROC curves and AUC for each of the twelve pipelines used for drug repurposing, using as a gold standard the drugs under evaluation in clinical trials for treating COVID-19 (Table S5). (C) The performance of the overall CRank (all), which combines all pipelines into a final ranking list, is higher than the performance of each method individually (CRanks AIs, Ps and Ds).

[Figure 6 panels. (A) Interactome region with labeled COVID-19 target proteins, colored by gene expression perturbation Z-score. (B) Bortezomib (YAPC, 20 µM): log2(fold-change) upon infection versus perturbation Z-score.]
Figure 6:
Validation Using Gene Expression Data. (A) Local region of the interactome showing the COVID-19 targets. The drug mitoxantrone (3.[...] µM, 24h) perturbs the gene expression of 75 COVID-19 targets (labeled proteins) in the lung cell line HCC515 (green and red colors represent down- and up-regulation, respectively). (B) The comparison of bortezomib treatment (YAPC, 20 µM) and SARS-CoV2 infection perturbation profiles shows a negative correlation (Spearman ρ = −0.58, FDR-BH p_adj-value = 1.[...] × 10^−[...]), indicating that the drug counteracts the effects of the infection for 65 genes (orange dots). The straight line shows a linear fit between the two profiles and the respective confidence interval. Positive values represent upregulated expression and negative values represent down-regulated expression on both axes.

Table 1: Tissues Affected by SARS-CoV2. The list of 30 tissues whose Z-Scores are higher than the overall Z-Score of the COVID-19 LCC. Tissues in the same or similar systems or organs are shaded by the same color.
Tissue                             LCC   Z-Score
Immortalized cell line             171   2.114
Vagina                             185   2.062
Brain-Frontal Cortex               162   1.923
Pancreas                           133   1.908
Heart-Left Ventricle               129   1.897
Brain-Cortex                       161   1.889
Brain-Hippocampus                  149   1.884
Colon-Sigmoid                      179   1.870
Kidney-Cortex                      151   1.848
Fibroblasts                        183   1.843
Adrenal Gland                      168   1.816
Uterus                             184   1.808
Cervix-Endocervix                  185   1.801
Bladder                            179   1.799
Testis                             189   1.794
Lung                               182   1.780
Artery                             178   1.777
Spleen                             173   1.761
Colon                              179   1.760
Brain-Hypothalamus                 157   1.757
Esophagus-Mucosa                   175   1.757
Cervix-Ectocervix                  184   1.730
Ovary                              182   1.726
Skin                               178   1.720
Heart-Atrial Appendage             153   1.716
Prostate                           183   1.715
Brain-Spinal cord                  169   1.713
Kidney                             167   1.704
Brain-Anterior cingulate cortex    152   1.690
All                                208   1.658

Table 2: Drug Repurposing Candidates.
The list of the 81 drugs selected for repurposing. It shows the drug's name, the final combined rank (C-rank) of each drug, the number of clinical trials in which the drug is being tested for COVID-19 (ClinicalTrials.gov), and references to papers that already noted their potential COVID-19 relevance (Reference).

[Table 2 body. The column alignment did not survive extraction; drug names and the recovered C-rank values are listed in their order of appearance, without pairing.]

Drugs: Ritonavir, Isoniazid, Troleandomycin, Cilostazol, Chloroquine, Rifabutin, Flutamide, Dexamethasone, Rifaximin, Azelastine, Folic Acid, Rabeprazole, Methotrexate, Digoxin, Theophylline, Fluconazole, Aminoglutethimide, Hydroxychloroquine, Methimazole, Ribavirin, Omeprazole, Bortezomib, Leflunomide, Dimethylfumarate, Colchicine, Quercetin, Mebendazole, Mesalazine, Pentamidine, Verapamil, Melatonin, Griseofulvin, Auranofin, Atovaquone, Montelukast, Romidepsin, Cobicistat, Lopinavir, Pomalidomide, Sulfinpyrazone, Levamisole, Calcitriol, Interferon-β-1a, Praziquantel, Ascorbic acid, Fluvastatin, Interferon-β-1b, Selegiline, Deferoxamine, Ivermectin, Atorvastatin, Mitoxantrone, Glyburide, Thalidomide, Sulfanilamide, Hydralazine, Gemfibrozil, Ruxolitinib, Propranolol, Carbamazepine, Doxorubicin, Levothyroxine, Dactinomycin, Tenofovir, Tadalafil, Doxazosin, Rosiglitazone, Aminolevulinic acid, Nitroglycerin, Metformin, Nintedanib, Allopurinol, Ponatinib, Sildenafil, Dapagliflozin, Nitroprusside, Cinacalcet, Mexiletine, Sitagliptin, Carfilzomib, Azithromycin.

C-rank values: 57, 63, 67, 69, 92, 98, 109, 112, 118, 124, 131, 138, 141, 146, 155, 157, 161, 164, 173, 176, 195, 199, 203, 206, 227, 235, 243, 250, 259, 262, 265, 269, 281, 284, 297, 301, 309, 329, 335, 338, 339, 367, 397, 398, 418, 457, 466, 471, 491, 493, 504, 515, 553, 559, 706, 765, 786.
References
[1] F. Cheng et al. Network-based approach to prediction and population-based validation of in silico drug repurposing. Nature Communications, 9(1):2691, 12 2018. ISSN 20411723. doi: 10.1038/s41467-018-05116-5.
[2] E. Guney, J. Menche, M. Vidal, and A.-L. Barabási. Network-based in silico drug efficacy screening.
Nature Communications , 7(1):10331, 2 2016. ISSN 20411723. doi:10.1038/ncomms10331. URL .[3] Y. Zhou et al. Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2.
Cell Discovery , 6(1):1–18, 12 2020. ISSN 20565968. doi: 10.1038/s41421-020-0153-3.[4] F. Cheng et al. A genome-wide positioning systems network algorithm for in silicodrug repurposing.
Nature Communications , 10(1):1–14, 12 2019. ISSN 20411723. doi:10.1038/s41467-019-10744-6.[5] M. Zitnik et al. Machine Learning for Integrating Data in Biology and Medicine:Principles, Practice, and Opportunities.
An international journal on infor-mation fusion , 50:71–91, oct 2019. ISSN 1566-2535. doi: 10.1016/j.inffus.2018.09.012. URL .[6] M. Zitnik, M. Agrawal, and J. Leskovec. Modeling polypharmacy side effects with graphconvolutional networks. In
Bioinformatics , 2018. doi: 10.1093/bioinformatics/bty294.[7] A. I. Casas et al. From single drug targets to synergistic network pharmacology inischemic stroke.
Proceedings of the National Academy of Sciences of the United Statesof America , 116(14):7129–7136, 2019. ISSN 10916490. doi: 10.1073/pnas.1820799116.[8] M. Cao et al. Going the Distance for Protein Function Prediction: A New DistanceMetric for Protein Interaction Networks.
PLoS ONE , 2013. ISSN 19326203. doi: 10.1371/journal.pone.0076339.[9] M. Zitnik, R. Sosic, and J. Leskovec. Prioritizing network communities.
Nature Com-munications , 9(1):2544, 2018.[10] A. Subramanian et al. A Next Generation Connectivity Map: L1000 Platform andthe First 1,000,000 Profiles.
Cell, 171(6):1437–1452, 11 2017. ISSN 10974172. doi: 10.1016/j.cell.2017.10.049.
[11] J. Lamb et al. The connectivity map: Using gene-expression signatures to connect small molecules, genes, and disease.
Science , 313(5795):1929–1935, 9 2006. ISSN 00368075.doi: 10.1126/science.1132939.[12] A. R. Fehr and S. Perlman. Coronaviruses: An overview of their replication and patho-genesis. In
Coronaviruses: Methods and Protocols , volume 1282, pages 1–23. SpringerNew York, 2 2015. ISBN 9781493924387. doi: 10.1007/978-1-4939-2438-7 {\ } bioRxiv , 2020. doi: 10.1101/2020.03.22.002386. URL .[14] N. Gulbahce et al. Viral perturbations of host networks reflect disease etiology.
PLoSComputational Biology , 8(6), 6 2012. ISSN 1553734X. doi: 10.1371/journal.pcbi.1002531.[15] M. Kitsak et al. Tissue Specificity of Human Disease Module.
Scientific Re-ports , 6:35241, 10 2016. ISSN 20452322. doi: 10.1038/srep35241. URL .[16] J. Lonsdale et al. The Genotype-Tissue Expression (GTEx) project, 6 2013. ISSN10614036.[17] Z. Xu et al. Pathological findings of COVID-19 associated with acute respiratory distresssyndrome.
The Lancet Respiratory Medicine , 8(4):420–422, 4 2020. ISSN 22132619. doi:10.1016/S2213-2600(20)30076-X.[18] Y. Yang et al. The deadly coronaviruses: The 2003 SARS pandemic and the 2020 novelcoronavirus epidemic in China, 5 2020. ISSN 10959157.[19] C. Huang et al. Clinical features of patients infected with 2019 novel coronavirus inWuhan, China.
The Lancet , 395(10223):497–506, 2 2020. ISSN 1474547X. doi: 10.1016/S0140-6736(20)30183-5.[20] Y. Y. Zheng, Y. T. Ma, J. Y. Zhang, and X. Xie. COVID-19 and the cardiovascularsystem, 3 2020. ISSN 17595010.[21] L. Mao et al. Neurologic Manifestations of Hospitalized Patients With CoronavirusDisease 2019 in Wuhan, China.
JAMA neurology, 4 2020. ISSN 2168-6157. doi: 10.1001/jamaneurol.2020.1127.
[22] M. Eliezer et al. Sudden and Complete Olfactory Loss Function as a Possible Symptom of COVID-19.
JAMA otolaryngology– head & neck surgery , 4 2020. ISSN 2168-619X. doi:10.1001/jamaoto.2020.0832. URL .[23] S. J. Pleasure, A. J. Green, and S. A. Josephson. The Spectrum of Neurologic Disease inthe Severe Acute Respiratory Syndrome Coronavirus 2 Pandemic Infection: NeurologistsMove to the Frontlines.
JAMA neurology , 4 2020. ISSN 2168-6157. doi: 10.1001/jamaneurol.2020.1065. URL .[24] C. Qin et al. Dysregulation of Immune Response in Patients with COVID-19 in Wuhan,China.
SSRN Electronic Journal , 2 2020. ISSN 1556-5068. doi: 10.2139/ssrn.3541136.URL .[25] C. Song et al. Detection of 2019 novel coronavirus in semen and testicular biopsyspecimen of COVID-19 patients. medRxiv , page 2020.03.31.20042333, 4 2020. doi:10.1101/2020.03.31.20042333.[26] G. Grasselli et al. Baseline Characteristics and Outcomes of 1591 Patients InfectedWith SARS-CoV-2 Admitted to ICUs of the Lombardy Region, Italy.
JAMA , 4 2020.ISSN 1538-3598. doi: 10.1001/jama.2020.5394. URL .[27] J. Park, D.-S. Lee, N. A. Christakis, and A.-L. Barab´asi. The impact of cellular networkson disease comorbidity.
Molecular Systems Biology , 2009. ISSN 1744-4292. doi: 10.1038/msb.2009.16.[28] C. A. Hidalgo, N. Blumm, A. L. Barab´asi, and N. A. Christakis. A Dynamic NetworkApproach for the Study of Human Phenotypes.
PLoS Computational Biology , 5(4):e1000353, 4 2009. ISSN 1553734X. doi: 10.1371/journal.pcbi.1000353. URL https://dx.plos.org/10.1371/journal.pcbi.1000353 .[29] D. S. Lee et al. The implications of human metabolic network topology for diseasecomorbidity.
Proceedings of the National Academy of Sciences of the United States ofAmerica , 105(29):9880–9885, 7 2008. ISSN 00278424. doi: 10.1073/pnas.0802208105.[30] J. Menche et al. Uncovering disease-disease relationships through the incomplete in-teractome.
Science , 347(6224), 5 2015. ISSN 00368075. doi: 10.1126/science.1065103.URL .[31] N. Chen et al. Epidemiological and clinical characteristics of 99 cases of 2019 novelcoronavirus pneumonia in Wuhan, China: a descriptive study.
The Lancet, 395(10223):507–513, 2 2020. ISSN 1474547X. doi: 10.1016/S0140-6736(20)30211-7.
[32] D. Wang et al. Clinical Characteristics of 138 Hospitalized Patients with 2019 Novel Coronavirus-Infected Pneumonia in Wuhan, China.
JAMA - Journal of the AmericanMedical Association , 3 2020. ISSN 15383598. doi: 10.1001/jama.2020.1585.[33] A. Giacomelli et al. Self-reported olfactory and taste disorders in SARS-CoV-2 patients:a cross-sectional study.
Clinical Infectious Diseases , 2020. ISSN 1058-4838. doi: 10.1093/cid/ciaa330. URL https://academic.oup.com/cid/advance-article/doi/10.1093/cid/ciaa330/5811989 .[34] M. A. Yildirim et al. Drug-target network.
Nature Biotechnology , 25(10):1119–1126,10 2007. ISSN 10870156. doi: 10.1038/nbt1338. URL http://dx.doi.org/10.1038/nbt1338 .[35] C. C. Aggarwal, A. Hinneburg, and D. A. Keim. On the Surprising Behavior of DistanceMetrics in High Dimensional Space. In J. den Bussche and V. Vianu, editors,
DatabaseTheory — ICDT 2001 , pages 420–434, Berlin, Heidelberg, 2001. Springer Berlin Hei-delberg. ISBN 978-3-540-44503-6.[36] S. Kullback and R. A. Leibler. On Information and Sufficiency.
Ann. Math. Statist. ,22(1):79–86, 1951. doi: 10.1214/aoms/1177729694. URL https://doi.org/10.1214/aoms/1177729694 .[37] J. Lin. Divergence measures based on the Shannon entropy.
IEEE Transactions onInformation Theory , 37(1):145–151, jan 1991. ISSN 1557-9654. doi: 10.1109/18.61115.[38] M. Zitnik, M. Agrawal, and J. Leskovec. Modeling polypharmacy side effects with graphconvolutional networks.
Bioinformatics , 34(13):457–466, 2018.[39] M. Zitnik et al. Machine learning for integrating data in biology and medicine: Princi-ples, practice, and opportunities.
Information Fusion , 50:71–91, 2019.[40] E. Becht et al. Dimensionality reduction for visualizing single-cell data using UMAP.
Nature Biotechnology , 37(1):38, 2019.[41] J. Bartholdi, C. A. Tovey, and M. A. Trick. Voting schemes for which it can be difficultto tell who won the election.
Social Choice and welfare , 6(2):157–165, 1989.[42] C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Rank aggregation methods for theweb. In
Proceedings of the 10th international conference on World Wide Web, pages 613–622, 2001.
[43] P. Zhou et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin.
Nature , 579(7798):270–273, 3 2020. ISSN 14764687. doi: 10.1038/s41586-020-2012-7.[44] D. Blanco-Melo et al. SARS-CoV-2 launches a unique transcriptional signature fromin vitro, ex vivo, and in vivo systems. bioRxiv , page 2020.03.24.004655, 2020. doi:10.1101/2020.03.24.004655.[45] H. Zhang et al. Angiotensin-converting enzyme 2 (ACE2) as a SARS-CoV-2 receptor:molecular mechanisms and potential therapeutic target.
Intensive Care Medicine , 46(4):586–590, 2020. ISSN 14321238. doi: 10.1007/s00134-020-05985-9. URL https://doi.org/10.1007/s00134-020-05985-9 .[46] M. Hoffmann et al. SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and IsBlocked by a Clinically Proven Protease Inhibitor.
Cell , 0(0), 2020. ISSN 10974172.doi: 10.1016/j.cell.2020.02.052.[47] K. Luck et al. A reference map of the human protein interactome. bioRxiv , page 605451,jan 2019. doi: 10.1101/605451. URL http://biorxiv.org/content/early/2019/04/19/605451.abstract .[48] R. Mosca, A. C´eol, and P. Aloy. Interactome3D: adding structural details to proteinnetworks.
Nature methods , 10(1):47–53, jan 2013. ISSN 1548-7105. doi: 10.1038/nmeth.2289. URL .[49] M. J. Meyer, J. Das, X. Wang, and H. Yu. INstruct: a database of high-quality 3D structurally resolved protein interactome networks.
Bioinformatics(Oxford, England) , 29(12):1577–9, jun 2013. ISSN 1367-4811. doi: 10.1093/bioinformatics/btt181. URL .[50] M. J. Meyer et al. Interactome INSIDER: a structural interactome browser forgenomic studies.
Nature methods , 15(2):107–114, 2018. ISSN 1548-7105. doi:10.1038/nmeth.4540. URL .[51] M. J. Cowley et al. PINA v2.0: mining interactome modules.
Nucleic acids research, 40(Database issue):D862–5, jan 2012. ISSN 1362-4962. doi: 10.1093/nar/gkr967.
[52] L. Licata et al. MINT, the molecular interaction database: 2012 update.
Nucleic AcidsResearch , 40(D1):D857–D861, jan 2012. ISSN 1362-4962. doi: 10.1093/nar/gkr930.URL https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkr930 .[53] A. Chatr-Aryamontri et al. The BioGRID interaction database: 2017 update.
Nu-cleic acids research , 45(D1):D369–D379, 2017. ISSN 1362-4962. doi: 10.1093/nar/gkw1102. URL .[54] J. Das and H. Yu. HINT: High-quality protein interactomes and their applications inunderstanding human disease.
BMC Systems Biology , 6, 2012. ISSN 17520509. doi:10.1186/1752-0509-6-92.[55] G. Alanis-Lobato, M. A. Andrade-Navarro, and M. H. Schaefer. HIPPIE v2.0: en-hancing meaningfulness and reliability of protein-protein interaction networks.
Nu-cleic acids research , 45(D1):D408–D414, 2017. ISSN 1362-4962. doi: 10.1093/nar/gkw985. URL .[56] D. Alonso-L´opez et al. APID database: Redefining protein-protein interaction experi-mental evidences and binary interactomes.
Database , 2019(i):1–8, 2019. ISSN 17580463.doi: 10.1093/database/baz005.[57] T. Li et al. A scored human protein-protein interaction network to catalyze genomicinterpretation.
Nature Methods , 14(1):61–64, 2016. ISSN 15487105. doi: 10.1038/nmeth.4083. URL http://dx.doi.org/10.1038/nmeth.4083 .[58] E. L. Huttlin et al. Architecture of the human interactome defines protein communitiesand disease networks.
Nature , 545(7655):505–509, 5 2017. ISSN 14764687. doi: 10.1038/nature22366.[59] M. Y. Hein et al. A Human Interactome in Three Quantitative Dimensions Organizedby Stoichiometries and Abundances.
Cell , 163(3):712–723, 10 2015. ISSN 10974172.doi: 10.1016/j.cell.2015.09.053.[60] C. Wan et al. Panorama of ancient metazoan macromolecular complexes.
Nature, 525(7569):339–44, sep 2015. ISSN 1476-4687. doi: 10.1038/nature14877.
[61] F. Cheng, P. Jia, Q. Wang, and Z. Zhao. Quantitative network mapping of the human kinome interactome reveals new clues for rational kinase inhibitor discovery and individualized cancer therapy.
Oncotarget , 5(11):3697–710, 2014. ISSN 1949-2553. doi: 10.18632/oncotarget.1984. URL .[62] P. V. Hornbeck et al. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations.
Nucleic acids research , 43(Database issue):D512–20, jan 2015. ISSN 1362-4962. doi:10.1093/nar/gku1267. URL .[63] D. Fazekas et al. SignaLink 2 - a signaling pathway resource with multi-layered reg-ulatory networks.
BMC systems biology , 7:7, jan 2013. ISSN 1752-0509. doi: 10.1186/1752-0509-7-7. URL .[64] K. Breuer et al. InnateDB: systems biology of innate immunity andbeyond–recent updates and continuing curation.
Nucleic acids research , 41(Database issue):D1228–33, jan 2013. ISSN 1362-4962. doi: 10.1093/nar/gks1147. URL .[65] J. Gilmer et al. Neural message passing for quantum chemistry. In
ICML , pages 1263–1272. JMLR. org, 2017.[66] P. Veliˇckovi´c et al. Graph attention networks.
ICLR , 2018.[67] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv:1412.6980 ,2014.[68] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforwardneural networks. In
AISTATS , pages 249–256, 2010.[69] W. Hamilton et al. Embedding logical queries on knowledge graphs. In
NIPS , pages2026–2037, 2018.[70] N. Srivastava et al. Dropout: a simple way to prevent neural networks from overfitting.
JMLR , 15(1):1929–1958, 2014.[71] W. Hamilton, Z. Ying, and J. Leskovec. Inductive representation learning on largegraphs. In
NIPS , pages 1024–1034, 2017.3472] O. M. Enache et al. The GCTx format and cmap { Py, R, M, J } packages: resources foroptimized storage and integrated traversal of annotated dense matrices. Bioinformatics ,35(8):1427–1429, apr 2019. ISSN 1367-4803. doi: 10.1093/bioinformatics/bty784. URL https://academic.oup.com/bioinformatics/article/35/8/1427/5094509 .[73] R. E. Kass and A. E. Raftery. Bayes factors.
Journal of the American StatisticalAssociation , 90(430):773–795, 1995.[74] G. Casella and E. Moreno. Assessing robustness of intrinsic tests of independence intwo-way contingency tables.
Journal of the American Statistical Association , 104(487):1261–1271, 2012.[75] F. Pedregosa et al. Scikit-learn: Machine Learning in Python. Technical report, 2011.URL http://scikit-learn.sourceforge.net.
Table S1: SARS-CoV2-Human Interactome. Protein-protein interactions between 29 SARS-CoV2 proteins and 332 human proteins detected by affinity purification followed by mass spectrometry (dataset retrieved from Gordon et al.).

Table S2: Network Overlap Between 299 Diseases and SARS-CoV2 Targets. The S_vb measure captures the network-based overlap between SARS-CoV2 targets v and the gene pool associated with disease b.

Table S3: Repurposing Candidates in Nature News.

Table S4: Top-ranked drugs in the AI-based ranking (ranking 'A3'). Shown are the top 20 drugs and the conditions for which the drugs are indicated.
Rank  Drug ID  Drug name        Current indications
1     DB01117  Atovaquone       Hematologic cancer, Malaria
2     DB01201  Rifapentine      Pulmonary tuberculosis
3     DB00608  Chloroquine      Rheumatoid Arthritis, Malaria, Sarcoidosis
4     DB00834  Mifepristone     Cushing's disease, Meningioma, Brain cancer
5     DB00431  Lindane          Pediculus capitis infestation
6     DB09029  Secukinumab      Chronic small plaque psoriasis
7     DB11574  Elbasvir         Hepatitis C
8     DB09065  Cobicistat       HIV Infections
9     DB09054  Idelalisib       Lymphocytic Leukemia
10    DB09102  Daclatasvir      Hepatitis C
11    DB08880  Teriflunomide    Multiple sclerosis, Lupus nephritis, Rheumatoid arthritis
12    DB11569  Ixekizumab       Chronic small plaque psoriasis
13    DB01058  Praziquantel     Schistosomiasis, Opisthorchiasis
14    DB00503  Ritonavir        Acquired immunodeficiency syndrome, Hepatitis C, HIV-1 infection
15    DB13179  Troleandomycin
16    DB01222  Budesonide       Ulcerative colitis, Liver cirrhosis, Lung diseases, Asthma
17    DB09212  Loxoprofen       Rheumatoid Arthritis
18    DB00687  Fludrocortisone  Dysautonomia, Mitral Valve Prolapse Syndrome, Parkinson's disease, Peripheral motor neuropathy
19    DB08865  Crizotinib       Lung cancer, Hematologic cancer
20    DB09101  Elvitegravir     HIV Infections
Table S5: Drugs Under Evaluation in Clinical Trials for Treating COVID-19. We collected all the clinical trials relevant to COVID-19 from the ClinicalTrials.gov platform. Here we provide the clinical trial ID, the interventions, the phase, status, enrollment, and the start and end date of each trial.

Figure S1: Distribution of the Proximity (S_vb) Between Diseases and COVID-19 Targets. S_vb values represent the network-based overlap between SARS-CoV2 targets v and the genes associated with each disease b.

Figure S2: Comparison of Diffusion-Based Measures on Ranking HIV Drugs. Side-by-side boxplots of ranking distributions across 5 different diffusion-based methods on HIV. The distribution of 22 FDA-approved HIV drugs is shown in red, and each circle represents one distinct drug.

Figure S3: Comparison of Diffusion-Based Measures on Ranking Drugs Being Tested for COVID-19. Side-by-side boxplots of ranking distributions across 5 different diffusion-based methods on COVID-19. The distribution of 24 distinct potential drugs is shown in red, and each circle represents one distinct drug. The list of 24 potential drugs was obtained from curation of Nature News (Table S3).

Figure S4: Overview of AI-based Strategy for Drug Repurposing. (A) Visualization of the learned embedding space. Every point represents a drug (in blue) or a disease (in orange). If a drug and a disease are embedded close together in this space, the underlying PPI networks of the drug and the disease are predictive of whether the drug can treat the disease. (B) Probability distributions of indications and non-indications learned by the AI model are well separated, indicating that the model can distinguish between successful and failed drug indications. (C) Predictive performance of the AI model on the held-out test set of drug indications. Higher values indicate better performance (AUROC, area under the ROC curve; AUPRC, area under the PR curve; MAP@50, mean average precision at top 50).

Figure S5: Drug Repurposing Candidates Show Greater Overlap of Perturbed Genes and COVID-19 Targets. Of the 59 repurposing candidates present in the Connectivity Map database, 43 have a statistically significant overlap of perturbed genes and COVID-19 targets. We created a reference distribution by randomly selecting 59 drugs and measuring the same overlap in 500 iterations. On average, only 17 ± .

Figure S6: Drug Repurposing Candidates Show Greater Anticorrelated Effects with SARS-CoV2 Infection. Of the 59 repurposing candidates present in the Connectivity Map database, 22 have negative correlation coefficients (Spearman ρ < 0, FDR-BH adjusted p-value < 0.05) when compared with SARS-CoV2 infection perturbations. We created a reference distribution by randomly selecting 59 drugs and measuring the Spearman correlation in 500 iterations. On average, only 3 ±

[Figure S7 plot. x-axis: Date; y-axis: AUC Score; legend: P1, P2, P3, D1, D2, D3, D4, D5, A1, A2, A3, A4]

Figure S7: AUC over time for each method. We measured the performance of each ranking method by computing ROC (Receiver Operating Characteristic) curves and AUC (area under the curve) values. True positives were obtained from ClinicalTrials.gov by retrieving the list of drugs currently undergoing clinical trials for COVID-19 (Table S5). Here, we quantified the performance of the models considering only the evidence available up to a given date.
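
The date-restricted evaluation described for Figure S7 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each method yields a ranking of unique drugs (best first, no ties) and that a drug counts as a true positive at a given cutoff if a trial involving it started on or before that date. The function names `auc_from_ranking` and `auc_over_time`, the drug labels, and the dates are all hypothetical. The AUC is computed as the fraction of (positive, negative) pairs in which the positive drug is ranked higher, which equals the area under the ROC curve when there are no ties.

```python
from datetime import date


def auc_from_ranking(ranked_drugs, positives):
    """AUC of a ranking (best first): the fraction of (positive, negative)
    pairs in which the positive drug is ranked above the negative one."""
    pos = set(positives) & set(ranked_drugs)
    n_pos = len(pos)
    n_neg = len(ranked_drugs) - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # AUC is undefined without both classes
    pos_seen = 0   # positives encountered so far (ranked above current drug)
    pairs_won = 0  # pairs where a positive outranks a negative
    for drug in ranked_drugs:
        if drug in pos:
            pos_seen += 1
        else:
            pairs_won += pos_seen
    return pairs_won / (n_pos * n_neg)


def auc_over_time(ranked_drugs, trial_start, cutoffs):
    """For each cutoff date, score the ranking using only drugs whose
    trial started on or before that date as true positives."""
    scores = {}
    for cutoff in cutoffs:
        positives = {d for d, start in trial_start.items() if start <= cutoff}
        scores[cutoff] = auc_from_ranking(ranked_drugs, positives)
    return scores


# Hypothetical toy data: five ranked drugs, two of which enter trials.
ranking = ["drug_A", "drug_B", "drug_C", "drug_D", "drug_E"]
trial_start = {"drug_A": date(2020, 2, 1), "drug_D": date(2020, 3, 15)}
cutoffs = [date(2020, 2, 15), date(2020, 4, 1)]
scores = auc_over_time(ranking, trial_start, cutoffs)
```

With these toy inputs, the first cutoff sees only `drug_A` as a positive and the second sees both `drug_A` and `drug_D`, so the score changes as more trial evidence accumulates.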