[PDF] Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19

Abstract

Design of new drug compounds with target properties is a key area of research in generative modeling. We present a small drug molecule design pipeline based on graph-generative models and a comparison study of two state-of-the-art graph generative models for designing COVID-19 targeted drug candidates: 1) a variational autoencoder-based approach (VAE) that uses prior knowledge of molecules that have been shown to be effective for earlier coronavirus treatments and 2) a deep Q-learning method (DQN) that generates optimized molecules without any proximity constraints. We evaluate the novelty of the automated molecule generation approaches by validating the candidate molecules with drug-protein binding affinity models. The VAE method produced two novel molecules with similar structures to the antiretroviral protease inhibitor Indinavir that show potential binding affinity for the SARS-CoV-2 protein target 3-chymotrypsin-like protease (3CL-protease).

Full PDF

BBenchmarking Deep Graph Generative Models for OptimizingNew Drug Molecules for COVID-19

Logan Ward

Argonne National [email protected]

Jenna A. Bilbrey

Pacific Northwest [email protected]

Sutanay Choudhury

Pacific Northwest [email protected]

Neeraj Kumar

Pacific Northwest [email protected]

Ganesh Sivaraman

Argonne National [email protected]

ABSTRACT

Design of new drug compounds with target properties is a keyarea of research in generative modeling. We present a small drugmolecule design pipeline based on graph-generative models anda comparison study of two state-of-the-art graph generative mod-els for designing COVID-19 targeted drug candidates: 1) a varia-tional autoencoder-based approach (VAE) that uses prior knowl-edge of molecules that have been shown to be effective for earliercoronavirus treatments and 2) a deep Q-learning method (DQN)that generates optimized molecules without any proximity con-straints. We evaluate the novelty of the automated molecule gen-eration approaches by validating the candidate molecules withdrug-protein binding affinity models. The VAE method producedtwo novel molecules with similar structures to the antiretroviralprotease inhibitor Indinavir that show potential binding affinityfor the SARS-CoV-2 protein target 3-chymotrypsin-like protease(3CL-protease).

Deep learning has demonstrated the potential to revolutionize drugdesign by reducing the initial search space in the early stages ofdiscovery [25, 33, 49, 51]. By applying the appropriate algorithmtrained on the appropriate data, novel molecules can be generatedwith target properties [13, 24, 29, 43]. Here, we evaluate meth-ods for the high-performance intelligent design of small-moleculedrug candidates with anti-SARS activity, with a specific focus onSARS-CoV-2, otherwise known as COVID-19. However, discover-ing potential lead candidates for COVID-19 presents a challengeto the scientific community due to the long timescale of the drugdevelopment process. There is a need to accelerate the design pro-cess through AI-driven workflows for effective drug compounddevelopment.The potential drug space is composed of over 10 compounds,and candidates with suitable activity against specific proteins onlynarrows the search space to 10 − structures. Machine learning(ML) methods are actively used in this search-and-screen process.Candidate molecules generated by ML methods are passed to down-stream verification via virtual high-throughput drug-protein bind-ing techniques, drug synthesis, biological assay, and finally clinicaltrials [5, 16, 44]. Heterogeneous graphs provide a natural representation for small-molecule organic compounds, with nodes representing atoms inthe molecular structure and edges representing bonds betweenthe atoms [48]. This approach motivated the exploration of graph-generative models such as graph convolutional policy networks[49], variational autoencoders [26, 27, 40], and variants of deepreinforcement learning [45, 52] for the target-driven optimizationof drug molecules. The drug-molecule design task is defined asgenerating a set of graphs 𝐺 𝑜𝑝𝑡 such that for each graph 𝑔 ∈ 𝐺 𝑜𝑝𝑡 , 𝑓 𝑜𝑝𝑡 ( 𝑔 ) ≥ 𝛿 , where 𝑓 𝑜𝑝𝑡 is a property optimization function and 𝛿 isa user-specified threshold. Most methods optimize target moleculesfor properties that can be derived from the molecular structure, suchas the octanol-water partition coefficient (logP) [2] and quantitativeestimate of druglikeness (QED) [7].The focus of our work is two-fold to generate compounds fordrug discovery, specifically for SARS-CoV-2. In the first similarityapproach , a junction-tree based variational autoencoder (JT-VAE)[27] is trained on a database of known drug molecules that has pIC activity [15]. The trained VAE is used to generate novel moleculesoptimized towards specific properties using Bayesian optimization(BO). The second approach extends a graph-based deep reinforce-ment learning (DQN) method [52] to generate molecules that arenot constrained by their proximity to past anti-SARS compounds.For comparison, we use the same set of property scoring func-tions ( 𝑓 𝑜𝑝𝑡 ) as the respective optimization and reward functions forbenchmarking the JT-VAE and DQN approaches. Contributions.

1) The goal of our study is to perform multi-objective optimization for generating molecular structures by con-sidering critical bioactivity properties that are typically not con-sidered in structure-activity relationship approaches. Specifically,we focus on pIC [42], which captures the potency of the drugtowards a protease target and cannot be accurately estimated fromthe chemical structure [3]. 2) In this context, we assembled a newprotease dataset with molecules that are active against various en-zymatic assays. This is considered to be one of the key propertieswhile generating new drug molecules. These molecules are filteredfrom experimental pharmacology data in CheMBL, BindingDB, andToxCat [6]. 3) Finally, we validate all top-ranking molecules againsta Drug Target Binding Affinity (DBTA) classifier [25] to asses poten-tial anti-SARS-CoV-2 activity. Full details of our implementation andsource code are available at https://github.com/exalearn/covid-drug-design . a r X i v : . [ c s . L G ] F e b igure 1: Depiction of workflow developed for this work. The anti-SARS database is used to train a MPNN to predict pIC and the JT-VAE model. The trained MPNN is used as the scoring function in both JT-VAE (top row) and DQN-based moleculargeneration (bottom). Candidate molecules are screened by pIC (>8) and validated by a Drug Target Binding Affinity classifier. In response to the COVID-19 global pandemic, researchers havepushed to identify marketed drugs that can be repurposed for SARS-CoV-2 treatment [18, 20, 35, 44, 51]. Born et al. amended their Pac-cMann RL approach, originally intended to generate anticancerdrugs, to generate molecules with affinity to viral target proteinsand controlled toxicity [10]. Batra et al. applied an ML-based screen-ing approach to find known compounds with binding affinity toeither the isolated SARS-CoV-2 S-protein at its host receptor regionor to the S-protein-human ACE2 interface complex [5]. Huang et al.developed a deep learning toolkit for drug repurposing, called Deep-Purpose, with the goal of recommending candidates with high bind-ing affinity to target amino acid sequences from known drugs [25].These approaches screen large databases of known compounds,which have the potential to miss novel molecules with anti-SARSactivity. Chenthamarakshan et al. developed the generative mod-eling framework CogMol to design drug candidates specific to agiven target protein [13].Here we discuss which properties should be considered duringlead optimization [14]. LogP is a measure of lipophilicity, whichprovides an understanding of the behavior of a drug in the body.Compounds intended for oral administration should have a logPno greater than 5, according to Lipinski’s Rule of 5 [34]. Furtheranalysis has shown that logP values between 1 and 3 may be moreappropriate considering the effect of logP on absorption, distribu-tion, metabolism, elimination, and toxicology (ADMET) properties[47]. Though oral bioavailability is an important factor, a sole focuson logP has the potential to screen out otherwise useful compounds[50]. QED has been proposed as a more holistic druglikeness met-ric [8], from 0 (low) to 1 (high). Druglikeness provides a general metric for ADMET properties, but does not indicate the activityor effectiveness of a drug towards a specific target. The half max-imal inhibitory concentration (IC ), on the other hand, providesa quantitative measure of the potency of a compound to inhibit aspecific biological process. IC is obtained by measurement, and nouniversal ab initio method of computing its value exists. A numberof methods have been developed to approximate IC , many basedon QSPR and recently some based on machine learning [1, 3, 9, 38].Similarity to known drugs is also an important factor in drug dis-covery [30, 36], as is the ability to synthesize the molecule, whichcan be estimated by the synthetic accessibility (SA) score [from 1(easy) to 10 (difficult)] [17]. Prediction

We trained a message-passing neural network (MPNN) [19, 41]to predict pIC (the inverse log of IC ) for a given molecularstructure. Following the formalism of Gilmer et al. [19], our net-work is composed of message, update and readout operations (eqns.1-3) and our choices for these functions are based on networksdeveloped by St. John et al. for polymer property prediction. [28]The original state of each atom ( ℎ 𝑣 ) and bond ( 𝛼 𝑣𝑤 ) in our mole-cule ( 𝐺 ) is a 256-length vector with values defined by an embeddingtable based on the atomic number and bond type (e.g., single, dou-ble, aromatic). The states of these atoms are modified by eightsuccessive “message” layers. Each message layer uses a two-layermulti-layer perceptron (MLP) with sigmoid activations to computea message that uses the state of an atom ( ℎ 𝑣 ), the state of the neigh-boring atom ( ℎ 𝑤 ) and the bond which joins them ( 𝛼 𝑣𝑤 ). The atomand bond states are updated according to the following equations: 𝑡 + 𝑣 = ∑︁ 𝑤 ∈ 𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟𝑠 ( 𝑣 ) 𝑀 𝑡 ( ℎ 𝑡𝑣 , ℎ 𝑡𝑤 , 𝛼 𝑡𝑣𝑤 ) (1) ℎ 𝑡 + 𝑣 = ℎ 𝑡𝑣 + 𝑚 𝑡 + 𝑣 (2) 𝛼 𝑡 + 𝑣𝑤 = 𝛼 𝑡𝑣𝑤 + 𝑀 𝑡 ( ℎ 𝑡𝑣 , ℎ 𝑡𝑤 , 𝛼 𝑡𝑣𝑤 ) (3)The atom states output from the last layer ( ℎ 𝑇𝑣 ) are used to predictthe 𝑝𝐼𝐶 of the molecule using a "readout" function ( 𝑅 ).ˆ 𝑦 = 𝑅 ( ℎ 𝑇𝑣 | 𝑣 ∈ 𝐺 ) (4)We use several variants of the readout function in our study.We tested both “molecular fingerprints,” where the states of eachnode are combined before using a multi-layer perceptron (MLP) toreduce to compute pIC , and an “atomic contribution,” where wecombine the nodes after MLP to compute a per-node contributionto pIC . We experimented with the use of five different functionsto reduce the atomic state/contributions to a single value for eachgraph: summation, mean, maximum, softmax, and attention. Theattention functions are created by learning an attention map bypassing the node states through a MPNN. We tested all combina-tions of “molecular fingerprint” vs. “atomic contribution” and thefive readout functions, for a total of 10 networks, training eachon network the same 90% of our pIC dataset and comparing itsperformance on the withheld 10% of the data. We used an MPNNthat uses attention maps to reduce contributions from each atomto a single pIC of a molecule in all subsequent experiments. We use a junction-tree (JT) based variational autoencoder (VAE)method [27] for generating molecules with high proximity to anti-SARS drug molecules. This model generates novel molecular graphsby laying a tree-structured scaffold over substructures in the mole-cule, which are then combined into a valid molecule using a MPNN.The JT-VAE model allows for the incremental expansion of a molecu-lar graph through the addition of subgraphs, or “vocabulary" of validcomponents (denoted 𝑉 𝐷 ), derived from the training set (Figure 1).The subgraphs are used to encode a molecule into its vector repre-sentation and decode latent vectors into valid molecular graphs. Theuse of subgraphs, rather than building a molecule atom-by-atom,maintains chemical validity at each step, while also incorporatingfunctional groups common to the training set. Chemical graphsgenerated from the vocabulary will be structurally similar to thosein the training set, which is a benefit when attempting to designdrugs with similar properties to those in the training set.Given a molecular graph 𝐺 = ( 𝑉 , 𝐸 ) , JT-VAE coarsens 𝐺 intoa junction tree data structure T = ( 𝑉 𝑇 , 𝐸 𝑇 ) such that each node 𝑣 ∈ 𝑉 𝑇 in T represents a subgraph of 𝐺 . The subgraphs can becoarsened by using either automated JT construction algorithmsfrom the graphical model literature [32] or a vocabulary basedmapping approach that reduces to important building blocks ofchemical structures.Next, the JT-VAE method uses a message-passing network (as de-scribed in section 3.1) to encode T (Figure 1) into a vector represen-tation Z T . We refer the reader to [27] for specific implementationsof the MPNN-based encoders for the input graph and the junctiontree. The decoder component of the VAE learns to generate the same junction tree structure from Z T by maximizing the likelihoodfunction 𝑝 (T | Z T ) .Once the VAE model is trained, the next phase involves a two-step process for generating a molecular graph structure that isoptimized for a target property. The first step involves drawing asample from the latent space learnt by the VAE and transformingthe sampled vector representation into a corresponding junctiontree structure. The second step pursues a Bayesian optimization(BO) strategy to map the junction tree to a molecular graph thatmaximizes the target properties. Using the JT-VAE trained on ourSARS-related database, we perform Bayesian optimization (BO),using the method of Kusner et al. [31] to produce novel moleculeswith target properties described in section 3.4. We follow the Q-learning approach of Zhou et al. for our deep rein-forcement learning approach [52]. We consider tasks in which anagent interacts with an environment E represented as a moleculargraph. The agent starts with a null graph. At each time-step theagent selects an action 𝑎 𝑡 from an action space A that includesaddition of single atoms, changing the type of bonds or remov-ing bonds from the graph. The agent also receives a reward R ateach time step depending on a property scoring function. All theproperty scoring functions are described in section 3.4.In this setting, we cast the molecule generation problem as aMarkov Decision Process (MDP) [37] to learn a policy network 𝜋 that determines the best sequence of actions that start with an initialmolecular graph and transform it a larger graph with desirableproperties in a step-by-step fashion (Figure 1). At each step, weenumerate all possible actions and then select those which producevalid molecules (e.g., respect valency rules). Next, we train a multi-layer perceptron (MLP) to predict to predict the value [37] of acertain action by passing the Morgan fingerprints [39] as input.The MLP approximates the value of an action computed using theBellman equation, where the score of a state and the maximumscore of the subsequent state is multiplied by a decay factor. Asestablished with other Deep Q-Learning approaches, the additionof the value of the next state increases the value of moves whichwill lead to higher future rewards. The scoring functions described in this section are used for bothBayesian optimization in the VAE based approach and reward com-putation for the deep reinforcement-learning based approach. Fol-lowing Jin et al. [27], we first compute a score that penalizes logPby the SA score (recall from section 2 that higher SA values arediscouraged) and number of cycles with more than 6 atoms (eqn. 5).Considering that QED is a more comprehensive heuristic than logP,we also use a similar scoring function composed of QED penalizedby the SA score and number of long cycles (eqn. 6). We then ex-amine the utility of a SARS-specific scoring function based on thepIC predicted by our MPNN penalized by the SA score and num-ber of long cycles. Finally, we examine a multi-objective scoringfunction that accommodates both pIC and penalized QED. 𝑙𝑜𝑔𝑃 𝑃 ( 𝑚 ) = 𝑙𝑜𝑔𝑃 ( 𝑚 ) − 𝑆𝐴 ( 𝑚 ) − 𝑐𝑦𝑐𝑙𝑒 ( 𝑚 ) (5) 𝐸𝐷 𝑃 ( 𝑚 ) = 𝑄𝐸𝐷 ( 𝑚 ) − 𝑆𝐴 ( 𝑚 ) − 𝑐𝑦𝑐𝑙𝑒 ( 𝑚 ) (6) 𝑝𝐼𝐶 ( 𝑚 ) = 𝑅 ( 𝑔𝑟𝑎𝑝ℎ ( 𝑚 )) (from eqn. 4) (7) 𝑝𝐼𝐶 + 𝑄𝐸𝐷 𝑃 ( 𝑚 ) = 𝑝𝐼𝐶 ( 𝑚 ) + 𝑄𝐸𝐷 ( 𝑚 ) − 𝑆𝐴 ( 𝑚 ) − 𝑐𝑦𝑐𝑙𝑒 ( 𝑚 ) (8) We conduct experiments to answer two questions:

Q1:

What arethe best possible ways to generate molecules with targeted activitytowards SARS-CoV-2?

Q2:

How do we evaluate the novelty of ourgenerated molecules?

We assembled a dataset with molecules active against various enzy-matic assays filtered from experimental pharmacology data loggedin CheMBL, BindingDB, and ToxCat [6]. Details of this dataset areprovided in section 5.1. Following the workflow in Figure 1, we firsttrain a pIC model, and subsequently use it to train the JT-VAEand DQN models. We begin with a discussion of the pIC modelgiven its key role in training the JT-VAE and DQN models. surrogate model We tested multiple variants of the pIC prediction model. We foundthat MPNNs that limited the computation of the pIC to contri-butions from only a few specific atoms in the molecule performedthe best. As shown in Table S1, the models which use max and soft-max functions for aggregating the atom-level representations to amolecule-level representation (equation 4) have higher 𝑅 scoresthan those which use summation or mean outputs.The relative performance of different networks can be explainedby the physical mechanism behind the performance of anti-viraldrugs. The presence or absence of a specific pattern in the molecularstructure (e.g., functional groups, substituents) controls whetherthe molecule will bind with a certain portion of a target virus pro-tein. We hypothesize that the “maximum” function, in particular,models this “all-or-nothing” physics well. The other atoms in themolecule play a role in determining whether the molecule will stayaffixed at the target site. The reduced but non-negligible effect ofthe side groups could explain why the “atomic”-contribution model,which only uses the contribution from a single atom, performsless well than the whole-molecule fingerprint. These trends giveus confidence that the MPNNs are operating based on well-knownphysics, but we would require comparison of which atoms are con-tributing to the predicted pIC to results from molecular bindingsimulations to determine if the networks are indeed interpretable. Molecule generation setup using JT-VAE.

We trained JT-VAEfor 8,300 iterations on the full database, with the following hyper-parameters: hidden state dimension of 450, latent code dimensionof 56, and graph message passing depth of 3. Analysis of the trainedJT-VAE is given in Figure S1. To optimize towards the specifiedscoring functions, we trained a sparse Gaussian process (SGP) topredict a score given the latent representation learned by JT-VAEand then perform 10 iterations of batched BO (sampling = 50) basedon the expected improvement.

QED p I C RLJT-VAE

Figure 2: QED versus pIC for molecules generated fromDQN-based RL and JT-VAE with all scoring functions.Molecule generation setup using DQN. Our DQN approachbuilds up molecules from a single atom to large molecules atom-by-atom and bond-by-bond. Each “episode” starts with a blank slateand the RL agent is allowed up to 40 steps. We update a model ofthe Q-function to predict the value of each move after each stepin each episode and, as this model improves during training, wesmoothly turn down the probability that we will choose a randommove over the prediction of this model.The DQN finds tens of thousands of candidate molecules withhigh pIC , as shown in Figure 2 and Table 1. Using the multi-objective reward function led to fewer molecules with pIC > > > .

5. In contrast, only 5% of the molecules in the pIC -based search show QED > 0.5. The addition of QED reduces the totalnumber of high-pIC molecules found by 25%, but increases thenumber of high-pIC molecules found by over 10 times. Therefore,we recommend incorporating synthesizability and/or druglikenessinto RL-driven searches for drug-like molecules. Comparing JT-VAE and DQN.

Table 1 shows the three highest-scoring molecules generated by the two generative models witheach scoring function. The DQN models always outperform JT-VAE in finding a molecule with a superior value of the scoringfunction being optimized. The performance disparity is particularlyapparent when optimizing for logP: the maximum logP from DQNis 12.6 compared to only 4.1 for JT-VAE. We attribute the differencein optimization performance to JT-VAE implicitly sampling froma distribution of drug-like molecules and DQN having no suchconstraints.The candidate molecules generated by JT-VAE have consistentlybetter druglikeness and SA scores even when those values are notexplicitly optimized for. When optimizing towards logP, the top-3molecules generated by JT-VAE have moderate-to-high QEDs, whilethe top-3 from DQN are below 0.11. We attribute the exceptionallylarge disparity in optimal logP and associated QED values betweenthe two methods to the fact that drugs typically have logP valuesbetween -0.4 and 5.6. The molecules from which JT-VAE was trainedwere all drug-like molecules, which makes it improbable to samplemolecules with logP values far outside that range. Similarly, the able 1: Properties of the three highest-scoring molecules generated by each model using the specified scoring function. Scoring function pIC QED logP SA Score1st 2nd 3rd 1st 2nd 3rd 1st 2nd 3rd 1st 2nd 3rdlogP 𝑃 (JT-VAE) 4 .

93 4 .

57 4 .

60 0 .

45 0 .

78 0 .

71 4 .

05 4 .

27 4 .

20 1 .

70 2 .

08 2 . 𝑃 (DQN) 6 .

10 8 .

17 4 .

98 0 .

04 0 .

07 0 .

11 13 .

86 12 .

64 12 .

52 3 .

59 2 .

97 2 . 𝑃 (JT-VAE) 4 .

15 4 .

23 4 .

71 0 .

91 0 .

84 0 .

91 3 .

72 2 .

19 2 .

20 1 .

80 1 .

71 2 . 𝑃 (DQN) 6 .

80 6 .

80 0 .

77 0 .

77 3 .

46 3 .

46 2 .

05 2 .

05 2 . (JT-VAE) 10 .

22 10 .

07 10 .

07 0 .

15 0 .

12 0 .

12 4 .

86 3 .

80 3 .

06 4 .

52 4 .

94 4 . (DQN) 10 .

57 10 .

56 10 .

39 0 .

09 0 .

11 0 .

41 1 .

51 2 . − .

44 6 .

90 6 .

71 5 . +QED 𝑃 (JT-VAE) 8 .

58 5 .

98 8 .

18 0 .

83 0 .

87 0 .

70 4 .

02 3 .

37 3 .

82 1 .

93 1 .

74 1 . +QED 𝑃 (DQN) 10 .

27 10 .

27 0 .

80 0 .

80 3 .

02 3 .

02 2 .

90 2 .

90 2 . molecules in the JT-VAE training set were experimentally synthe-sized. The SA score for these molecules is low, which could explainwhy the optimized molecules from the JT-VAE are also low evenwhen this property was not optimized for. The RL agent uses no in-formation about the space of experimentally studied drug moleculesduring its training process and, accordingly, finds molecules farfrom it.Overall, we find two different purposes for JT-VAE and RL-basedmolecular optimization. JT-VAE implicitly uses the distribution ofmolecules in its training set to bias towards realistic molecules,albeit at the expense of finding better candidates. The RL-basedapproach lacks such constraints and, for better or worse, can op-timize without even implicitly regarding synthesizability or anyother characteristic not explicitly encoded in the scoring function. We observed an interesting structural trend in the molecules gener-ated by JT-VAE when using pIC as the scoring function. Figure3 shows the structures of both the molecules with pIC > 8 andthe anti-HIV drug Indinavir, an antiretroviral protease inhibitor.A common backbone is shared between Indinavir and the top-6predictions. The Tanimoto similarity scores [4] of these six gen-erated molecules against Indinavir range from 0.65 to 0.91. Indi-navir has been proposed as a drug to treat SARS-CoV-2 due tofavorable docking to the coronavirus 3-chymotrypsin-like protease(3CL-protease), a promising drug target for combating coronavirusinfections [11, 21, 22]. Notably, three of the generated moleculeshave a higher predicted pIC than does Indinavir.Experimental confirmation of Drug-Target-Interaction (DTI) ischallenging and time-consuming [46]. In silico

Drug-Target BindingAffinity (DTBA) methods offer an alternative to evaluate DTI [23].We employ a ML-based DTBA model to validate the interactionof molecules generated by JT-VAE against a target SARS-CoV2-Lprotease [12]. We trained a DBTA binary classification model us-ing extended connectivity fingerprint [39] encoding for the drugmolecule and the target protease sequence encoding using a Convo-lution Neural Network (CNN) as implemented in the DeepPurposetoolkit [25]. The default hyperparameters provided in the DeepPur-pose toolkit were found to be sufficient. The DBTA model classified

Indinavir antiretroviral protease inhibitor pIC QED logP SA

Figure 3: The top-6 molecules generated by the JT-VAEmethod optimized using the pIC scoring function. four of the top 11 molecules (including the top two in Figure 3)with probability > We compared two graph generative models, JT-VAE and DQN, forthe task of discovering potential small-molecule candidates withactivity against SARS-CoV-2. DQN always outperformed JT-VAEin finding a molecule with a superior value of the scoring functionbeing optimized. However, JT-VAE generated molecules that weremore structurally similar to those in the database due to substruc-ture representation, which produced a lower SA score and logP < 5.JT-VAE tended to produce what looked to be drug molecules, whileDQN produced precursor-like candidates with optimized proper-ties, which could be used as starting structures to add additionalsubstituents aimed at the specific target. EFERENCES [1] P. Armutlu, M. E. Ozdemir, F. Uney-Yuksektepe, I. H. Kavakli, and M. Turkay.2008. Classification of drug molecules considering their IC50 values using mixed-integer linear programming based hyper-boxes method.

BMC Bioinformatics

Pharmaceutical Research

15, 2 (1998), 209–215.[3] A. Bag and P. K. Ghorai. 2016. Development of Quantum Chemical Method toCalculate Half Maximal Inhibitory Concentration (IC50).

Mol Inform

35, 5 (2016),199–206. https://doi.org/10.1002/minf.201501004[4] Dávid Bajusz, Anita Rácz, and Károly Héberger. 2015. Why is Tanimoto indexan appropriate choice for fingerprint-based similarity calculations?

Journal ofcheminformatics

7, 1 (2015), 20.[5] Rohit Batra, Henry Chan, Ganesh Kamath, Rampi Ramprasad, Mathew J.Cherukara, and Subramanian Sankaranarayanan. 2020. Screening of TherapeuticAgents for COVID-19 using Machine Learning and Ensemble Docking Simula-tions. arXiv e-prints , Article arXiv:2004.03766 (April 2020), arXiv:2004.03766 pages.arXiv:q-bio.BM/2004.03766[6] A Patrícia Bento, Anna Gaulton, Anne Hersey, Louisa J Bellis, Jon Chambers,Mark Davies, Felix A Krüger, Yvonne Light, Lora Mak, Shaun McGlinchey, et al.2014. The ChEMBL bioactivity database: an update.

Nucleic Acids Research

NatureChemistry

4, 2 (2012), 90.[8] G. Richard Bickerton, Gaia V. Paolini, Jérémy Besnard, Sorel Muresan, and An-drew L. Hopkins. 2012. Quantifying the chemical beauty of drugs.

Naturechemistry

4, 2 (2012), 90–98. https://doi.org/10.1038/nchem.1243[9] Esben Jannik Bjerrum. 2017. SMILES Enumeration as Data Augmentationfor Neural Network Modeling of Molecules.

CoRR abs/1703.07076 (2017).arXiv:1703.07076 http://arxiv.org/abs/1703.07076[10] Jannis Born, Matteo Manica, Joris Cadow, Greta Markert, Nil Adell Mill, ModestasFilipavicius, and María Rodríguez Martínez. [n.d.]. PaccMannRL on SARS-CoV-2:Designing antiviral candidates with conditional generative models. ([n. d.]).[11] Y. et al. Chang. 2020. Potential Therapeutic Agents for COVID-19 Based on theAnalysis of Protease and RNA Polymerase Docking. (2020), 2020020242.https://doi.org/10.20944/preprints202002.0242.v1[12] Yu Wai Chen, Chin-Pang Bennu Yiu, and Kwok-Yin Wong. 2020. Predictionof the SARS-CoV-2 (2019-nCoV) 3C-like protease (3CL pro) structure: virtualscreening reveals velpatasvir, ledipasvir, and other drug repurposing candidates.

F1000Research arXivpreprint arXiv:2004.01215 (2020).[14] Robert A Copeland, David L Pompliano, and Thomas D Meek. 2006. Drug–targetresidence time and its implications for lead optimization.

Nature Reviews DrugDiscovery

5, 9 (2006), 730–739.[15] Emmie de Wit, Neeltje van Doremalen, Darryl Falzarano, and Vincent J Munster.2016. SARS and MERS: recent insights into emerging coronaviruses.

NatureReviews Microbiology

14, 8 (2016), 523.[16] Ron O Dror, Robert M Dirks, JP Grossman, Huafeng Xu, and David E Shaw. 2012.Biomolecular simulation: a computational microscope for molecular biology.

Annual Review of Biophysics

41 (2012), 429–452.[17] Peter Ertl and Ansgar Schuffenhauer. 2009. Estimation of synthetic accessibilityscore of drug-like molecules based on molecular complexity and fragment contri-butions.

Journal of Cheminformatics

1, 1 (2009), 8. https://doi.org/10.1186/1758-2946-1-8[18] Hua-Hao Fan, Li-Qin Wang, Wen-Li Liu, Xiao-Ping An, Zhen-Dong Liu, Xiao-QiHe, Li-Hua Song, and Yi-Gang Tong. 2020. Repurposing of clinically approveddrugs for treatment of coronavirus disease 2019 in a 2019-novel coronavirus-related coronavirus model.

Chinese medical journal . JMLR. org, 1263–1272.[20] David E. Gordon, Gwendolyn M. Jang, Mehdi Bouhaddou, Jiewei Xu, KirstenObernier, Kris M. White, Matthew J. O’Meara, Veronica V. Rezelj, Jeffrey Z. Guo,Danielle L. Swaney, Tia A. Tummino, Ruth Huettenhain, Robyn M. Kaake, Ali-cia L. Richards, Beril Tutuncuoglu, Helene Foussard, Jyoti Batra, Kelsey Haas,Maya Modak, Minkyu Kim, Paige Haas, Benjamin J. Polacco, Hannes Braberg,Jacqueline M. Fabius, Manon Eckhardt, Margaret Soucheray, Melanie J. Bennett,Merve Cakir, Michael J. McGregor, Qiongyu Li, Bjoern Meyer, Ferdinand Roesch,Thomas Vallet, Alice Mac Kain, Lisa Miorin, Elena Moreno, Zun Zar Chi Naing,Yuan Zhou, Shiming Peng, Ying Shi, Ziyang Zhang, Wenqi Shen, Ilsa T. Kirby, James E. Melnyk, John S. Chorba, Kevin Lou, Shizhong A. Dai, Inigo Barrio-Hernandez, Danish Memon, Claudia Hernandez-Armenta, Jiankun Lyu, Christo-pher J. P. Mathy, Tina Perica, Kala B. Pilla, Sai J. Ganesan, Daniel J. Saltzberg,Ramachandran Rakesh, Xi Liu, Sara B. Rosenthal, Lorenzo Calviello, SrivatsVenkataramanan, Jose Liboy-Lugo, Yizhu Lin, Xi-Ping Huang, YongFeng Liu,Stephanie A. Wankowicz, Markus Bohn, Maliheh Safari, Fatima S. Ugur, Cassan-dra Koh, Nastaran Sadat Savar, Quang Dinh Tran, Djoshkun Shengjuler, Sabrina J.Fletcher, Michael C. O’Neal, Yiming Cai, Jason C. J. Chang, David J. Broadhurst,Saker Klippsten, Phillip P. Sharp, Nicole A. Wenzell, Duygu Kuzuoglu, Hao-Yuan Wang, Raphael Trenker, Janet M. Young, Devin A. Cavero, Joseph Hiatt,Theodore L. Roth, Ujjwal Rathore, Advait Subramanian, Julia Noack, MathieuHubert, Robert M. Stroud, Alan D. Frankel, Oren S. Rosenberg, Kliment A. Verba,David A. Agard, Melanie Ott, Michael Emerman, Natalia Jura, et al. 2020. A SARS-CoV-2 protein interaction map reveals targets for drug repurposing.

Nature (2020). https://doi.org/10.1038/s41586-020-2286-9[21] Gideon A. Gyebi, Olalekan B. Ogunro, Adegbenro P. Adegunloye, Oludare M.Ogunyemi, and Saheed O. Afolabi. 2020. Potential inhibitors of coronavirus3-chymotrypsin-like protease (3CL(pro)): an in silico screening of alkaloids andterpenoids from African medicinal plants.

Journal of Biomolecular Structure andDynamics (2020), 1–13. https://doi.org/10.1080/07391102.2020.1764868[22] C Harrison. 2020. Coronavirus puts drug repurposing on the fast track.

NatureBiotechnology

38, 4 (2020), 379.[23] Tong He, Marten Heidemeyer, Fuqiang Ban, Artem Cherkasov, and Martin Ester.2017. SimBoost: a read-across approach for predicting drug–target bindingaffinities using gradient boosting machines.

Journal of Cheminformatics

9, 1(2017), 1–14.[24] Julien Horwood and Emmanuel Noutahi. 2020. Molecular Design in SyntheticallyAccessible Chemical Space via Deep Reinforcement Learning. arXiv preprintarXiv:2004.14308 (2020).[25] Kexin Huang, Tianfan Fu, Cao Xiao, Lucas Glass, and Jimeng Sun. 2020. Deep-Purpose: a Deep Learning Based Drug Repurposing Toolkit. arXiv preprintarXiv:2004.08919 (2020).[26] Wengong Jin, Regina Barzilay, and Tommi Jaakkola. 2020. Multi-ObjectiveMolecule Generation using Interpretable Substructures. arXiv e-prints , ArticlearXiv:2002.03244 (Feb. 2020), arXiv:2002.03244 pages. arXiv:cs.LG/2002.03244[27] Wengong Jin, Regina Barzilay, and Tommi S. Jaakkola. 2018. Junction TreeVariational Autoencoder for Molecular Graph Generation.

CoRR abs/1802.04364(2018). arXiv:1802.04364 http://arxiv.org/abs/1802.04364[28] Peter C. St. John, Caleb Phillips, Travis W. Kemper, A. Nolan Wilson, Yanfei Guan,Michael F. Crowley, Mark R. Nimlos, and Ross E. Larsen. 2019. Message-passingneural networks for high-throughput polymer screening.

Journal of ChemicalPhysics bioRxiv (2020).[30] Ashutosh Kumar and Kam Y. J. Zhang. 2018. Advances in the Development ofShape Similarity Methods and Their Application in Drug Discovery.

Frontiers inchemistry

Journal of the Royal Statistical Society: Series B (Methodological)

50, 2 (1988),157–194.[33] Yibo Li, Liangren Zhang, and Zhenming Liu. 2018. Multi-objective de novo drugdesign with conditional graph generative model.

Journal of Cheminformatics

Drug Discovery Today: Technologies

1, 4 (2004), 337–341.[35] Hongzhou Lu. 2020. Drug treatment options for the 2019-new coronavirus (2019-nCoV).

BioScience Trends

14, 1 (2020), 69–71. https://doi.org/10.5582/bst.2020.01020[36] Gerald Maggiora, Martin Vogt, Dagmar Stumpfe, and Jurgen Bajorath. 2014.Molecular similarity in medicinal chemistry: miniperspective.

Journal of medici-nal chemistry

57, 8 (2014), 3186–3204.[37] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, IoannisAntonoglou, Daan Wierstra, and Martin Riedmiller. 2013. Playing atari with deepreinforcement learning. arXiv preprint arXiv:1312.5602 (2013).[38] S. J. Patankar and P. C. Jurs. 2000. Prediction of IC50 Values for ACAT Inhibitorsfrom Molecular Structure.

Journal of Chemical Information and Computer Sciences

40, 3 (2000), 706–723. https://doi.org/10.1021/ci990125r[39] David Rogers and Mathew Hahn. 2010. Extended-connectivity fingerprints.

Journal of Chemical Information and Modeling

50, 5 (2010), 742–754.[40] Bidisha Samanta, DE Abir, Gourhari Jana, Pratim Kumar Chattaraj, Niloy Gan-guly, and Manuel Gomez Rodriguez. 2019. Nevae: A deep generative model for olecular graphs. In AAAI Conference on Artificial Intelligence , Vol. 33. 1110–1117.[41] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and GabrieleMonfardini. 2008. The graph neural network model.

IEEE Transactions on NeuralNetworks

20, 1 (2008), 61–80.[42] JL Sebaugh. 2011. Guidelines for accurate EC50/IC50 estimation.

Pharmaceuticalstatistics

10, 2 (2011), 128–134.[43] Ganesh Sivaraman, Nicholas Jackson, Benjamin Sanchez-Lengeling, AlvaroVasquez-Mayagoitia, Alan Aspuru-Guzik, Venkatram Vishwanath, and Juan dePablo. 2020. A machine learning workflow for molecular analysis: application tomelting points.

Machine Learning: Science and Technology (2020).[44] Micholas Smith and Jeremy C. Smith. 2020. Repurposing Therapeutics for COVID-19: Supercomputer-Based Docking to the SARS-CoV-2 Viral Spike Protein andViral Spike Protein-Human ACE2 Interface. https://doi.org/10.26434/chemrxiv.11871402.v3[45] Niclas Ståhl, Göran Falkman, Alexander Karlsson, Gunnar Mathiason, and JonasBostrom. 2019. Deep reinforcement learning for multiparameter optimization inde novo drug design.

Journal of Chemical Information and Modeling

59, 7 (2019),3166–3176.[46] Maha Thafar, Arwa Bin Raies, Somayah Albaradei, Magbubah Essack, andVladimir B Bajic. 2019. Comparison Study of Computational Prediction Tools forDrug-Target Binding Affinities.

Frontiers in Chemistry

Expert Opinion onDrug Discovery

5, 3 (2010), 235–248. https://doi.org/10.1517/17460441003605098arXiv:https://doi.org/10.1517/17460441003605098 PMID: 22823020.[48] David Weininger. 1990. SMILES. 3. DEPICT. Graphical depiction of chemicalstructures.

Journal of Chemical Information and Computer Sciences

30, 3 (1990),237–243.[49] Jiaxuan You, Bowen Liu, Zhitao Ying, Vijay Pande, and Jure Leskovec. 2018. Graphconvolutional policy network for goal-directed molecular graph generation. In

Advances in Neural Information Processing Systems . 6410–6421.[50] Ming-Qiang Zhang and Barrie Wilkinson. 2007. Drug discovery beyond the‘rule-of-five’.

Current Opinion in Biotechnology

18, 6 (2007), 478 – 488. https://doi.org/10.1016/j.copbio.2007.10.005 Chemical biotechnology / Pharmaceuticalbiotechnology.[51] Yadi Zhou, Yuan Hou, Jiayu Shen, Yin Huang, William Martin, and FeixiongCheng. 2020. Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2.

Cell Discovery

6, 1 (2020), 14. https://doi.org/10.1038/s41421-020-0153-3[52] Zhenpeng Zhou, Steven Kearnes, Li Li, Richard N. Zare, and Patrick Riley. 2019.Optimization of Molecules via Deep Reinforcement Learning.

Scientific Reports S1 UPPLEMENTARY MATERIAL5.1 Dataset Description

Preparation.

We prepared and assembled the protease datasets with molecules active against various protease in enzymatic assays filteredfrom experimentally pharmacology data such as CheMBL, BindingDB, and ToxCat [6]. The database was filtered out with the IC activitystandard types and their potency. Molecules with size larger than 1000 Dalton were removed due to the limitation of the representation oflarge molecules in cheminformatics. We also filtered out non-drug like molecules containing metals and polypeptides. The curated data wasstandardized using the logarithmic scale –log10 of a numeric value in nM for all compounds. We used the mean average for a molecule withmore than one IC value. The resulting dataset contains 6545 unique molecules accompanied by their SMILES strings and experimentalIC values. Quantitative Characterization.

We computed the metrics included in our scoring functions for each molecule in the database. Therange of values can be seen in Figure S1. The highest pIC is 10.89, while the lowest pIC is 1.22. The most common pIC is 4.0, which isshared by 320 structures, and the vast majority of structures (91.5%) have pIC values greater than 4.0. LogP values range from -10.36 to16.65 in a near Gaussian distribution with a mean of 3.70; 77.0% of all structures meet the requirement of Lipinski’s Rule of 5 that logP be nogreater than 5. QED values range from 0.01 to 0.94, while SA values range from 1.35 to 8.24, with 96.3% being below 5.We computed the Tanimoto similarity for all pairs of compounds to gain insight into the structural diversity of molecules in our database(Figure S2). The entries in the matrix were ordered in increasing pIC values. The similarity is represented by the color bar, with yellowrepresenting low similarity (0) and red high similarity (1). We observe that structures tend to become more similar to their neighbors aspIC increases, indicating that compounds with high pIC values tend to be structurally similar, supporting the consideration of molecularsimilarity during drug discovery. Figure S1: We generated 1,000 molecules using the trained JT-VAE, of which 560 were unique. The figure shows the comparisonof the QED, logP, SA score, and pIC of compounds in the database and those generated by the JT-VAE. The JT-VAE reproducedthe range of values present in the database, minus outliers. The similar values indicate that the JT-VAE is able to reproducethe wide range of structures present in the database. The pIC values for generated molecules were estimated by our MPNN. S2 igure S2: Heat map of the Tanimoto similarity between all compounds in the database. Entries are ordered in increasingpIC . The Tanimoto similarity is generally higher among structures with high pIC (located towards bottom right of thematrix), indicating the importance of considering structural similarity in drug discovery. S3 able S1: 𝑅 scores for MPNN models trained to predict the pIC of drugs from their molecular structure. Each model wastrained using a different readout function (columns) to combine atomic contributions to pIC or to create a single molecularfingerprint. Bold indicates the model used in our experiments and underscore indicates the best-performing model. atomic molecularreadoutattention C a n d i d a t e s F o un d MOpIC Figure S3: Performance of DQN agent using the multi-objective (MO) and pIC reward functions. The solid lines indicate thenumber of unique molecules with pIC > 8 found after a certain number of steps. The dashed lines indicate unique moleculeswith pIC > 8 and QED > 0.5 S4 igure S4: Comparison of the properties of molecules optimized with different scoring functions by JT-VAE method. The pIC value was estimated using our MPNN.value was estimated using our MPNN.