ParaVS: A Simple, Fast, Efficient and Flexible Graph Neural Network Framework for Structure-Based Virtual Screening
A Preprint
Junfeng Wu, Dawei Leng, Lurong Pan*
AIDD Group
Global Health Drug Discovery Institute, Beijing, China
{junfeng.wu, lurong.pan}@ghddi.org

Abstract
Structure-based virtual screening (SBVS) is a promising in silico technique that integrates computational methods into drug design. An extensively used method in SBVS is molecular docking. However, the docking process can hardly be computationally efficient and accurate at the same time, because a classical-mechanics scoring function is used to approximate, but hardly reaches, quantum-mechanics precision. In order to reduce the computational cost of the protein-ligand scoring process and to use a data-driven approach to boost scoring accuracy, we introduce a docking-based SBVS method and, furthermore, a deep learning non-docking-based method that avoids the computational cost of the docking process altogether. We then integrate the two methods into an easy-to-use framework, ParaVS, that offers researchers both choices. Graph neural networks (GNNs) are employed in ParaVS; we explain how our in-house GNN, HagNet, works and how ligands and molecular targets are modeled. To verify our approaches, cross-validation experiments are run on two datasets: an open dataset, Directory of Useful Decoys: Enhanced (DUD.E), and an in-house proprietary dataset without computationally generated artificial decoys (NoDecoy). On DUD.E we achieved a state-of-the-art AUC of 0.981 and a state-of-the-art enrichment factor of 36.2; on NoDecoy we likewise achieved high AUCs. We further finished inference on an open database, the Enamine REAL Database (RDB), comprising over 1.36 billion molecules, in 4050 core-hours using our ParaVS non-docking method (ParaVS-ND). The inference speed of ParaVS-ND is about 3.4 × 10^5 molecules per core-hour, while that of a conventional docking-based method is around 20, so ParaVS-ND is about 16000 times faster. The experiments indicate that ParaVS is accurate, computationally efficient, and generalizable to different molecular targets for virtual screening.
Moreover, ParaVS is simple and flexible, since most AI components can be substituted or updated to meet specific demands and future developments. ParaVS-ND has been released as a free online inference service at http://aidd.ghddi.org/sbvs/ and the code will be released soon.

1 Introduction

The drug discovery process is a well-known time-consuming and expensive task. The discovery of a new drug goes through several development stages, each of which may involve thousands to millions of molecules for screening [16]. Formerly, random screening and empirical knowledge were relied upon for this task. Later, high-throughput screening (HTS) improved the drug discovery process by offering automated and much quicker screening of large chemical libraries against a molecular target [17]. However, HTS is still considered time-consuming, laborious, and expensive, with relatively low screening accuracy [6, 30].

With the rapid development of computer science, computational methods are now extensively used in drug design. Specifically, virtual screening (VS) is one of the most promising in silico techniques, acting as a filter, and structure-based virtual screening (SBVS) is a robust and powerful form of VS that predicts the interaction between a compound and a molecular target protein [5, 13]. SBVS utilizes the three-dimensional (3D) structural information of the molecular target; hence, it normally performs better than ligand-based VS, which does not use any structural information [1].

Figure 1: Structure-based virtual screening framework.

A commonly used methodology in SBVS is molecular docking [8, 13, 20]. It is a procedure that inserts a ligand into a particular region, known as a pocket, of a molecular target protein. Most docking programs accomplish this by performing pocket searching, ligand insertion, and pose determination using molecular mechanics (MM), molecular dynamics (MD), and other heuristic algorithms [36]. Multiple candidate results are then passed to ranking algorithms to determine a final docked structure [9, 19, 20], namely a protein-ligand complex, from which biological and chemical properties can be inferred for screening.

Furthermore, deep learning (DL) has become a prominent tool in VS. Compared to traditional machine learning (ML) algorithms [6, 16], such as neural networks [9], support vector machines (SVM) [19], and random forests (RF) [2], deep learning models require minimal feature engineering. Ideally, if enough data is provided, representations or features can be learned directly from the dataset without any human design [3, 4, 22]. An early successful DL example is Quantitative Structure-Activity Relationship (QSAR) modeling [37], which operates over molecular fingerprints. Wallach et al. [38] found a way to apply a 3D convolutional neural network (3D-CNN) to a protein-ligand complex represented on a 3D grid. Ragoza et al. [29] further extended this to classify active and inactive binding poses. These 3D-CNN methods successfully outperformed previous works and also improved the accuracy of predicting absolute binding affinities [2, 10, 39].

Nevertheless, docking-based VS is not a perfect answer. The main problems of docking are: most docking processes consume considerable time and computing power; docking results are not highly reliable; and balanced, high-quality data is limited. To resolve them, non-docking methods that avoid docking programs entirely have been proposed. A common framework is to integrate representations of ligands and proteins into a single neural network without knowing the 3D binding structure. Karimi et al. [18] introduced DeepAffinity to predict the pIC50 of protein-ligand complexes using protein sequences and simplified molecular-input line-entry system (SMILES) [40] strings of ligands as inputs. Gao et al. [12] developed a siamese network consisting of a recurrent neural network (RNN) and a graph neural network (GNN), and also brought an attention mechanism [35] into their model to mitigate the generalizability problem, a common issue in SBVS where performance drops notably if the test ligand and protein are not seen in the training set. Lee et al. [23] also reported models working on protein sequences and SMILES strings of ligands. Nguyen et al. [27] and Lim et al. [25] further developed GNN-based models for SBVS tasks.
Description | ParaVS-Dock | ParaVS-ND
time (core-hour) | |
DUD.E docking | months of cluster time | -
DUD.E LOO CV (10 epochs, GPU) | 200 | 1600
RDB docking | ≈6.8 × 10^7 | -
inference speed (sample / core-hour) | |
RDB inference speed | 20 | ≈3.4 × 10^5

Table 1: Computational cost of ParaVS-Dock and ParaVS-ND. RDB [31] comprises over 1.36 billion compounds; DUD.E (102 targets) is orders of magnitude smaller. The docking process is done by AutoDock Vina 1.1.2 [34], and all listed tasks are run on parallel processing clusters consisting of 12 Intel(R) Xeon(R) Gold 5118 CPUs @ 2.30 GHz, except that the DUD.E leave-one-out (LOO) cross-validation (CV) experiment is done on GPU for 10 epochs. We use core-hours to measure computational cost: the number of processing units (cores) used to run a job multiplied by the duration in hours.

In this paper, we propose a docking-based (ParaVS-Dock) and a non-docking-based (ParaVS-ND) method for SBVS tasks, and establish a framework containing both, as illustrated in Figure 1. We evaluate both methods on two large datasets: an open dataset, Directory of Useful Decoys: Enhanced (DUD.E) [26], and a Global Health Drug Discovery Institute (GHDDI) in-house dataset, NoDecoy. We also evaluate our methods on DUD, a subset of DUD.E. On DUD.E, we achieved AUC_ParaVS-Dock = 0.926 and a state-of-the-art AUC_ParaVS-ND = 0.981. On the NoDecoy dataset, both methods likewise achieved high AUCs, proving the robustness and generalizability of our methods. We did no deep-learning fine-tuning in any of our experiments, which implies the potential for even better performance. Besides, to demonstrate that ParaVS is computationally efficient, we performed inference on a large database, the Enamine REAL Database (RDB) [31], and boosted the inference speed more than 16000-fold by circumventing the docking process.

We summarize our contributions as follows:

(i) We proposed an SBVS framework, ParaVS, with a docking-based (ParaVS-Dock) and a non-docking-based (ParaVS-ND) method. ParaVS is highly optimizable, as the AI components, shown as green blocks in Figure 1, can easily be switched or updated.

(ii) We evaluated our methods on two datasets, an open dataset and an in-house proprietary dataset, and acquired high mean AUCs on both, demonstrating their capability and generalizability.

(iii) We investigated how to model a molecular target, and explained why non-docking-based methods are generally better than docking-based ones.

(iv) Our methods have low computational cost. Table 1 lists the time used for our experiments. By circumventing the docking process, ParaVS-ND can be dramatically faster than docking-based methods; in particular, the inference speed is boosted from 20 samples per CPU core-hour to roughly 3.4 × 10^5.

This paper is organized as follows: Section 2 explains GNNs and our models in detail; Section 3 describes the datasets used and how we performed our experiments; Section 4 presents the results of our methods and comparisons to methods in the literature; Section 5 summarizes our conclusions and discusses several general problems of SBVS.
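As a quick sanity check on the core-hour accounting above, the ParaVS-ND inference speed and the speedup over docking follow directly from the RDB size and the 4050 core-hours quoted in this paper:

```python
# Back-of-envelope check of the speedup reported above; the input figures
# are taken from this paper (RDB size, ParaVS-ND core-hours, docking speed).
rdb_size = 1.36e9          # molecules in the Enamine REAL Database
nd_core_hours = 4050       # core-hours ParaVS-ND spent on RDB inference

nd_speed = rdb_size / nd_core_hours   # samples per core-hour
dock_speed = 20                       # typical docking throughput

print(f"ParaVS-ND speed: {nd_speed:,.0f} samples/core-hour")
print(f"speedup vs. docking: {nd_speed / dock_speed:,.0f}x")
```

This reproduces the "more than 16000 times faster" claim above.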
2 Methods

Our goal is to develop both a docking-based and a non-docking-based structure-based virtual screening (VS) method. Given this goal, a brief introduction to the general GNN model and to our in-house GNN is provided first. Then, our SBVS methods are discussed comprehensively.
Graph neural networks (GNNs) are now widely used in the drug discovery process, as ligands and proteins can naturally be modeled as graphs [41]. A graph is a pair G = (V, E), where V is a set of nodes and E is a set of edges; an edge is a set of paired nodes. GNNs model ligands and proteins as graphs in which the nodes are atoms and the edges are defined simply by connecting atoms that lie within a predefined cutoff distance c; GNNs then represent each atom via an atom embedding layer h_u ∈ R^H and each edge via an edge embedding layer e_{uv} ∈ R^H.

Most GNNs in this area can be summarized as a two-stage architecture, also known as the message-passing framework [14]: 1. propagate node information by neighborhood aggregation at each layer; 2. form the whole-graph representation by a read-out function. Each layer of such GNNs can be written as:

\vec{h}_v^{(l+1)} = f_\theta^{(l)}\left( \vec{h}_v^{(l)}, \{ \vec{x}_e : e \in N_E(v) \}, \{ \vec{h}_{v'}^{(l)} : v' \in N_V(v) \} \right) \quad (1)
Figure 2: Docked protein-ligand complex graph extension procedure.

where \vec{h}_v^{(l)} is the feature vector of node v at layer l, N_V(v) and N_E(v) denote the sets of nodes and edges connected to node v, and f_\theta^{(l)} is a parameterized function. The read-out function R pools node features from the final iteration K to obtain the entire graph's representation h_G:

\vec{h}_G = R\left( \{ h_v^K \mid v \in G \} \right) \quad (2)

In our implementation, we use our in-house developed GNN, HagNet [24]. HagNet can be formally described as:

\vec{h}_v^{(l+1)} = \phi\left( \mathrm{concat}\left( h_v^l, \sum_{u \in N(v)} h_u^l + \max_{u \in N(v)} h_u^l \right) \right) \quad (3)

where \phi is a multilayer perceptron (MLP) network and the concat layer concatenates features.

Algorithm 1: ParaVS-Dock. f is the graph neural network we need to train; dist(i, j) is the distance between nodes i and j; d_j is the minimum distance of node j to the ligand, defined as d_j := min_{i ∈ V_c} dist(i, j).
  input: ligand atoms V_c, protein atoms V_p from a docked protein-ligand complex, cutoff distance c
  V := { i | i ∈ V_c } ∪ { j | j ∈ V_p, d_j < c }
  E := { {i, j} | i, j ∈ V, dist(i, j) < c, i ≠ j }
  G := (V, E)
  y ← f(G)
  return y

ParaVS-Dock is a docking-based method that takes a docking result generated by AutoDock Vina 1.1.2 [34] as input and outputs the representation of the whole docked protein-ligand complex via a single GNN. The pseudo-code is presented in Algorithm 1. First, the output of the docking program, the 3D structure of a protein-ligand complex, is used to form a graph by treating atoms as nodes and connecting two nodes if they lie within a cutoff distance c. Instead of modeling the whole protein-ligand complex as a graph, we initially put only the atoms of the ligand into the graph and ignore the atoms of the protein. The graph is then extended by adding those protein atoms whose distance to any atom of the ligand is within the cutoff distance c, as illustrated in Figure 2, where protein atoms are shown with an orange background. Finally, a GNN is applied to the extended graph. The idea behind this is that the interaction between the protein and the ligand is vital and informative for the final prediction.

We did not model the whole protein-ligand complex as a graph for the following reasons:

(i) A protein is normally much larger than a ligand. A typical protein contains over 2000 atoms, while a ligand has about 50. Even a protein pocket, which is significantly smaller than a protein, can easily contain more than 500 atoms, still ten times the size of a ligand. A protein-ligand complex graph would therefore be dominated by the protein, making it harder for GNNs to differentiate active and inactive compounds.

(ii) Conversely, the number of proteins is much smaller than the number of compounds in most SBVS datasets, in which over 1k or even 10k compounds are paired with a single protein, making most protein-ligand complex graphs in the training set structurally similar to each other and further increasing the difficulty of training.

(iii) A docked protein-ligand complex graph can be too large for a GNN to train on, causing serious efficiency and computing-power problems.

Figure 3: ParaVS-ND, the non-docking-based SBVS procedure.
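Algorithm 1's graph construction can be sketched in a few lines of numpy. This is a minimal, coordinates-only sketch; atom features and the trained GNN f itself are omitted:

```python
import numpy as np

def build_dock_graph(ligand_xyz, protein_xyz, cutoff):
    """Algorithm 1 (ParaVS-Dock graph construction): keep all ligand atoms,
    add protein atoms within `cutoff` of any ligand atom (d_j < c), then
    connect every pair of kept atoms closer than `cutoff`."""
    # d_j: distance from each protein atom j to its nearest ligand atom
    d = np.linalg.norm(protein_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1)
    kept_protein = protein_xyz[d.min(axis=1) < cutoff]

    coords = np.vstack([ligand_xyz, kept_protein])   # node set V
    n = len(coords)
    pair_d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # edge set E: unordered pairs {i, j} with dist(i, j) < cutoff, i != j
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if pair_d[i, j] < cutoff]
    return coords, edges
```

The quadratic pairwise-distance step is fine at the scale argued for above (tens of ligand atoms plus nearby pocket atoms); for full proteins a neighbor-search structure such as a k-d tree would be needed.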
Docking-based methods generally suffer from three main problems. 1. Most docking processes consume considerable time and computing power, while researchers often have to process a large number of compounds; as listed in Table 1, it took us months to finish the DUD.E docking process even with multiple parallel computing clusters, making docking either too time-consuming or out of reach for those without access to high-performance computing. 2. Relying on heuristic algorithms, empirical functions, machine learning, and other techniques, docking programs do not give analytical solutions; errors introduced during the docking stage propagate through follow-up models and eventually harm the final performance. 3. It is fairly difficult to acquire high-quality data with binding poses. Thus, non-docking-based methods can be a vital supplement for SBVS tasks.
Algorithm 2: ParaVS-ND. f_c is the GNN we need to train for the ligand and f_p is the GNN for the pocket; h_u is the atom embedding layer.
  input: ligand graph G_c = (V_c, E_c), protein pocket graph G_p = (V_p, E_p), cutoff distance c
  h_up ← h_u(V_p)
  h_uc ← h_u(V_c)
  h_p ← f_p(h_up, E_p)
  h_c ← f_c(h_uc, E_c)
  h ← concat(h_p, h_c)
  y ← MLP(h)
  return y

However, in contrast to ligand modeling, how to model a protein remains an open question worth discussing in SBVS tasks. Figure 4 shows a statistical analysis of this problem on our in-house NoDecoy dataset, which is described in Table 2 and Section 3 in detail. The coefficient of variation (CoV) in Figure 4 is defined as the ratio of the standard deviation σ to the mean µ: c_v = σ/µ. It is a standardized measure of the dispersion of a probability distribution; a smaller c_v indicates that the data concentrates on a narrower range and is easier to model.

– Amino acids or atoms? The first decision is whether we should model in amino acids or in atoms. A protein is made up of amino acids, and an amino acid is made up of atoms, so a protein can be modeled in either. We chose atoms because a typical SBVS dataset, including DUD.E and our in-house NoDecoy, contains far more compounds than proteins, as reported in Table 2. Modeling in amino acids suffers from a serious lack of training data for the amino-acid embedding layer. In contrast, this problem is naturally resolved for atoms by sharing the atom embedding layer with the ligand part, since ligands consist of atoms but NOT amino acids.
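The dispersion measure defined above (c_v = σ/µ) is straightforward to compute; the pocket-size counts below are illustrative only, not taken from NoDecoy:

```python
import statistics

def coefficient_of_variation(xs):
    """CoV = sigma / mu, the standardized dispersion measure used in Figure 4."""
    return statistics.pstdev(xs) / statistics.mean(xs)

# Hypothetical atoms-per-pocket counts, for illustration only.
pocket_atoms = [480, 520, 555, 610, 470, 530]
print(f"CoV = {coefficient_of_variation(pocket_atoms):.2%}")
```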
Figure 4: (a) and (b) show the distributions of the number of amino acids (CoV = 16.12%) and the number of atoms (CoV = 15.47%) in a pocket; (c) and (d) show the distributions of the number of amino acids (CoV = 31.55%) and the number of atoms (CoV = 32.05%) in a protein. The distributions in (c) and (d) are right-skewed, with higher variation and higher CoV compared to (a) and (b); thus, they are harder for a GNN to model.

Table 2: DUD.E and NoDecoy dataset descriptions. Active Ratio = Number of Active Ligands / Number of Compounds.

– Protein or pocket? Figures 4.a and 4.b report c_v of about 16%, while Figures 4.c and 4.d report c_v of about 32%. It is clear from Figure 4 that, whether we model in amino acids or in atoms, modeling only the pocket instead of the full protein is easier for GNNs. Additionally, Figure 4 indicates that a protein is about 4 to 6 times larger than a pocket, whether counted in amino acids or atoms, so modeling the full protein also costs more time and computing power.

The pseudo-code of ParaVS-ND is presented in Algorithm 2. First, two individual graphs are formed, via the same methodology as in the docking-based method, from the ligand and the protein pocket separately. They are initialized by the same atom embedding layer; then two individual GNNs are applied to obtain the representations. Finally, the representations are concatenated as the input to a classifier network, an MLP in our implementation. Note that weights are not shared between the two GNNs (except the atom embedding layer), since ligands and proteins differ in attributes, sizes, and features. The framework is also displayed in Figure 3. While we seek to keep it as simple as possible, multiple optional modules are available, such as replacing the MLP with a CNN or RNN module, or adding attention mechanisms to the MLP.
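Algorithm 2's forward pass can be sketched as follows. The shared atom embedding table, the two HagNet-style GNNs (sum + max aggregation as in Eq. (3)), and the MLP head are reduced to minimal numpy stand-ins; all weights, sizes, and atom indices here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
H = 8                                    # embedding width (illustrative)
EMBED = rng.normal(size=(16, H))         # shared atom embedding table h_u

def hagnet_layer_pool(h, edges, W):
    """One HagNet-style layer (Eq. 3): concat(h_v, sum + max over neighbors),
    a linear map standing in for phi, then mean-pool to a graph vector."""
    agg_sum = np.zeros_like(h)
    agg_max = np.full_like(h, -np.inf)
    for u, v in edges:                   # undirected edges, both directions
        agg_sum[v] += h[u]; agg_sum[u] += h[v]
        agg_max[v] = np.maximum(agg_max[v], h[u])
        agg_max[u] = np.maximum(agg_max[u], h[v])
    agg_max[np.isinf(agg_max)] = 0.0     # isolated nodes have no neighbors
    z = np.concatenate([h, agg_sum + agg_max], axis=1) @ W
    return np.maximum(0.0, z).mean(axis=0)

def paravs_nd(lig_atoms, lig_edges, poc_atoms, poc_edges, W_c, W_p, W_out):
    h_c = hagnet_layer_pool(EMBED[lig_atoms], lig_edges, W_c)  # ligand GNN f_c
    h_p = hagnet_layer_pool(EMBED[poc_atoms], poc_edges, W_p)  # pocket GNN f_p
    h = np.concatenate([h_c, h_p])                             # fused representation
    return 1.0 / (1.0 + np.exp(-(h @ W_out)))  # MLP head reduced to one layer

W_c, W_p = rng.normal(size=(2 * H, H)), rng.normal(size=(2 * H, H))
W_out = rng.normal(size=2 * H)
p = paravs_nd([0, 1, 2], [(0, 1), (1, 2)], [3, 4], [(0, 1)], W_c, W_p, W_out)
print(f"predicted active probability: {p:.3f}")
```

Note how the two GNNs keep separate weights (W_c, W_p) while sharing the one embedding table, mirroring the design choice discussed above.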
3 Experiments

Dataset

We analyze our models and methods on two datasets: a public dataset, and a dataset combining both public and in-house proprietary data. With this setting, we aim to evaluate our approach's generalizability and reproducibility. The numbers of proteins and ligands of both datasets are listed in Table 2. Additionally, we perform inference on RDB with both ParaVS-Dock and ParaVS-ND to compare computational efficiency, as reported in Table 1.

– DUD.E: An enhanced and rebuilt version of DUD (Directory of Useful Decoys) [26]. It contains 102 protein targets. We train and evaluate our model on both the full DUD.E dataset and its subset DUD, which contains 38 of the 102 proteins, as shown in Table 2.

– NoDecoy: A combination of DUD.E, excluding its computer-generated decoys, and our in-house proprietary dataset, which contains biochemical data for 261987 compounds and 83 targets, using IC50 = 10 µM as the cutoff to differentiate active and inactive compounds. We exclude decoys from DUD.E in order to eliminate implicit computer-generated patterns and to balance the proportion of positive and negative samples; the remaining compounds are then combined with our proprietary dataset, which contains no decoys by nature.

– RDB: The Enamine REAL database (RDB) [31] is the largest enumerated database, comprising over 1.36 billion synthetically feasible molecules. We use RDB for inference only in this paper, to simulate a regular screening task, since such a large database can reveal the computational efficiency of ParaVS-ND.
Cross Validation

The performance of ParaVS is assessed with two types of cross validation (CV).

– Leave-one-out (LOO) CV: Leave-one-out cross validation is a technique for assessing how a method generalizes to independent data. Specifically, the model leaves out one protein and all compounds paired with it, in rotation, trains on all remaining proteins and compounds, and is then tested on the left-out protein and its compounds. We compare our results with DeepVS [28] and Lim et al. [25].

– K-fold CV: In k-fold CV, the dataset is randomly partitioned into k equal-sized subsets. Of the k subsets, a single subset is retained as the validation set, and the other k − 1 subsets are used as the training set. The CV process is repeated k times, with each subset used as the validation set exactly once. In our experiments, k = 5.
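The protein-wise LOO split described above can be sketched without any ML library; the protein and compound identifiers below are purely illustrative:

```python
def loo_by_protein(pairs):
    """Protein-wise leave-one-out CV: for each protein, the test set contains
    every (protein, compound) pair involving that protein; the rest train."""
    proteins = sorted({p for p, _ in pairs})
    for held_out in proteins:
        train = [pc for pc in pairs if pc[0] != held_out]
        test = [pc for pc in pairs if pc[0] == held_out]
        yield held_out, train, test

# toy dataset: 3 hypothetical proteins with a few compounds each
pairs = [("egfr", "c1"), ("egfr", "c2"), ("hivpr", "c3"), ("ada", "c4")]
for prot, train, test in loo_by_protein(pairs):
    print(prot, len(train), len(test))
```

Splitting by protein (rather than by random pair) is what makes this a test of generalization to unseen targets.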
Evaluation Metrics

To evaluate our approaches, we select two extensively used metrics for comparison: the enrichment factor (EF) and the area under the Receiver Operating Characteristic curve (AUC).

– AUC: AUC is also written as AUROC in some literature, and is one of the most important evaluation metrics for classification problems [11].

– EF: Since the tested compounds are ranked by score, EF indicates how good the prediction is within the top x% of ranked compounds. The EF at x% is computed as:

EF_{x%} = (actives at x% / compounds at x%) × (total compounds / total actives) \quad (4)
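Equation 4 translates directly into code; this is a minimal sketch with a toy ranking:

```python
def enrichment_factor(scores, labels, x):
    """EF at x% (Eq. 4): how over-represented actives are among the
    top-scoring x% of compounds, relative to the whole library."""
    n = len(scores)
    top_n = max(1, int(n * x / 100))
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    actives_top = sum(lbl for _, lbl in ranked[:top_n])
    total_actives = sum(labels)
    return (actives_top / top_n) * (n / total_actives)

# toy example: 2 actives ranked 1st and 3rd out of 10 compounds
scores = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(scores, labels, 20))
```

Here the top 20% (2 compounds) contain 1 of the 2 actives, giving an EF of 2.5.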
Deep Learning Settings

We keep the training and testing procedures as simple as possible, avoiding tricks, in order to verify the robustness and generalizability of our methods.

– Pre-training: All models are trained from scratch without any pre-training.

– Fine-tuning: GNN hyper-parameters are identical across all experiments, and we did no grid search or any other fine-tuning of our models. Better performance is expected to be achievable with more deliberate tuning.

– Optimizer: SGD with a constant learning rate.

– Computation power: All experiments are individually run on a single P100 GPU. The computational cost of our method is relatively low.
4 Results

The results of ParaVS-Dock are reported in Figure 5. Figure 5 (left) shows the LOO CV AUC for each of the 102 target proteins from DUD.E: most targets achieve high AUCs (AUC_mean = 0.926), with only 10 falling clearly below the rest. Figure 5 (right) shows each of the 38 target proteins from DUD.

Next, we test ParaVS-ND with the same settings, as shown in Figure 6. Compared to ParaVS-Dock, AUC_mean is boosted from 0.926 to 0.981 and AUC_min from 0.628 to 0.831, revealing that ParaVS-ND is more robust and generalizes better.

Comparisons of AUC and EF with methods from other literature are reported in Table 3 and Table 4. They show that both ParaVS-Dock and ParaVS-ND perform well. Specifically, ParaVS-ND acquires a mean EF of about 39 on DUD.E and of 36.2 on DUD, and an AUC_mean of 0.981 on DUD.E and 0.974 on DUD, all state-of-the-art as far as we know at the moment.

Method | EF_max (DUD.E) | EF (DUD.E) | EF (DUD.E) | EF_max (DUD) | EF (DUD) | EF (DUD)
ParaVS-Dock | 34.4 | 22.4 | 4.4 | 31.3 | 19.2 | 4.3
ParaVS-ND | 62.6 | ≈39 | … | … | 36.2 | …
DeepVS-ADV [28] | - | - | - | 16.0 | 6.6 | 3.1
Lim et al. [25] | - | 33.5 | - | - | - | -
Lim et al. [25] w/o attention | - | 30.3 | - | - | - | -
Ragoza et al. [29] | - | 19.4 | - | - | - | -
Torng et al. [33] | - | 19.4 | - | - | - | -

Table 3: Mean enrichment factor (EF) comparison. Each presented EF value is a mean computed by averaging over proteins in LOO CV experiments. The DUD.E columns are measured on DUD.E and the DUD columns on DUD.

Method | AUC_mean (DUD.E) | AUC_mean (DUD)
ParaVS-Dock | 0.926 | 0.911
ParaVS-ND | 0.981 | 0.974
Lim et al. [25] | 0.968 | -
Lim et al. [25] w/o attention | 0.936 | -
AtomNet [38] | 0.855 | -
Ragoza et al. [29] | 0.868 | -
Torng et al. [33] | 0.886 | -
Gonczarek et al. [15] | 0.904 | -

Table 4: Mean AUC comparison. The mean value is computed by averaging over proteins in LOO CV experiments. ParaVS-ND outperforms the methods from other literature, and an AUC_mean = 0.981 is state-of-the-art as far as we know.

To evaluate our methods and framework at the production stage, we eliminate the decoys from DUD.E and integrate its true active ligands into our in-house dataset to form a new dataset, NoDecoy. We then perform a 5-fold CV experiment with a random split. As illustrated in Figure 7, both ParaVS-Dock and ParaVS-ND acquire high mean AUCs. Considering that the dataset contains no decoys and has an active ratio of 78.19%, which is much more balanced than DUD.E, whose active ratio is only around 1% as listed in Table 2, we find the performance quite satisfying. EF stays at a constant 1.28 because that is the maximum achievable value: from Equation 4 we have

EF_max = total compounds / total actives = 1 / Active Ratio = 1 / 78.19% ≈ 1.28. \quad (5)

In addition to AUC and EF, we also evaluate prediction accuracy. Figure 7 reveals that ParaVS-ND outperforms ParaVS-Dock on every metric we assessed; the performance of ParaVS-ND rises and converges faster; the AUC curve of ParaVS-ND fluctuates less, indicating better model stability; and the AUC and accuracy curves of both methods are still rising, so better performance may be reached given more training epochs.

Table 1 reports that ParaVS-Dock by itself is faster than ParaVS-ND, but the docking process is so time-consuming that ParaVS-Dock spends most of its running time on docking and becomes incomparable to ParaVS-ND efficiency-wise. The RDB test is valuable because performing inference on billions of compounds is not a rare task in the drug discovery process. The 4050 core-hours ParaVS-ND took for RDB is a fairly good result: roughly 3.4 × 10^5 compounds can be screened per core-hour on a single CPU, and the speed scales out with parallel processing clusters.
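The EF saturation on NoDecoy (Equation 5) can be checked in two lines:

```python
# Why EF plateaus at 1 / active-ratio on NoDecoy (Eq. 5): even a perfect
# ranking can only enrich up to total_compounds / total_actives.
active_ratio = 0.7819                  # NoDecoy active ratio, from Table 2
ef_max = 1.0 / active_ratio
print(f"EF_max = {ef_max:.2f}")        # matches the 1.28 plateau in Figure 7
```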
Figure 5: ParaVS-Dock leave-one-out (LOO) cross-validation (CV) performance. Each point represents an AUC score from LOO CV, and proteins are sorted in ascending order by AUC. Left: DUD.E (102 proteins), AUC_max = 1.000, AUC_min = 0.628, AUC_mean = 0.926. Right: DUD (38 proteins), AUC_max = 0.996, AUC_min = 0.628, AUC_mean = 0.911.

Figure 6: ParaVS-ND leave-one-out (LOO) cross-validation (CV) performance. Each point represents an AUC score from LOO CV, and proteins are sorted in ascending order by AUC. Left: DUD.E (102 proteins), AUC_max = 1.000, AUC_min = 0.831, AUC_mean = 0.981. Right: DUD (38 proteins), AUC_max = 1.000, AUC_min = 0.840, AUC_mean = 0.974.

5 Conclusion and Discussion

We summarize our main conclusions as follows:
– Non-docking-based methods possess advantages over docking-based ones, based on the results of our experiments.

– GNNs can play a significant role in drug design, and an increasing number of methods now use GNNs to model molecules.

– Datasets in the virtual screening process can be fairly large; thus, it is vital for VS methods to balance performance and efficiency.
We further provide answers to a few important questions people are interested in:
Is GNN a solution to SBVS tasks?
Yes, although GNNs are NOT the only answer, and there are definitely other models worth investigating. Since ligands and target proteins can be modeled as graphs in a straightforward way, and the theory of GNNs is developing rapidly, we expect SBVS tasks to benefit further from GNN models in the future.
Is the non-docking-based method better than the docking-based method in all aspects?
No, there are a few problems worth noticing:

– Model size: The non-docking-based model is almost twice the size of the docking-based one, since it contains two independent GNNs. Although careful model trimming could reduce the size, the docking-based method is preferred in model-size-sensitive tasks.

– Computing power cost and speed: There is a strong motivation to reduce computing time and save computing power in the drug discovery process, and the non-docking-based method takes more time in both training and inference. But things change when the full procedure is taken into consideration, as the docking process, a notoriously time-consuming step, can be circumvented by the non-docking-based method. In short, the docking-based method is faster ONLY when docked protein-ligand complexes are already provided.

Figure 7: ParaVS-Dock vs. ParaVS-ND performance in k-fold cross validation, where k = 5 in our experiment. Metrics are averaged over the k folds at each epoch. Left: AUC curve. Middle: enrichment factor @ 2% curve. Right: accuracy curve.

– Pocket detection: We acquired pockets along with the proteins in our datasets. However, this is NOT always the case. If pockets are not provided, we have to perform pocket detection, which can be done by various methods [7, 21, 32]. The situation is similar to docking in that no analytical solution is available and, although much more efficient and faster than docking, pocket detection still takes time.

Drug discovery has long been a tough task. It has progressed significantly ever since computational methods entered the field. Hopefully, our work can inspire more thoughts about SBVS, and our analysis and discussion of ligand and protein modeling can help bridge the gap between drug discovery and computer science.
References

[1] M. Arciniega and O. Lange. Improvement of virtual screening results by docking data feature analysis. Journal of Chemical Information and Modeling, 54, 05 2014.
[2] P. Ballester and J. Mitchell. A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking. Bioinformatics (Oxford, England), 26:1169–75, 03 2010.
[3] Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2:1–55, 01 2009.
[4] Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35:1798–1828, 08 2013.
[5] C. Bissantz, G. Folkers, and D. Rognan. Protein-based virtual screening of chemical databases. 1. Evaluation of different docking/scoring combinations. Journal of Medicinal Chemistry, 43:4759–67, 01 2001.
[6] T. Cheng, Q. Li, Z. Zhou, Y. Wang, and S. Bryant. Structure-based virtual screening for drug discovery: a problem-centric review. The AAPS Journal, 14:133–41, 03 2012.
[7] Y. Cui, Q. Dong, D. Hong, and X. Wang. Predicting protein-ligand binding residues with deep convolutional neural networks. BMC Bioinformatics, 20, 02 2019.
[8] M. Drwal and R. Griffith. Combination of ligand- and structure-based methods in virtual screening. Drug Discovery Today: Technologies, 10:395, 06 2013.
[9] J. Durrant and J. McCammon. NNScore: A neural-network-based scoring function for the characterization of protein-ligand complexes. Journal of Chemical Information and Modeling, 50:1865–71, 10 2010.
[10] J. Durrant and J. McCammon. NNScore 2.0: A neural-network receptor-ligand scoring function. Journal of Chemical Information and Modeling, 51:2897–903, 11 2011.
[11] T. Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27:861–874, 06 2006.
[12] K. Gao, A. Fokoue, H. Luo, A. Iyengar, and P. Zhang. Interpretable drug target prediction using deep neural representation. Pages 3371–3377, 07 2018.
[13] S. Ghosh, A. Nie, J. An, and Z. Huang. Structure-based virtual screening of chemical libraries for drug discovery. Current Opinion in Chemical Biology, 10:194–202, 07 2006.
[14] J. Gilmer, S. Schoenholz, P. Riley, O. Vinyals, and G. Dahl. Neural message passing for quantum chemistry. 04 2017.
[15] A. Gonczarek, J. M. Tomczak, S. Zaręba, J. Kaczmar, P. Dąbrowski, and M. J. Walczak. Interaction prediction in structure-based virtual screening using deep learning. Computers in Biology and Medicine, 100:253–258, Sep 2018.
[16] D. Hecht and G. Fogel. Computational intelligence methods for docking scores. Current Computer-Aided Drug Design, 5, 03 2009.
[17] W. Iram and T. Anjum. Production of Cyclosporine A by Submerged Fermentation, pages 1–28. 11 2015.
[18] M. Karimi, D. Wu, Z. Wang, and Y. Shen. DeepAffinity: Interpretable deep learning of compound-protein affinity through unified recurrent and convolutional neural networks, 06 2018.
[19] S. Kinnings, N. Liu, P. Tonge, R. Jackson, L. Xie, and P. Bourne. A machine learning-based method to improve docking scoring functions and its application to drug repurposing.
Journal of chemical information and modeling , 51:408–19, 022011.[20] D. Kitchen, H. Decornez, J. Furr, and J. Bajorath. Docking and scoring in virtual screening for drug discovery: Methods andapplications.
Nature reviews. Drug discovery , 3:935–49, 12 2004.[21] V. Le Guilloux, P. Schmidtke, and P. Tuffery. Fpocket: An open source platform for ligand pocket detection.
BMC bioinfor-matics , 10:168, 02 2009.[22] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning.
Nature , 521:436–44, 05 2015.[23] I. Lee, J. Keum, and H. Nam. Deepconv-dti: Prediction of drug-target interactions via deep learning with convolution onprotein sequences.
PLOS Computational Biology , 15:e1007129, 06 2019.[24] D. Leng. Heterogeneous aggregation graph network for molecule property prediction., preprint 2021.[25] J. Lim, S. Ryu, K. Park, Y. Choe, J. Ham, and W. Kim. Predicting drug-target interaction using a novel graph neural networkwith 3d structure-embedded graph representation.
Journal of Chemical Information and Modeling , 59, 08 2019.[26] M. Mysinger, M. Carchia, J. Irwin, and B. Shoichet. Directory of useful decoys, enhanced (dud-e): Better ligands and decoysfor better benchmarking.
Journal of medicinal chemistry , 55:6582–94, 06 2012.[27] T. Nguyen, H. le, and S. Venkatesh. Graphdta: prediction of drug-target binding affinity using graph convolutional networks,06 2019.[28] J. C. Pereira, E. R. Caffarena, and C. N. dos Santos. Boosting docking-based virtual screening with deep learning.
Journal ofChemical Information and Modeling , 56(12):2495–2506, 2016. PMID: 28024405.[29] M. Ragoza, J. Hochuli, E. Idrobo, J. Sunseri, and D. R. Koes. Protein‚Äìligand scoring with convolutional neural networks.
Journal of Chemical Information and Modeling , 57(4):942–957, Apr 2017.[30] G. Schneider. Virtual screening: An endless staircase?
Nature reviews. Drug discovery , 9:273–6, 04 2010.[31] A. Shivanyuk, S. Ryabukhin, A. Bogolyubsky, D. Mykytenko, A. Chuprina, W. Heilman, A. Kostyuk, and A. Tolmachev.Enamine real database: Making chemical diversity real.
Chimica Oggi , 25:58–59, 11 2007.[32] M. Stepniewska-Dziubinska, P. Zielenkiewicz, and P. Siedlecki. Detection of protein-ligand binding sites with 3d segmenta-tion, 04 2019.[33] W. Torng and R. Altman. Graph convolutional neural networks for predicting drug-target interactions.
Journal of ChemicalInformation and Modeling , 2019, 10 2019.[34] O. Trott and A. Olson. Software news and update autodock vina: Improving the speed and accuracy of docking with a newscoring function, efficient optimization, and multithreading.
Journal of computational chemistry , 31:455–61, 11 2009.[35] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all youneed, 2017.[36] C. Venkatachalam, X. Jiang, T. Oldfield, and M. Waldman. Ligandfit: a novel method for the shape-directed rapid docking ofligands to protein active sites.
Journal of Molecular Graphics and Modelling , 21(4):289 – 307, 2003.[37] S. Vilar, G. Cozza, and S. Moro. Medicinal chemistry and the molecular operating environment (moe): application of qsarand molecular docking to drug discovery.
Current topics in medicinal chemistry , 8(18):1555—1572, 2008.[38] I. Wallach, M. Dzamba, and A. Heifets. Atomnet: A deep convolutional neural network for bioactivity prediction in structure-based drug discovery. 10 2015.[39] R. Wang, L. Lai, and S. Wang. Further development and validation of empirical scoring functions for structure-based bindingaffinity prediction.
Journal of computer-aided molecular design , 16:11–26, 02 2002.[40] D. Weininger. Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules.
Journal of Chemical Information and Computer Sciences , 28:31–36, 02 1988.[41] J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun. Graph neural networks: A review of methods andapplications, 2019., 28:31–36, 02 1988.[41] J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun. Graph neural networks: A review of methods andapplications, 2019.