Distance-aware Molecule Graph Attention Network for Drug-Target Binding Affinity Prediction

Abstract

Accurately predicting the binding affinity between drugs and proteins is an essential step in computational drug discovery. Since graph neural networks (GNNs) have demonstrated remarkable success in various graph-related tasks, GNNs have been considered a promising tool for improving binding affinity prediction in recent years. However, most existing GNN architectures can only encode the topological graph structure of drugs and proteins without considering the relative spatial information among their atoms. Unlike other graph datasets such as social networks and commonsense knowledge graphs, drug-target complexes are shaped by the relative spatial positions of atoms and the chemical bonds among them, both of which have a significant impact on binding affinity. To this end, in this paper, we propose a diStance-aware Molecule graph Attention Network (S-MAN) tailored to drug-target binding affinity prediction. As a dedicated solution, we first propose a position encoding mechanism to integrate the topological structure and spatial position information into the constructed pocket-ligand graph. Moreover, we propose a novel edge-node hierarchical attentive aggregation structure consisting of edge-level aggregation and node-level aggregation. The hierarchical attentive aggregation can capture spatial dependencies among atoms and fuse the position-enhanced information with the capability of discriminating multiple spatial relations among atoms. Finally, we conduct extensive experiments on two standard datasets to demonstrate the effectiveness of S-MAN.



Jingbo Zhou, Shuangli Li, Liang Huang, Haoyi Xiong, Fan Wang, Tong Xu, Hui Xiong, and Dejing Dou. Baidu Inc., Email: {zhoujingbo, lishuangli, lianghuang, xionghaoyi, wangfan04, doudejing}@baidu.com. University of Science and Technology of China. Rutgers, The State University of New Jersey.


Drug-target binding affinity (DTA) prediction has long been considered one of the most important tasks in computational drug discovery [1]. Drugs are chemical compounds, which can generally be represented by molecular graphs. Drugs and other small molecules are also called ligands, which can react with targets. Targets are commonly proteins, such as enzymes, ion channels and receptors, which can activate or inhibit a biological process to cure a disease after binding with a ligand. The strength of a drug-target interaction, measured as a real number, is referred to as the binding affinity, an important concept related to the treatment of diseases.

Example 1.

Taking Figure 1 as an example, there are two main objects in DTA prediction: 1) a drug (or ligand) that can interact with 2) the target (or protein) at the specific circular area shown in Figure 1(a). DTA prediction aims to output the strength of the interaction for such a pair. One way to do so is to separate the protein-ligand complex into two sequence formulas, as shown in Figure 1(b), and feed them into a prediction function; Figure 1(c) demonstrates another way, which extracts the specific binding pose (i.e., the pocket-ligand) for DTA prediction.

Conventionally, drug-target binding affinity is measured by high-throughput screening experiments, which are costly and time-consuming [2, 3]. Predicting DTA computationally can accelerate the virtual screening of compounds and reduce the time and cost of high-throughput screening by cherry-picking promising compounds [4, 5, 6]. Therefore, accurate and efficient DTA prediction, which can bring great economic benefits to new drug development, has attracted significant research attention in past decades [7, 8, 9, 10, 11].

Early studies of DTA mainly focused on physics-based methods and machine learning methods. Physics-based methods require specific domain knowledge to design scoring functions [7] and extract features, and machine learning methods likewise rely on well-designed features [8] for DTA prediction. However, because these methods depend on feature engineering and hand-crafted rules, they suffer from limited accuracy and generality on large datasets.

With the development of deep learning, convolutional neural networks (CNNs) have also been exploited for DTA prediction. Both a 1D-CNN model [12] applied to drug and protein sequences and a 3D-CNN model [13] that treats drug-target complexes as 3D images have been investigated. In fact, both drugs (a.k.a. ligands) and targets (a.k.a. proteins) are 3D graphs, which preserve the position information of atoms and the relations of covalent bonds. CNN-based methods cannot directly encode such structural information of compounds, and these models can only be trained end-to-end on large datasets. However, because high-throughput screening experiments are costly and time-consuming, available DTA datasets are usually small, which degrades the performance of CNN methods.

[Figure 1: (a) protein-ligand interaction showing the protein, the ligand, the binding pose and the protein pocket; (b) general split into separate formulas, i.e., the drug SMILES (OC(=O) ... ccc1O) and the protein sequence (MPPYT ... GNGKQ); (c) extracted pocket-ligand (target and drug).]

Figure 1.

An illustrative example (PDB: 13GS) of converting the protein-ligand interactions (in figure a) into the model input in two different ways (in figures b and c).

More recently, graph neural network (GNN) models have exhibited a powerful ability to learn molecular graphs for DTA prediction [14, 15], since they can incorporate the graph structure indicated by covalent bonds into the models. Although these models have achieved strong performance, they have two obvious drawbacks. (I) Most existing GNN models can only treat compounds as topological graphs whose nodes are atoms and whose edges are covalent bonds, without considering the relative spatial positions of atoms in drug-target complexes. However, binding affinity is closely related to such spatial position information. For example, as illustrated in Figure 2(a), the binding affinity is strongly correlated with the max distance among atoms in drug-target complexes. (II) Most current approaches follow a question-answering-like Siamese network framework that learns drug and protein representations separately [16], as shown in Figure 1(b). Such separate learning models the interactions between drug and protein insufficiently and ignores spatial information.

To address the aforementioned limitations of CNN and GNN models, in this paper we propose a novel end-to-end learning framework for DTA prediction, namely the diStance-aware Molecule Graph Attention Network (S-MAN). Figure 4 demonstrates the framework of our proposed model.
First, because the basic molecular graph lacks necessary spatial information, our model takes as input the Spatial-enhanced Pocket-ligand Graph (SPoG), which is constructed from the protein pocket and the ligand binding pose, as shown in Figure 1(c). SPoG involves not only 3D spatial distance information but also interactions between the ligand and the protein. The definition and construction of SPoG will be introduced in Section 3. We then design a spatial position encoding mechanism to model different spatial distance relations among atoms, as illustrated in Figure 3. Since the position encoding mechanism attaches the relative spatial distance information of atoms to the edges, it is difficult to propagate such information by aggregating nodes directly. Thus, we further design a hierarchical attentive structure with two steps: edge-level aggregation and node-level aggregation. This hierarchical structure first captures spatial dependencies in the edge-level stage and then distinguishes multiple spatial relations of the 3D structure in the node-level stage. To summarize, the main contributions of our work are as follows:
• We study the problem of DTA prediction by constructing a unique spatial-enhanced pocket-ligand graph and propose a novel distance-aware molecule graph attention network named S-MAN.
• S-MAN employs both edge-level and node-level attentive aggregation, leveraging the spatial distance information and relative positions of atoms. To the best of our knowledge, we are the first to adopt a hierarchical GNN structure with distance-aware attention for DTA prediction.
• Experiments on two datasets demonstrate that the proposed model outperforms classic baselines and state-of-the-art GNN methods, which shows great application potential for drug discovery.

[Figure 2: (a) binding affinity plotted against max distance (Angstroms); (b) percent plotted against spatial distance (Angstroms).]

Figure 2.

Spatial distance visual analysis. (a) The correlation between the binding affinity and the max distance of atoms in drug-target complexes; (b) the distribution of spatial distances between atoms within 5 Å in drug-target complexes.

[Figure 3 labels: atom node, spatial relation, spatial area.]

Figure 3.

A toy example of spatial position encoding.

Since our study is dedicated to predicting drug-target binding affinity by designing a new GNN architecture, we briefly review related work on these two topics.

Drug-target binding affinity (DTA) prediction in drug discovery has attracted wide research interest. Many previous works focused on simulation-based methods [7] or classic machine learning models [17], which require external expert domain knowledge. Recently, most existing studies address the DTA problem with deep learning models, such as DeepDTA [12] and WideDTA [18]. These models use 1D convolutions and pooling to capture potential patterns from 1D ligand and protein sequences, and thus neglect the necessary spatial and structural information. Recent GNN models that incorporate the structural information of drug-target complexes, such as GraphDTA [14], have shown better performance than these 1D models. Therefore, we do not compare with such 1D convolution methods in our experiments. Some works [13] also construct 3D images from drug-target complexes and apply 3D convolutions (3D-CNN) to take advantage of spatially local correlation. Although such a 3D-CNN approach can learn spatial information, it has potential drawbacks. On the one hand, a 3D-CNN requires a large number of model parameters, but the training data for the DTA problem is limited. On the other hand, the position of a ligand or protein varies across complexes (e.g., through different rotation angles), which means that 3D image modeling inevitably captures the spatial structure incompletely. To better learn relative spatial information, our work develops a GNN architecture that integrates the positions of atoms for DTA prediction.
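The rotation argument can be made concrete with a minimal sketch. It assumes toy coordinates and hypothetical helper names (`rotate_z`, `pairwise_distances`); it only illustrates that a rigid rotation changes the raw atom coordinates that a voxelized 3D image depends on, while leaving the pairwise interatomic distances used by a distance-based graph unchanged.

```python
import math

def rotate_z(coords, theta):
    """Rotate (x, y, z) points by theta radians around the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]

def pairwise_distances(coords):
    """N x N Euclidean distance matrix as nested lists."""
    return [[math.dist(p, q) for q in coords] for p in coords]

# Toy coordinates (Angstroms) for three atoms; purely illustrative.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3)]
rotated = rotate_z(atoms, math.pi / 4)

# Raw coordinates (what a voxel grid would see) change under rotation...
coords_changed = any(
    not math.isclose(u, v, abs_tol=1e-9)
    for p, q in zip(atoms, rotated) for u, v in zip(p, q)
)
# ...but every pairwise distance is preserved.
d0, d1 = pairwise_distances(atoms), pairwise_distances(rotated)
dist_preserved = all(
    math.isclose(u, v, abs_tol=1e-9) for r0, r1 in zip(d0, d1) for u, v in zip(r0, r1)
)
print(coords_changed, dist_preserved)  # True True
```

A 3D-CNN would need to learn this invariance from data (for example through rotation augmentation), whereas a graph built on interatomic distances has it by construction.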

With the increasing popularity of graph neural networks (GNNs), much attention has been devoted to applying GNNs to molecular representation learning. To integrate the topological structure of molecular graphs, GraphDTA [14] adopts several powerful GNNs [19, 20, 21] to learn drug representations. Although GraphDTA shows reasonable performance in DTA prediction, it cannot learn position information in the molecular graph or interaction information between drug and target. By contrast, our work offers a new perspective on drug-target prediction with the assistance of the critical relative positions in molecular graphs. Only a few works in recent years have studied GNNs that consider spatial information. MGCN [22] and DimeNet [23] use the classic radial basis function (RBF) to combine meaningful distance information with the original graph. However, covalent bonding correlations alone are not sufficient for learning 3D structure. Moreover, different distances between atoms indicate different relations, which RBF cannot provide explicitly. It is also worth noting that none of these models aggregates spatial information attentively, and they are designed for molecular property prediction, which is quite different from DTA prediction. To the best of our knowledge, we are the first to propose a dedicated GNN model tailored to the DTA problem that can identify multiple spatial relations while aggregating with a distance-aware attention mechanism.
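The contrast between a smooth radial-basis expansion and explicit distance relations can be illustrated with a small sketch. The Gaussian basis below is a generic example, not the exact basis used by MGCN or DimeNet, and the centers, width `gamma` and bucket boundaries are illustrative assumptions.

```python
import math

def gaussian_rbf(d, centers, gamma=10.0):
    """Smooth Gaussian radial-basis expansion of a scalar distance d."""
    return [math.exp(-gamma * (d - c) ** 2) for c in centers]

def one_hot_bucket(d, boundaries):
    """One-hot vector marking which interval [boundaries[k], boundaries[k+1]) contains d."""
    vec = [0] * (len(boundaries) - 1)
    for k in range(len(boundaries) - 1):
        if boundaries[k] <= d < boundaries[k + 1]:
            vec[k] = 1
            break
    return vec

centers = [1.0, 2.0, 3.0, 4.0, 5.0]   # illustrative RBF centers (Angstroms)
boundaries = [0.0, 2.0, 4.0, 6.0]     # illustrative bucket boundaries (Angstroms)

for d in (1.4, 3.1):
    rbf = [round(v, 3) for v in gaussian_rbf(d, centers)]
    # The RBF spreads one distance over several overlapping channels;
    # the bucket encoding assigns it to exactly one discrete relation.
    print(d, rbf, one_hot_bucket(d, boundaries))
```

With these settings a distance of 1.4 Å activates two RBF channels noticeably, while the bucket encoding maps it to a single relation index, which is the kind of explicit relation identity the text argues RBF does not expose directly.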

In this section, we first formally introduce the drug-target binding affinity (DTA) prediction problem. Then we describe the construction process of the spatial-enhanced pocket-ligand graph in detail.

We first clarify several concepts related to DTA prediction. A drug is a chemical compound, which is also called a ligand in DTA prediction, and the protein is also called the target. Thus, in this paper, the main objects of drug-target binding affinity prediction are the ligand and the protein. As illustrated in Figure 1, protein-ligand interactions can be interpreted in two ways, which we introduce and compare as follows.

Given a drug compound and a target protein, the DTA prediction task is to predict the binding affinity between them. In general, we use $L$ and $P$ to represent the input drug (or ligand) and the input target (or protein), respectively. Each can be given as a graph, a sequence, or another input format. The predicted binding affinity $y$ is a continuous real value. Traditionally, DTA prediction is defined as a regression task:

$$ f: (L, P) \rightarrow y \tag{1} $$

Splitting the drug and protein into a two-part input provides only weak interaction information between them. To overcome this limitation, as illustrated in Figure 1(c), a more appropriate formulation is to represent the drug-protein complex as the protein-ligand binding pose [24], which preserves the essential spatial structure; we call it pocket-ligand for short in this paper. Moreover, the pocket-ligand is significantly smaller than the original protein-ligand complex. The pocket-ligand graph is denoted as $G$, and the distance matrix of atoms in graph $G$ is denoted as $D$. The construction of $G$ and $D$ is introduced next. The problem can now be defined as:

$$ g: (G, D) \rightarrow y \tag{2} $$

As claimed in Section 2, the structure information in the original molecular graph, which contains only covalent bonding correlations, is not enough. More spatial edges should be included in the graph to provide adequate 3D structure information. Moreover, there is no natural bond between the ligand and the protein. Therefore, the input interaction graph of our proposed model is the spatial-enhanced pocket-ligand graph (SPoG).
We denote the SPoG by a graph $G = (V = V_M \cup V_P, E)$, where $V = \{a_1, a_2, \dots, a_N\}$, and $V_M$ and $V_P$ are the atom sets of the ligand and the protein pocket, respectively. To build spatial-enhanced edges for $G$, we first calculate the spatial distances between all atoms in 3D space and denote the resulting distance matrix as $D$. A threshold $\theta_d$ is then set to preserve the correlation edge $e_{ij}$ between a pair of atoms. In this way, the edge set is built as:

$$ E = \{ e_{ij} = (a_i, a_j) \mid a_i, a_j \in V,\ D_{ij} \le \theta_d \} \tag{3} $$

In this section, we propose a distance-aware molecule graph attention network (S-MAN) to address the drug-target binding affinity prediction problem.
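Returning briefly to Equation (3), the SPoG edge construction can be sketched in a few lines. This is an illustrative assumption rather than the authors' code: the helper `build_spog`, the toy coordinates and the 2.0 Å threshold are hypothetical, and self-loops are excluded by choice. The sketch also collects the ligand-pocket edges, which supply the interaction information that separately split inputs lack.

```python
import math

def build_spog(ligand_coords, pocket_coords, theta_d):
    """Sketch of Eq. (3): ligand atoms (V_M) and pocket atoms (V_P) form one node set V,
    and a directed edge (i, j) is kept whenever D[i][j] <= theta_d.
    Skipping self-loops (i == j) is our own assumption."""
    coords = list(ligand_coords) + list(pocket_coords)
    n_lig, n = len(ligand_coords), len(ligand_coords) + len(pocket_coords)
    D = [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]
    edges = [(i, j) for i in range(n) for j in range(n) if i != j and D[i][j] <= theta_d]
    # Edges that cross the ligand/pocket boundary carry drug-target interaction information.
    cross = [(i, j) for (i, j) in edges if (i < n_lig) != (j < n_lig)]
    return D, edges, cross

ligand = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0)]    # toy ligand atoms (Angstroms)
pocket = [(0.0, 1.8, 0.0), (10.0, 0.0, 0.0)]   # toy pocket atoms (Angstroms)
D, edges, cross = build_spog(ligand, pocket, theta_d=2.0)
print(edges)  # [(0, 1), (0, 2), (1, 0), (2, 0)]
print(cross)  # [(0, 2), (2, 0)]
```

The distant pocket atom (index 3) gets no edges, which is how thresholding keeps the pocket-ligand graph small compared with the full complex.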

Figure 4 depicts the framework of S-MAN. As mentioned in Section 3, it takes the spatial-enhanced pocket-ligand graph $G$ and the position matrix $S$ as input. S-MAN contains distance-aware molecule graph attention layers designed for DTA prediction, which propagate atom representations spatially and attentively. Its two aggregation operations, edge-level aggregation and node-level aggregation, act synergistically to improve performance. A graph pooling layer then produces the graph representation, from which several fully connected layers compute the final binding affinity score. In the following sections, we use $a_i$ and $\mathbf{a}_i$ (i.e., the bold letter) to represent atom node $i$ and its embedding, respectively. Similarly, the edge $(a_i, a_j)$ is denoted as $e_{ij}$ and its embedding as $\mathbf{e}_{ij}$.

[Figure 4: pocket-ligand graph construction and position encoding, followed by edge-level and node-level aggregation (AGG), graph pooling, and FC layers that output the affinity score. Legend: atom node, edge from atom node, edge after AGG, aggregation operator, FC.]

Figure 4.

Illustration of Distance-aware Molecule Graph Attention Network (S-MAN) for DTA prediction.

Unlike general graphs, the molecular graphs of drug compounds and proteins have unique 3D spatial structures, which may strongly affect molecular properties and interactions. For example, as shown in Figures 2(a) and 2(b), the binding affinity is closely related to the max distance in complexes. Also, the spatial distance distribution in the pocket-ligand indicates that a covalent bond can be formed when the spatial distance between atoms is below a certain distance. The ability to learn such spatial structure and position information is critical to biological modeling, especially for the DTA prediction problem. To integrate the topological structure and spatial position information suitably, we design S-MAN with a hierarchical attentive aggregation structure: edge-level aggregation → node-level aggregation. First, edge-level aggregation combines the embeddings of pairwise atoms with position information to obtain the edge embedding. After that, node-level aggregation learns a weighted combination of the edge embeddings for each atom node using distance-aware attention.

Position Encoding.

The positions of atoms in the pocket-ligand are defined by 3D coordinates, forming the input position matrix $S \in \mathbb{R}^{N \times 3}$. Since absolute coordinates depend on an arbitrarily chosen coordinate frame, we convert $S$ into a relative spatial matrix, namely the distance matrix $D \in \mathbb{R}^{N \times N}$. From Figure 2(b), we notice that spatial distances indicate different meaningful correlations between atoms. Therefore, we encode the spatial information by applying a one-hot encoder that splits the scalar distances into $b$ buckets, leading to a multiple position relation matrix $D^R \in \mathbb{R}^{N \times N \times b}$. Taking Figure 3 as an example, we divide the neighbors of an atom into different spatial relations. We then adopt a dense layer to obtain the position embedding $\mathbf{p}_{ij}$ for each pocket-ligand edge $e_{ij}$:

$$ \mathbf{p}_{ij} = W_p D^R_{ij} \tag{4} $$

where $W_p \in \mathbb{R}^{d \times b}$ is the transformation weight matrix. Next, we feed the position embedding into both the edge-level and the node-level stages of the hierarchical aggregation layer.

Edge-level Aggregation.
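A minimal sketch of the position encoding follows. The text does not specify how the $b$ buckets are bounded, so equal-width buckets over $[0, \theta_d]$ are an assumption here, as are the helper names, the toy coordinates, and the values of $b$, $\theta_d$ and $d$.

```python
import math
import random

def distance_matrix(S):
    """D[i][j] = Euclidean distance between atoms i and j (frame-independent)."""
    return [[math.dist(p, q) for q in S] for p in S]

def one_hot_bucket(d, b, theta_d):
    """Hypothetical bucketing: b equal-width buckets over [0, theta_d].
    Distances at or beyond theta_d fall into the last bucket."""
    k = min(int(d / theta_d * b), b - 1)
    return [1.0 if t == k else 0.0 for t in range(b)]

def relation_tensor(D, b, theta_d):
    """D^R with shape N x N x b: a one-hot distance relation for every atom pair."""
    return [[one_hot_bucket(d, b, theta_d) for d in row] for row in D]

def position_embedding(W_p, onehot):
    """Eq. (4): p_ij = W_p D^R_ij, with W_p of shape d x b."""
    return [sum(w * x for w, x in zip(row, onehot)) for row in W_p]

random.seed(0)
S = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 3.0, 0.0)]  # toy N x 3 position matrix
b, theta_d, d_model = 4, 5.0, 3                          # illustrative hyperparameters
D = distance_matrix(S)
D_R = relation_tensor(D, b, theta_d)
W_p = [[random.uniform(-1, 1) for _ in range(b)] for _ in range(d_model)]
p_01 = position_embedding(W_p, D_R[0][1])
print(D_R[0][1], D_R[0][2])  # [0.0, 1.0, 0.0, 0.0] [0.0, 0.0, 1.0, 0.0]
```

Because each $D^R_{ij}$ is one-hot, the dense layer in Equation (4) simply selects one column of $W_p$, so all atom pairs that fall into the same distance bucket share one learned relation embedding.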

Since relative position information is attached to pairs of atoms, the key challenge in applying GNNs to learn position information is how to propagate such pairwise information on a molecular graph. To this end, S-MAN first employs an attentive aggregation over edges to capture long-range dependencies among atom nodes. We introduce the definition of edge neighbors, denoted as $\mathcal{N}_e$. If there is a path $a_k \rightarrow a_i \rightarrow a_j$, the edge $e_{ki} = (a_k, a_i)$ is a neighbor of the edge $e_{ij} = (a_i, a_j)$, denoted as $e_{ki} \rightarrow e_{ij}$. More formally:

$$ \mathcal{N}_e(e_{ij}) = \{ e_{ki} \mid e_{ki} \in E,\ k \neq j \} \tag{5} $$

As illustrated in Figure 4, the edge embedding $\mathbf{e}^l_{ij}$ is first updated through the node-to-edge aggregation:

$$ \mathbf{e}^l_{ij} = \mathrm{AGG}_{node \rightarrow edge}(a_i, a_j, \mathbf{p}_{ij}) = \sigma\left(W^l_{ne} \cdot [\mathbf{a}^{l-1}_i \oplus \mathbf{a}^{l-1}_j \oplus \mathbf{p}_{ij}] + b^l_{ne}\right) \tag{6} $$

where $\oplus$ is the concatenation operation over vectors, $W^l_{ne}$ is the transformation matrix at the $l$-th layer that combines the node embeddings and the position embedding, $b^l_{ne}$ is the bias vector, and $\sigma$ is a non-linear activation function. For each edge $e_{ij}$, inspired by GAT [20], the attentive aggregation over edge neighbors is formulated as:

$$ \mathbf{e}^l_{ij} = \mathrm{AGG}_{edge \rightarrow edge}(e_{ij}, \mathcal{N}_e(e_{ij})) = \sum_{e_{ki} \in \mathcal{N}_e(e_{ij})} \alpha^l_{k,i,j} W^l_e \mathbf{e}^l_{ki} \tag{7} $$

where $W^l_e$ is the weight matrix and $\alpha^l_{k,i,j}$ is the attention weight of edge neighbor $e_{ki}$, normalized via the softmax function:

$$ \alpha^l_{k,i,j} = \frac{\exp\left(\sigma_a\left(\mathbf{a}_{e,l}^{T} [W^l_e \mathbf{e}^l_{ij} \oplus W^l_e \mathbf{e}^l_{ki}]\right)\right)}{\sum_{e_{ti} \in \mathcal{N}_e(e_{ij})} \exp\left(\sigma_a\left(\mathbf{a}_{e,l}^{T} [W^l_e \mathbf{e}^l_{ij} \oplus W^l_e \mathbf{e}^l_{ti}]\right)\right)} \tag{8} $$

where $\mathbf{a}_{e,l}$ is the attention parameter for measuring the weights of edge neighbors, and we use LeakyReLU as the activation function $\sigma_a$. Because position information is injected by the node-to-edge aggregation in Equation (6), the subsequent attentive edge-to-edge aggregation can adaptively capture long-range dependencies in the molecular graph.

Node-level Aggregation.
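Under stated assumptions, Equations (5) to (8) can be sketched with plain Python lists. The activation $\sigma$ in Equation (6) is taken to be ReLU, the LeakyReLU slope is 0.2, embedding sizes are tiny, and every helper name and random weight is hypothetical; the sketch shows the data flow of one edge-level step, not the authors' implementation.

```python
import math
import random

random.seed(0)

def matvec(W, x):
    """Dense matrix-vector product with nested lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def edge_neighbors(edge, edges):
    """Eq. (5): N_e(e_ij) = {e_ki in E | k != j}."""
    i, j = edge
    return [(k, t) for (k, t) in edges if t == i and k != j]

def node_to_edge(a, p, edge, W_ne, b_ne):
    """Eq. (6): sigma(W_ne [a_i ⊕ a_j ⊕ p_ij] + b_ne); ReLU is our assumed sigma."""
    i, j = edge
    x = a[i] + a[j] + p[edge]            # list concatenation plays the role of ⊕
    return [max(0.0, s + b) for s, b in zip(matvec(W_ne, x), b_ne)]

def edge_to_edge(edge, e, edges, W_e, att_e):
    """Eqs. (7)-(8): attention-weighted sum of transformed neighbour edges."""
    nbrs = edge_neighbors(edge, edges)
    query = matvec(W_e, e[edge])
    if not nbrs:
        return [0.0] * len(query), []    # empty sum in Eq. (7)
    keys = [matvec(W_e, e[nb]) for nb in nbrs]
    scores = [leaky_relu(sum(w * v for w, v in zip(att_e, query + k))) for k in keys]
    top = max(scores)                    # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    alphas = [x / z for x in exps]
    out = [sum(al * k[d] for al, k in zip(alphas, keys)) for d in range(len(query))]
    return out, alphas

# Toy graph: 4 atoms with all directed edges; 2-d atom and position embeddings.
n, d_atom, d_pos, d_edge = 4, 2, 2, 3
edges = [(i, j) for i in range(n) for j in range(n) if i != j]
a = [[random.uniform(-1, 1) for _ in range(d_atom)] for _ in range(n)]
p = {ed: [random.uniform(-1, 1) for _ in range(d_pos)] for ed in edges}
W_ne, b_ne = rand_matrix(d_edge, 2 * d_atom + d_pos), [0.0] * d_edge
W_e = rand_matrix(d_edge, d_edge)
att_e = [random.uniform(-1, 1) for _ in range(2 * d_edge)]

e = {ed: node_to_edge(a, p, ed, W_ne, b_ne) for ed in edges}
new_e01, alphas = edge_to_edge((0, 1), e, edges, W_e, att_e)
print(edge_neighbors((0, 1), edges), [round(x, 3) for x in alphas])
```

For edge $e_{01}$ the neighbours are $e_{20}$ and $e_{30}$: edges entering atom 0 other than the reverse edge $e_{10}$, exactly as Equation (5) requires.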

After obtaining the edge embedding $\mathbf{e}^l_{ij}$ from edge-level aggregation, we apply node-level aggregation to fuse the position-enhanced information with the capability of discriminating multiple spatial relations among atoms. By aggregating all related edges of each node with a specially designed distance-aware attention, the combined edge representations carry the necessary spatial and topological structure information. Similar to the edge neighbor set $\mathcal{N}_e$, we define the edge neighbors of a node $a_i$ as follows:

$$ \mathcal{N}_{eon}(a_i) = \{ e_{ki} \mid e_{ki} = (a_k, a_i) \in E \} \tag{9} $$

We first convert the edge embedding and the atom node embedding into hidden representations $\mathbf{h}_{k,i,e}$ and $\mathbf{h}_{i,a}$ in the same vector space by performing a linear transformation:

$$ \mathbf{h}^l_{k,i,e} = W^l_h \mathbf{e}^l_{ki}, \qquad \mathbf{h}^l_{i,a} = W^l_h \mathbf{a}^{l-1}_i \tag{10} $$

where $W^l_h$ is the weight matrix in the $l$-th layer. We then propose a distance-aware attention to learn the weights among multi-relation edges. The importance of edge $e_{ki}$ for the destination atom $a_i$ is formulated as follows:

$$ w^l_{ki} = \mathrm{attn}_{eon}(a_i, e_{ki}, \mathbf{p}_{ki}) = \sigma\left(\mathbf{a}_{n,l}^{T} \cdot [\mathbf{h}^l_{i,a} \oplus \mathbf{h}^l_{k,i,e} \oplus W^l_s \mathbf{p}_{ki}]\right) \tag{11} $$

where $\mathbf{a}_{n,l}$ is the node attention parameter for measuring the weights of edge neighbors, $W^l_s$ is the weight matrix for position transformation, and $\sigma$ is the activation function. The softmax function is then used for normalization:

$$ \beta^l_{ki} = \frac{\exp(w^l_{ki})}{\sum_{e_{ti} \in \mathcal{N}_{eon}(a_i)} \exp(w^l_{ti})} \tag{12} $$

The updated atom node embedding is calculated by the edge-to-node aggregation.
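One attention head of Equations (9) to (12) can be sketched for a single destination atom. As assumptions, LeakyReLU stands in for the unspecified $\sigma$ in Equation (11), atom and edge embeddings share one size so that a single $W_h$ fits both (as Equation (10) implies), and the random inputs are hypothetical stand-ins for edge-level outputs and position embeddings.

```python
import math
import random

random.seed(1)

def matvec(W, x):
    """Dense matrix-vector product with nested lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rand_vec(n):
    return [random.uniform(-1, 1) for _ in range(n)]

def node_edge_neighbors(i, edges):
    """Eq. (9): N_eon(a_i) = {e_ki | e_ki = (a_k, a_i) in E}."""
    return [(k, t) for (k, t) in edges if t == i]

def distance_aware_attention(i, a, e, p, edges, W_h, W_s, att_n, slope=0.2):
    """One head of Eqs. (10)-(12) for atom a_i.
    Returns neighbour edges, weights beta_ki, and sum_k beta_ki * h_{k,i,e}."""
    nbrs = node_edge_neighbors(i, edges)
    h_a = matvec(W_h, a[i])                            # Eq. (10), atom part
    if not nbrs:
        return [], [], [0.0] * len(h_a)
    h_e = [matvec(W_h, e[ed]) for ed in nbrs]          # Eq. (10), edge part
    w = []
    for ed, he in zip(nbrs, h_e):
        x = h_a + he + matvec(W_s, p[ed])              # h_{i,a} ⊕ h_{k,i,e} ⊕ W_s p_ki
        s = sum(u * v for u, v in zip(att_n, x))
        w.append(s if s > 0 else slope * s)            # LeakyReLU as sigma (assumption)
    top = max(w)
    exps = [math.exp(s - top) for s in w]              # max-shift for numerical stability
    z = sum(exps)
    beta = [x / z for x in exps]                       # Eq. (12)
    msg = [sum(bk * he[d] for bk, he in zip(beta, h_e)) for d in range(len(h_a))]
    return nbrs, beta, msg

n, d_in, d_pos, d_h = 3, 3, 2, 2
edges = [(i, j) for i in range(n) for j in range(n) if i != j]
a = [rand_vec(d_in) for _ in range(n)]
e = {ed: rand_vec(d_in) for ed in edges}     # stand-ins for edge-level outputs
p = {ed: rand_vec(d_pos) for ed in edges}    # stand-ins for position embeddings
W_h = [rand_vec(d_in) for _ in range(d_h)]
W_s = [rand_vec(d_pos) for _ in range(d_h)]
att_n = rand_vec(3 * d_h)

nbrs, beta, msg = distance_aware_attention(0, a, e, p, edges, W_h, W_s, att_n)
print(nbrs)  # [(1, 0), (2, 0)]
```

The multi-head form in Equation (13) would run $M$ such heads with separate parameters, average their `msg` vectors, and apply $\sigma$.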
Following GAT, we also extend the distance-aware attention to a multi-head version with $M$ independent attention mechanisms for better stability:

$$ \mathbf{a}^l_i = \mathrm{AGG}_{edge \rightarrow node}(a_i, \mathcal{N}_{eon}(a_i)) = \sigma\left(\frac{1}{M} \sum_{m=1}^{M} \sum_{e_{ki} \in \mathcal{N}_{eon}(a_i)} \beta^{l,m}_{ki} \mathbf{h}^{l,m}_{k,i,e}\right) \tag{13} $$

As shown in Figure 4, we further stack $L$ distance-aware molecule graph attention layers to learn adequate position and structure information for drug-target binding affinity prediction, and we use $\mathbf{a}_i = \mathbf{a}^L_i$ to represent the final embedding of atom $a_i$.

4.3 Drug-target Binding Affinity Prediction

After the S-MAN layers, we obtain the representations of the atoms in the pocket-ligand. We employ a graph pooling layer over all atoms to obtain the global pocket-ligand embedding. In our study, the graph-level representation $\mathbf{g}$ is calculated by summation:

$$ \mathbf{g} = \sum_{a_i \in V} \mathbf{a}_i \tag{14} $$

We then feed $\mathbf{g}$ into fully connected layers to predict the drug-target binding affinity score:

$$ \hat{y} = W_o\left(\sigma\left(W_2\, \sigma(W_1 \mathbf{g} + b_1) + b_2\right)\right) + b_o \tag{15} $$

We use the mean squared error (MSE) between the predicted value $\hat{y}$ and the observed binding affinity $y$ as the loss function to train S-MAN over all pocket-ligand complexes in dataset $D$:

$$ \mathcal{L} = \frac{1}{|D|} \sum_{(M, P) \in D} (y - \hat{y})^2 \tag{16} $$

In this section, we first introduce the datasets and experiment settings. Then we compare our proposed S-MAN with other methods for predicting drug-target binding affinity on two PDBbind datasets.

As our proposed model takes advantage of the 3D positions of atoms, the experimental datasets must provide such spatial information. We therefore conduct experiments on two publicly released PDBbind datasets (v.2016 and v.2019) to evaluate the effectiveness of S-MAN and the baselines.

PDBbind.

The PDBbind dataset [25] is a well-known dataset for predicting the binding affinity of drug-target complexes. It is composed of 3D structures of molecular complexes and the corresponding experimentally determined binding affinities, expressed as $pK_a$ values. Each dataset version has three subsets: the general set, the refined set and the core set. The general set contains all complexes, including those of relatively lower quality, while the refined set is a higher-quality subset of the general set and is used as the training set in our experiments. The core set is designed as the highest-quality benchmark and is usually used as the test set. For the most widely used PDBbind v.2016, there are 4057 complexes in the refined set and 290 complexes in the core set. We also use the latest PDBbind v.2019, an update of v.2016, with 4852 and 285 complexes in these two subsets, respectively.

Table 1. Statistics of complexes in the two PDBbind datasets.

Dataset   Training   Validation   Testing (core set)
v.2016    3,390      377          290
v.2019    4,127      459          285

To comprehensively evaluate model performance, we use the root mean square error (RMSE) and the mean absolute error (MAE) to measure prediction error. Model performance is also quantitatively evaluated by the classic Pearson's correlation coefficient (R) and the standard deviation (SD) in regression, which measure the linear correlation between predictions and experimental constants. As introduced in CASF [26], SD is defined as follows:

$$ SD = \sqrt{\frac{1}{|D| - 1} \sum_{i=1}^{|D|} \left[ y_i - (a + b\,\hat{y}_i) \right]^2} \tag{17} $$

where $\hat{y}_i$ and $y_i$ are the predicted and experimental values of the $i$-th complex in dataset $D$, and $a$ and $b$ are the intercept and the slope of the regression line, respectively.

We compare our S-MAN model with the following methods for predicting drug-target binding affinity:
• LR uses linear regression for drug-target binding affinity prediction. We calculate the inter-molecular interaction features introduced in [8] as input and predict the affinity scores.
• SVR is a variant of the support vector machine (SVM) for regression tasks. We use the same features as LR. Note that these strong graph-level features are extracted with domain knowledge, account for the interaction and spatial information among atoms, and are time-consuming to compute.
• Pafnucy [13] is a 3D-CNN model designed to learn the spatial structure of protein-ligand complexes for drug-target binding affinity prediction.
• GraphDTA [14] is a graph neural network model that introduced GNNs into DTA prediction. It constructs a graph with atoms as nodes and bonds as edges to describe drug molecules, and uses a CNN to learn the protein sequence representation. There are four variants with different GNN models: Graph-GCN, Graph-GIN, Graph-GAT and Graph-GCN+GAT.
• SPoG-DTA improves GraphDTA by feeding our spatial-enhanced pocket-ligand graph (SPoG) into the GNN model instead of the drug molecular graph. We name the four variants SPoG-GCN, SPoG-GIN, SPoG-GAT and SPoG-GCN+GAT.
• S-MAN-NoEdge performs only node-level aggregation on the SPoG. The edge-level aggregation stage is removed, and the node embedding is updated from the node neighbors of each atom.
• S-MAN-NoSpAttn replaces the distance-aware attention in our model with general graph attention without spatial information during node-level aggregation.
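The four evaluation metrics, including the CASF-style SD of Equation (17), can be computed with a short sketch. The function names and the toy values are our own illustrative assumptions; `mse` is the quantity minimized by the training loss in Equation (16).

```python
import math

def mse(y, y_hat):
    """Mean squared error, as in the training loss of Eq. (16)."""
    return sum((t - p) ** 2 for t, p in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    return math.sqrt(mse(y, y_hat))

def mae(y, y_hat):
    return sum(abs(t - p) for t, p in zip(y, y_hat)) / len(y)

def pearson_r(y, y_hat):
    """Pearson's correlation coefficient between experimental and predicted values."""
    n = len(y)
    my, mp = sum(y) / n, sum(y_hat) / n
    cov = sum((t - my) * (p - mp) for t, p in zip(y, y_hat))
    var_t = sum((t - my) ** 2 for t in y)
    var_p = sum((p - mp) ** 2 for p in y_hat)
    return cov / math.sqrt(var_t * var_p)

def sd_casf(y, y_hat):
    """Eq. (17): least-squares fit y ≈ a + b * y_hat, then the residual
    standard deviation with |D| - 1 in the denominator."""
    n = len(y)
    my, mp = sum(y) / n, sum(y_hat) / n
    b = sum((p - mp) * (t - my) for t, p in zip(y, y_hat)) / sum((p - mp) ** 2 for p in y_hat)
    a = my - b * mp
    return math.sqrt(sum((t - (a + b * p)) ** 2 for t, p in zip(y, y_hat)) / (n - 1))

y_hat = [1.0, 2.0, 3.0, 4.0]   # toy predictions
y = [3.0, 5.0, 7.0, 9.0]       # toy experimental values, exactly 2 * y_hat + 1
print(round(rmse(y, y_hat), 3), mae(y, y_hat), round(pearson_r(y, y_hat), 3), round(sd_casf(y, y_hat), 3))
# 3.674 3.5 1.0 0.0
```

The toy case shows why R and SD complement RMSE and MAE: predictions that are off by a linear transformation have large absolute errors, yet R = 1 and SD = 0 because SD is measured after the linear fit.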

Settings.

We randomly select 90% of the complexes in each refined set of PDBbind v.2016 and v.2019 as the training set, and the remaining 10% are used for validation. The main statistics of the two PDBbind datasets are summarized in Table 1. We optimize models with the Adam optimizer, with the batch size fixed at 32. We construct the spatial-enhanced pocket-ligand graph with the threshold θ d = e − , set the number of attention heads M to 4, and set the dropout ratio to 0.2. For the Pafnucy and GraphDTA models, we input the same 36-dimensional atom features as S-MAN. For all baseline models, we use the default optimal parameter settings from their original implementations.

Features.

Following [13], the atom features used for the 3D-CNN and GNN models include atom type and hybridization, the numbers of bonds with other heavy atoms and heteroatoms, atom properties such as hydrophobicity, and partial charge. In total, 18 features are used to describe an atom. Considering the heterogeneity of the pocket-ligand graph, we further extend the atom features to a 36-dimensional vector, where the 1st to 18th elements represent ligand atoms and the 19th to 36th elements represent protein atoms.
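The 36-dimensional layout can be sketched as follows. Zero-filling the unused half of the vector is our assumption (the text only states which elements represent which atom type), and `extend_atom_features` and the placeholder feature values are hypothetical.

```python
def extend_atom_features(base, is_ligand):
    """Map an 18-dim atom feature vector into the 36-dim layout: elements 1-18
    (indices 0-17) hold ligand-atom features and elements 19-36 (indices 18-35)
    hold protein-atom features. Zero-filling the unused half is an assumption."""
    if len(base) != 18:
        raise ValueError(f"expected 18 base features, got {len(base)}")
    pad = [0.0] * 18
    return list(base) + pad if is_ligand else pad + list(base)

feats = [float(k) for k in range(1, 19)]        # placeholder feature values
lig = extend_atom_features(feats, is_ligand=True)
prot = extend_atom_features(feats, is_ligand=False)
print(len(lig), lig[:2], prot[18:20])           # 36 [1.0, 2.0] [1.0, 2.0]
```

Keeping ligand and protein atoms in disjoint slots lets a single linear layer learn separate weights for the two atom populations while still operating on one shared graph.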

We compare our model with the baseline models described above on the two PDBbind datasets, v.2016 and v.2019. The results reported in Table 2 are averaged over five repeated runs, with the standard deviation in parentheses. We first compare our model with previous works and then analyze the effectiveness of the spatial information injected into our model.

Predictive performance.

As shown in Table 2, S-MAN significantly outperforms the baselines on all metrics across the two datasets. More specifically, Pafnucy achieves relatively poor results, especially on PDBbind v.2019, which indicates the limitation of the 3D-CNN model. As mentioned in Section 2, although 3D-CNN models can learn spatial information by treating protein-ligand complexes as images, the voxelized atom positions depend on the rotation and translation of the coordinate system. This can confuse the model when it learns binding structures across different complexes; as a result, the 3D-CNN model captures only restricted position information. Benefiting from strong features based on the occurrence of atom types within a specified spatial distance, SVR performs better than LR and Pafnucy. Owing to its ability to aggregate both spatial position and topological structure information, our model performs much better than all of the above baselines.

Among the graph neural networks, the GNN models of SPoG-DTA perform better than those of GraphDTA on the whole (most consistently for GAT and GCN+GAT, which improve on both datasets), demonstrating that richer spatial information is helpful for DTA prediction. Moreover, the GAT model achieves the most prominent improvement: its RMSE drops from 1.776 to 1.711 on v.2016 and from 1.814 to 1.709 on v.2019. A potential reason is that the attention mechanism helps identify meaningful neighbors among the added spatial relations. However, these GNN models cannot fully capture the spatial information, and S-MAN achieves a relative gain of 10.9% in mean RMSE over the best baseline (SPoG-GCN+GAT) on the PDBbind v.2016 dataset.

| Model | RMSE (v.2016) | MAE (v.2016) | SD (v.2016) | R (v.2016) | RMSE (v.2019) | MAE (v.2019) | SD (v.2019) | R (v.2019) |
|---|---|---|---|---|---|---|---|---|
| LR | 1.677 (0.00) | 1.355 (0.00) | 1.605 (0.00) | 0.676 (0.00) | 1.693 (0.00) | 1.374 (0.00) | 1.620 (0.00) | 0.667 (0.00) |
| SVR | 1.562 (0.00) | 1.269 (0.00) | 1.496 (0.00) | 0.726 (0.00) | 1.577 (0.00) | 1.282 (0.00) | 1.511 (0.00) | 0.719 (0.00) |
| Pafnucy | 1.601 (0.02) | 1.295 (0.02) | 1.584 (0.02) | 0.686 (0.01) | 1.907 (0.08) | 1.520 (0.07) | 1.711 (0.03) | 0.617 (0.02) |
| Graph-GIN | 1.655 (0.04) | 1.248 (0.04) | 1.646 (0.04) | 0.654 (0.02) | 1.632 (0.03) | 1.238 (0.04) | 1.623 (0.03) | 0.665 (0.01) |
| Graph-GCN | 1.661 (0.04) | 1.278 (0.03) | 1.653 (0.03) | 0.650 (0.02) | 1.715 (0.04) | 1.304 (0.02) | 1.698 (0.03) | 0.624 (0.02) |
| Graph-GAT | 1.776 (0.04) | 1.378 (0.03) | 1.751 (0.03) | 0.593 (0.02) | 1.814 (0.01) | 1.396 (0.01) | 1.786 (0.01) | 0.570 (0.01) |
| Graph-GCN+GAT | 1.539 (0.02) | 1.204 (0.02) | 1.537 (0.02) | 0.708 (0.01) | 1.646 (0.04) | 1.292 (0.04) | 1.642 (0.04) | 0.655 (0.02) |
| SPoG-GIN | 1.663 (0.02) | 1.266 (0.01) | 1.646 (0.02) | 0.655 (0.01) | 1.713 (0.05) | 1.299 (0.04) | 1.693 (0.04) | 0.627 (0.02) |
| SPoG-GCN | 1.702 (0.04) | 1.292 (0.03) | 1.679 (0.04) | 0.636 (0.02) | 1.678 (0.02) | 1.288 (0.02) | 1.670 (0.02) | 0.640 (0.01) |
| SPoG-GAT | 1.711 (0.02) | 1.310 (0.01) | 1.694 (0.02) | 0.628 (0.01) | 1.709 (0.01) | 1.304 (0.01) | 1.684 (0.00) | 0.632 (0.00) |
| SPoG-GCN+GAT | 1.526 (0.02) | 1.192 (0.02) | 1.526 (0.03) | 0.713 (0.01) | 1.548 (0.03) | 1.192 (0.02) | 1.536 (0.02) | 0.708 (0.01) |
| S-MAN | 1.359 (0.03) | 1.093 (0.02) | 1.347 (0.03) | 0.786 (0.01) | 1.469 (0.01) | 1.189 (0.02) | 1.429 (0.03) | 0.753 (0.01) |

Table 2. Experimental results of DTA prediction on the PDBbind datasets (mean over five runs, standard deviation in parentheses).

Figure 5. Evaluation of S-MAN with its variants. Each panel plots the metric score (RMSE, MAE, SD, R) of S-MAN, S-MAN-NoSpAttn and S-MAN-NoEdge: (a) results on PDBbind v.2016; (b) results on PDBbind v.2019.
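The 10.9% figure is the relative reduction in mean RMSE with respect to the strongest baseline. The arithmetic below reproduces it from the Table 2 values; the v.2019 number is derived here from the same table and is not stated in the text. The helper name `relative_gain` is illustrative.

```python
# Mean RMSE values copied from Table 2: best baseline (SPoG-GCN+GAT) vs. S-MAN
rmse = {
    "PDBbind v.2016": {"best_baseline": 1.526, "s_man": 1.359},
    "PDBbind v.2019": {"best_baseline": 1.548, "s_man": 1.469},
}


def relative_gain(baseline, model):
    """Relative error reduction of `model` with respect to `baseline`."""
    return (baseline - model) / baseline


for dataset, vals in rmse.items():
    print(f"{dataset}: {relative_gain(vals['best_baseline'], vals['s_man']):.1%}")
# PDBbind v.2016: 10.9%
# PDBbind v.2019: 5.1%
```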

Influence of spatial information.

To study the effectiveness of the distance-aware attention and the edge-level aggregation, we further conduct experiments on two variants of S-MAN: S-MAN-NoEdge, which removes the edge-level aggregation, and S-MAN-NoSpAttn, which removes the distance-aware attention from the node-level aggregation. As illustrated in Figure 5, removing the edge-level aggregation degrades the model's performance, indicating that the spatial information carried by the edges is critical to drug-target binding affinity prediction. Moreover, the prediction error of S-MAN-NoEdge is higher than that of S-MAN-NoSpAttn, which indicates that the edge-level aggregation plays a more significant role in our model and demonstrates the necessity of the hierarchical structure. Because S-MAN-NoSpAttn ignores the position information of atoms and cannot distinguish multiple spatial relations during node-level aggregation, it performs worse than S-MAN on both datasets.
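To picture what the two ablations remove, here is a toy two-level attention for a single atom. It is an illustrative sketch under stated assumptions, not the S-MAN equations: neighbors are first summarized within each distance bucket (an edge-level step), then the bucket summaries are combined by attention whose scores include a per-bucket bias (a distance-aware node-level step). The dot-product scoring, the `bucket_bias` table and the function `aggregate_node` are all hypothetical; the two flags mimic the NoEdge and NoSpAttn variants only in spirit.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(vectors, weights):
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) for d in range(dim)]


def aggregate_node(h_i, neighbors, bucket_bias,
                   use_edge_level=True, use_spatial_attn=True):
    """Toy two-level attention for one node (illustrative only).

    h_i: feature vector of the centre atom.
    neighbors: list of (feature_vector, bucket_id) pairs.
    bucket_bias: hypothetical scalar per distance bucket.
    use_edge_level=False mimics a NoEdge-style ablation.
    use_spatial_attn=False mimics a NoSpAttn-style ablation.
    """
    if use_edge_level:
        # Level 1: attention among neighbours sharing a distance bucket,
        # giving one summary vector per spatial relation.
        groups = {}
        for h_j, k in neighbors:
            groups.setdefault(k, []).append(h_j)
        keys = sorted(groups)
        candidates = []
        for k in keys:
            alpha = softmax([dot(h_i, h_j) for h_j in groups[k]])
            candidates.append(weighted_sum(groups[k], alpha))
    else:
        # No edge-level step: the node level sees raw neighbours directly.
        keys = [k for _, k in neighbors]
        candidates = [h_j for h_j, _ in neighbors]

    # Level 2: attention across candidates; the distance-aware part is the
    # per-bucket bias, which the NoSpAttn-style ablation drops.
    scores = [dot(h_i, c) + (bucket_bias[k] if use_spatial_attn else 0.0)
              for c, k in zip(candidates, keys)]
    beta = softmax(scores)
    return weighted_sum(candidates, beta)


h_i = [0.0, 0.0]
neighbors = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
bias = {0: 0.0, 1: math.log(3.0)}
print(aggregate_node(h_i, neighbors, bias))                          # approximately [0.25, 0.75]
print(aggregate_node(h_i, neighbors, bias, use_spatial_attn=False))  # [0.5, 0.5]
```

With the bias switched off, the two spatial relations become indistinguishable in this toy, which is the intuition behind the NoSpAttn result.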

Number of spatial buckets.

To explore the impact of the number of spatial buckets b, we conduct a parameter sensitivity experiment on the PDBbind v.2016 dataset by varying b. As shown in Figure 6(a), when b increases from 2 to 4, both metrics (RMSE and SD) improve noticeably, whereas the performance does not improve further once b > 4. This is probably because more buckets give the model finer-grained spatial relation and position information, while too many spatial buckets might introduce noise.

Figure 6. Parameter sensitivity experiment results (score in RMSE and SD): (a) versus the number of spatial buckets; (b) versus the number of S-MAN layers.
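A small sketch of how bucketed distance relations could be built, and why they sidestep the coordinate-frame issue raised for 3D-CNNs. The bucket scheme here is an assumption for illustration: equal-width buckets over a hypothetical 5.0 Å cutoff, with `b` playing the role of the number of spatial buckets; the paper's own bucket boundaries may differ. Helper names are hypothetical. Bucket assignments depend only on interatomic distances, so a rigid rotation plus translation leaves them unchanged, while voxel indices move.

```python
import math


def bucket_of(d, cutoff, b):
    """Map a distance to one of b equal-width buckets over [0, cutoff]; None beyond cutoff."""
    if d > cutoff:
        return None
    return min(int(d / cutoff * b), b - 1)


def spatial_edges(coords, cutoff=5.0, b=4):
    """Bucketed edges {(i, j): bucket} for atom pairs within the (assumed) cutoff."""
    edges = {}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            k = bucket_of(math.dist(coords[i], coords[j]), cutoff, b)
            if k is not None:
                edges[(i, j)] = k
    return edges


def voxelize(coords, size=1.0):
    """Grid-cell index of each atom, as a voxel-based 3D-CNN input would use."""
    return [tuple(math.floor(x / size) for x in p) for p in coords]


def rotate_z_and_shift(coords, theta, shift):
    """Rigidly rotate about the z axis by theta radians, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]


coords = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 3.1, 0.0), (4.0, 4.0, 0.0)]
moved = rotate_z_and_shift(coords, 0.7, (10.0, -3.0, 2.0))

print(spatial_edges(coords) == spatial_edges(moved))  # True: buckets unchanged
print(voxelize(coords) == voxelize(moved))            # False: voxel indices shift
```

Raising `b` splits the same distance range into narrower buckets, giving finer spatial relations but fewer edges per relation, which is one way to read the trade-off observed in Figure 6(a).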

Number of S-MAN layers.

We further study the influence of the number of S-MAN layers by varying it from 1 to 4. As shown in Figure 6(b), the performance of our model gets worse starting from 3 layers. This is likely because too many layers cause the model to overfit the training set. It indicates that S-MAN can achieve good performance with only one or two layers.

In this paper, we propose a novel distance-aware molecule graph attention network (S-MAN) to predict drug-target binding affinity. We first construct a spatial-enhanced pocket-ligand graph (SPoG) to preserve more spatial information and interactions between drug and protein. Moreover, S-MAN adopts a hierarchical attention structure, consisting of edge-level aggregation and node-level aggregation, to capture the unique spatial correlations among atoms. Extensive experimental results on two PDBbind datasets show that S-MAN significantly outperforms all baselines for DTA prediction.

References

Drews, J. Drug discovery: a historical perspective. Science, 1960–1964 (2000).

Cohen, P. Protein kinases—the major drug targets of the twenty-first century? Nat. Rev. Drug Discov., 309–315 (2002).

Noble, M. E., Endicott, J. A. & Johnson, L. N. Protein kinase inhibitors: insights into drug design from structure. Science, 1800–1805 (2004).

Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O. & Walsh, A. Machine learning for molecular and materials science. Nature, 547–555 (2018).

Ekins, S. et al. Exploiting machine learning for end-to-end drug discovery and development. Nat. Mater., 435 (2019).

McGaughey, G. B. et al. Comparison of topological, shape, and docking methods in virtual screening. J. Chem. Inf. Model., 1504–1519 (2007).

Wang, R., Lu, Y. & Wang, S. Comparative evaluation of 11 scoring functions for molecular docking. J. Med. Chem., 2287–2303 (2003).

Ballester, P. J. & Mitchell, J. B. A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking. Bioinformatics, 1169–1175 (2010).

Emig, D. et al. Drug target prediction and repositioning using an integrated network-based approach. PLoS One, e60618 (2013).

Wen, M. et al. Deep-learning-based drug–target interaction prediction. J. Proteome Res., 1401–1409 (2017).

Zheng, S., Li, Y., Chen, S., Xu, J. & Yang, Y. Predicting drug–protein interaction using quasi-visual question answering system. Nat. Mach. Intell., 134–140 (2020).

Öztürk, H., Özgür, A. & Ozkirimli, E. DeepDTA: deep drug–target binding affinity prediction. Bioinformatics, i821–i829 (2018).

Stepniewska-Dziubinska, M. M., Zielenkiewicz, P. & Siedlecki, P. Development and evaluation of a deep learning model for protein–ligand binding affinity prediction. Bioinformatics, 3666–3674 (2018).

Nguyen, T., Le, H., Quinn, T. P., Le, T. & Venkatesh, S. GraphDTA: Predicting drug–target binding affinity with graph neural networks. bioRxiv.

Lin, X. et al. DeepGS: Deep representation learning of graphs and sequences for drug-target binding affinity prediction. In (2020).

Gao, K. Y. et al. Interpretable drug target prediction using deep neural representation. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, 3371–3377 (2018).

Ballester, P. J., Schreyer, A. & Blundell, T. L. Does a more precise chemical description of protein–ligand complexes lead to more accurate prediction of binding affinity? J. Chem. Inf. Model., 944–955 (2014).

Öztürk, H., Ozkirimli, E. & Özgür, A. WideDTA: prediction of drug-target binding affinity. arXiv preprint arXiv:1902.04166 (2019).

Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR) (2017).

Veličković, P. et al. Graph attention networks. In International Conference on Learning Representations (ICLR) (2018).

Xu, K., Hu, W., Leskovec, J. & Jegelka, S. How powerful are graph neural networks? In International Conference on Learning Representations (ICLR) (2019).

Lu, C. et al. Molecular property prediction: A multilevel quantum interactions modeling perspective. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 1052–1060 (2019).

Klicpera, J., Groß, J. & Günnemann, S. Directional message passing for molecular graphs. In International Conference on Learning Representations (ICLR) (2020).

Lim, J. et al. Predicting drug–target interaction using a novel graph neural network with 3D structure-embedded graph representation. J. Chem. Inf. Model., 3981–3988 (2019).

Wang, R., Fang, X., Lu, Y., Yang, C.-Y. & Wang, S. The PDBbind database: methodologies and updates. J. Med. Chem., 4111–4119 (2005).

Li, Y., Han, L., Liu, Z. & Wang, R. Comparative assessment of scoring functions on an updated benchmark: 2. Evaluation methods and general results. J. Chem. Inf. Model., 1717–1736 (2014).