InteractionNet: Modeling and Explaining of Noncovalent Protein-Ligand Interactions with Noncovalent Graph Neural Network and Layer-Wise Relevance Propagation
InteractionNet: Modeling and Explaining of Noncovalent Protein-Ligand Interactions with Noncovalent Graph Neural Network and Layer-Wise Relevance Propagation
Hyeoncheol Cho , Eok Kyun Lee , and Insung S. Choi Department of Chemistry, KAIST, Daejeon 34141, Korea. Email: [email protected], [email protected].
Abstract:
Expanding the scope of graph-based, deep-learning models to noncovalent protein-ligand interactions has earned increasing attention in structure-based drug design. Modeling the protein-ligand interactions with graph neural networks (GNNs) has experienced difficulties in the conversion of protein-ligand complex structures into the graph representation and left questions regarding whether the trained models properly learn the appropriate noncovalent interactions. Here, we proposed a GNN architecture, denoted as InteractionNet, which learns two separated molecular graphs, being covalent and noncovalent, through distinct convolution layers. We also analyzed the InteractionNet model with an explainability technique, i.e., layer-wise relevance propagation, for examination of the chemical relevance of the modelโs predictions. Separation of the covalent and noncovalent convolutional steps made it possible to evaluate the contribution of each step independently and analyze the graph-building strategy for noncovalent interactions. We applied InteractionNet to the prediction of protein-ligand binding affinity and showed that our model successfully predicted the noncovalent interactions in both performance and relevance in chemical interpretation.
Introduction
Deep-learning chemistry is an emerging field in the chemistry discipline, and it has shown remarkable fruition in diverse chemical areas.
It is the representation learning of molecules, without the human-curated heuristics and descriptors, used widely in cheminformatics for decades, which is one of the recent endeavors of deep-learning chemistry. Learning of the representations from actual molecular structures would enable the full utilization of the discriminative power of deep neural networks (DNNs) on the prediction of target properties without prior quantum-chemical calculations.
Compared with the physics-based computational methods for calculating molecular properties, the deep-learning approach offers a fast, but still powerful, option for estimating diverse characteristics of molecules through the data-driven discovery of molecular patterns.
The recent rise of graph neural networks (GNNs) has upscaled deep-learning capability in chemistry with the easy handling of molecules as molecular graphs, which are defined by two sets of vertices and edges.
The molecular graphs contain the structural information on molecules in two-dimensional (2D) space, with atoms as vertices and bonds as edges in the graphs. In the GNN, the neurons in a layer are connected to their graph neighborhoods, and layer stacking generates broader local structures in molecules. Many GNN models have been developed for the prediction of molecular energies, physical properties, protein interactions, and biochemical functions.
The pioneering reports, by Duvenaud et al. and Kearnes et al., showed the effectiveness of GNNs on the prediction of molecular properties compared to other machine-learning methods based on molecular fingerprints, and the GNN architecture has further been refined to message-passing neural networks (MPNNs). Moreover, the expansion of the GNN architecture into the 3D space for modeling the actual molecular structures has recently been explored, and the efficacy of the GNN approach on the problems requiring 3D molecular structures has been proven.
One of the focused fields of deep learning in chemistry is the replacement of the scoring function on the structure-based drug design with data-driven DNN models.
The essence of DNN models for deep-learning scoring compared to the force field energy functions and scoring functions is the appropriate database to learn molecular patterns and their relationship to binding affinity, and they are strongly linked to prediction performance. The PDBbind database is the most widely used dataset, which is a curation of 3D protein-ligand structures obtained from X-ray crystallography and multidimensional NMR techniques with complementary binding affinities, for training DNN models on prediction of the binding constant from a complex structure. Many DNN models were developed based on the PDBbind database and can be classified into two categories: convolutional neural networks (CNNs) with voxelized images and GNNs working on graph representations of complexes. Rapid development of high-performance and deeply-stacked CNN models in computer science has dramatically raised the prediction performance of binding constants through enhanced pattern recognition of the 3D molecular images.
Protein-ligand complexes were transferred into the angstrom-level voxel grid and used for training the CNN models. Meanwhile, the GNN models, which focused on the interpretation of molecular bonding (i.e., covalent bonds) as graph edges, was utilized after the success of CNN models by incorporating noncovalent (NC) interactions as graph edges in molecular graphs, which play significant roles in the programmed formation of 3D molecular structures of biomolecules (e.g., proteins, nucleic acids, and lipid bilayers) and polymers and their dynamics.
In the GNN models, NC connectivity was utilized in conjunction with covalent connectivity for post-refinement of atomic features after the convolution with covalent-bond connectivity. Gaussian decay functions have been used to mimic decreased influences from distant atoms, or multiple kernels for distance bins have been adopted for simulating NC interactions.
These approaches enrich the graphic representation of molecules by adding topological information and acquiring the shape-awareness, which has only been feasible in the CNNs. In addition to the decay simulation, an approximation of the entire atomic contribution to a smaller subset was widely utilized.
Due to the extremely large number of atoms in the protein-ligand complex compared to other molecules in molecular property datasets, training the complex data is challenging for both CNN and GNN models. By limiting the protein atoms into a spatial neighborhood of the ligand molecule, the complex can be greatly reduced into smaller sizes and trained efficiently without losing important interactions. Owing to the aforementioned advanced approaches, the GNN models became a competitive option for developing deep-learning scoring models with a direct interpretation of molecular structures. In this paper, we propose a GNN architecture, denoted as InteractionNet, that directly learns molecular graphs without any physical parameters, wherein the NC interactions are encoded as graphs along with the bonded adjacency that models covalent interactions. We utilize the PDBbind dataset for evaluation of the concept and examine the model performance on predicting the binding constant from a complex structure. Specifically, we divide the convolutional layers in InteractionNet into two, the covalent and NC convolution layers (CV [C] and CV [NC] layers, respectively), and evaluate the significance of NC convolution. There have been reports on the incorporation of NC connectivity in GNN models, but in strict combination with covalent connectivity. Here, we apply the covalent and NC connectivity separately to investigate the importance of each convolution layer, which has not been explored. In extreme cases, only CV [NC] is used, without any CV [C] layers, and compared to other models. Moreover, we investigate the optimal cropping strategy for downsizing the protein-ligand structure and efficient training. Based on the findings, we further investigate the explanations for the predictions of the trained model, i.e., how the trained model predicts for the first time from the given input data in the protein-ligand complexation problem. By performing decomposition-based, layer-wise relevance propagation (LRP) on behalf of explainable AI and visualizing the obtained atomic contribution for the prediction of the protein-ligand complex, we explore the relationship between machine-predicted NC interactions and knowledge-based NC interactions from the molecular structures.
Results and Discussion
InteractionNet Architecture.
For graphic representation of a protein-ligand complex, InteractonNet employs two adjacency matrices for the complex, denoted the covalent and NC adjacency matrices ( A [C] and A [NC] ), similar to the PotentialNet reported by Feinberg et al. A [C] and A [NC] are defined by the combination of molecular graphs for a protein and a ligand but with different connectivity strategies. The covalent adjacency matrix, ฮ [C] , consists of the bond connectivity in the protein and the ligand, and is constructed by a disjoint union of the protein and the ligand graphs, maintaining the bond connectivity only within each molecule. The NC adjacency matrix, ฮ [NC] , defined by a graph having full connectivity between the vertices of the protein and the ligand graphs, contains all the possible edges between the protein and the ligand but not within the same molecule (Figure 1a). Based on the notation used by Feinberg et al., each adjacency matrix can be decomposed into four blocks, A ๐ฟ:๐ฟ , A ๐ฟ:๐ , A ๐:๐ฟ , and A ๐:๐ , that correspond to smaller adjacency matrices encoding the connectivity between ligand-ligand, ligand-protein, protein-ligand, and protein-protein atoms, respectively (Equation 1).
A = [A โฏ A โฎ โฑ โฎA ๐1 โฏ A ๐๐ ] = [A ๐ฟ:๐ฟ A ๐ฟ:๐ A ๐:๐ฟ A ๐:๐ ] (1) where A ๐๐ is whether the node ๐ and ๐ are adjacent, and ๐ is the number of atoms inside the complex. For ฮ [C] , only the A ๐ฟ:๐ฟ and A ๐:๐ blocks are filled with the existence of a covalent bond between atoms, and the remaining A ๐ฟ:๐ and A ๐:๐ฟ blocks are filled with 0. In the case of ฮ [NC] , A ๐ฟ:๐ , and A ๐:๐ฟ , blocks are filled with 1, implying all possible NC interactions between atoms, and the rest are filled with 0. These adjacency strategies assume that there is no covalent bond between the protein and the ligand, and NC interactions within the same molecule are ignored. Obtained adjacency matrices are used as-is through the neural network without training the adjacency matrix itself or requiring modification during the propagation. InteractionNet is built to utilize the A [C] and A [NC], consecutively, for the end-to-end prediction of dissociation constants from the molecular structures (Figure 1b). It consists of five functional layers: graph-embedding layers, CV [C] layers, CV [NC] layers, a global pooling (GP) layer, and fully-connected (FC) layers. The graph-embedding layers update the atomic features through fully-connected neural networks that mix the features assigned for each atom. The CV [C] and CV [NC] layers receive the graph embedding and combine the graph adjacency with atomic features for local aggregation of the information. Compared with the PotentialNet, we separate the spatial convolution layers utilizing the two adjacency matrices, A [C] and A [NC] , one-by-one at each layer, whereas they are combined and utilized in a single layer in the PotentialNet. By applying the two adjacency matrices for the spatial convolution separately, we simulate the importance of each step on the prediction of the dissociation constant independently. For each convolution step, InteractionNet utilizes the corresponding adjacency matrix and updates the representation additively by residual connections. After the convolution steps, the GP layer aggregates the atomic features distributed across the atoms in a permutation-invariant way and generates a molecular vector. Sum pooling was utilized for the GP mechanism. With the obtained molecular vector, the FC layers transform the representation into the dissociation constant of a protein-ligand complex.
Figure 1. (a) Schematic illustrations for modeling NC interactions within a graphic representation, and (b) architecture of InteractionNet for predicting the dissociation constant from the covalent and NC graphs. (a) Structure of the protein-ligand complex converted into two graph representations, encoded by covalent ( A [C] ) and NC adjacency matrices ( A [NC] ), defined by covalent bond connectivity and all possible edges between the protein and the ligand, respectively. (b) InteractionNet learns the two aforementioned adjacency matrices, A [C] and A [NC] , and predicts the dissociation constant of the complex through a graphic neural network consisting of five functional layers. Model Training.
We examined the efficacy of the CV [C] and CV [NC] layers by three variants of InteractionNet with different compositions of CV layers. Four other functional layers of InteractionNet were used with the same number of layers and composition across the variants. By incorporating only one type of the CV layers for InteractionNet, we built InteractionNet [C], utilizing only CV [C] layers, and InteractionNet [NC], utilizing only CV [NC] layers. The variant that incorporated both CV layers sequentially was coined as InteractionNet [C-NC] . In the chemistsโ points of view, InteractionNet [C] focused on the covalent bonds within each ligand and protein molecule for prediction, InteractionNet [NC] did this on NC interactions between the ligand and the protein, and InteractionNet [C-NC] observed covalent bonds first and then used the generated information for the secondary refinement through NC interactions. We chose the dissociation constant of the protein-ligand complex ( K d ) as our prediction target because the protein-ligand binding is governed primarily by the NC interactions, not covalent bonds, which is important in investigating the efficacy of the proposed architecture. We conducted a 20-fold-cross-validated experiment on the refined set of the PDBbind v2018 dataset, consisting of 4,186 complexes and their experimental K d values. In the data preprocessing for model training, we cropped the protein structure for faster training and less memory consumption. The binding pockets of the proteins in the refined PDBbind set contained a maximum of 418 atoms, which was almost 16 times larger than the ligands that had only 26 atoms at maximum. We thought that the interactions between a protein and a ligand could be simulated with a smaller subset of atoms in the protein because the number of atoms that participated in the protein-ligand binding is much less than the maximum value. The appropriate cropping strategy, without any loss in performance, is also highly important for efficient training, considering the exponential increase in the memory consumption of the training data. We utilized the spatial atom filtering for simplification of protein structures, which excluded the atoms of a protein distant from a ligand by the range cutoff. In detail, the shortest distance of a protein atom to the ligand atoms was measured, and the protein atom was excluded if the distance exceeded the predefined range cutoff. By spatial cropping with the range cutoff, we obtained a subset of the protein structures, similar to the shape of the van der Waals surface of the ligand but with a much larger radius, and used the subset for the generation of the molecular graphs.
Figure 2.
Influence of the protein cutoff range from 3 to 6 ร on (a) the average number of atoms included in a complex, (b) the size of the training data, (c) the root-mean-squared-error for the predictions from the trained model, and (d) the average single-fold training time. Error bars indicate standard deviations for each measurement. ** p < 0.005. For investigating the influence of the cutoff applied to crop the protein structure into the ligand neighborhoods, we compared the averaged model performance and the training time using InteractionNet [C-NC] by changing the cutoff with 1-ร increment. The number of atoms included in the cropped complex increased linearly with respect to the cutoff, while the data size for the training dataset increased exponentially (Figure 2a-b). The averaged performance was saturated from 4 ร , confirmed by the one-way analysis of variance (ANOVA) with the posthoc Tukey HSD test (Figure 2c). The corresponding training time increased dramatically as the cutoff increased (Figure 2d). Based on the observation, the 5-ร cutoff was considered the most appropriate for our system and used for further investigations.
Prediction of Dissociation Constants.
The root-mean-square error (RMSE) results from the cross-validation experiments, based on the 5-ร filtering of protein structures, confirmed that the CV [NC] layers played a significant role in the K d prediction from the molecular graphs (Table 1). InteractionNet [C-NC] and InteractionNet [NC] outperformed InteractionNet [C] , regardless of the number of CV layers. For example, the RMSE values for InteractionNet [C-NC] and InteractionNet [C] were 1.321 and 1.379, respectively, showing a 4% improvement by incorporating CV [NC] layers ( p < 0.005). The performance of InteractionNet [C-NC] was measured to be slightly higher than InteractionNet [NC] , but the difference was not significant in statistical analysis ( p = 0.450). These results indicated that the interactions between a protein and a ligand could be simulated accurately, even with a single CV [NC] layer, without any help from previous covalent-refinement steps. Table 1.
Twenty-fold cross-validation results for InteractionNet on the refined set of the PDBbind v2018. Root-mean-square-errors were measured for each trial and averaged. The best results were highlighted in boldface.
Model Train Validation Test InteractionNet [C] [NC] [C-NC]
For visualization of the individual predictions from the test dataset, we selected the most similar cross-validation trial in performance to the average and obtained a scatterplot and an error histogram between the predicted and experimental values. The scatterplot for the predicted K d values revealed a high correlation with the experimental K d in a linear relationship (Figure 3a), and the error distribution showed a Gaussian-like, zero-centered shape (Figure 3b). It is to be noted that 20 cross-validation trials showed similar trends in the scatterplot and the error histogram, but had small differences in pattern (Figure S1). Figure 3. (a) Scatterplot and (b) error distribution of predicted and experimental K d values for 419 complexes included in the test set. (a) The scatterplot for predicted versus experimental K d is depicted with the trend line (a solid line). (b) The error histogram (orange) and distribution (black) for predictions from the test set. The most similar trial in performance to the average was selected for depicting the graphs in the cross-validation trials. Layer-Wise Relevance Propagation (LRP).
The explainability techniques interpret the trained model or their predictions into explanations in human terms, which can be assessed by knowledge-based analysis. By analyzing the system with explainability techniques, the models that fail to learn appropriate knowledge to perform predictions based on valid information and fall into the โClever Hansโ decision made by fragmentary knowledge could be identified. To explore the explainability of the trained InteractionNet model on the K d prediction, we conducted the post hoc explanation on individual predictions by the LRP. The LRP calculates the relevance for every neuron by reversely propagating, through the network, from the predicted output to the input level, and the relevance represents the quantitative contribution of a given neuron to the prediction. We used three LRP rules, LRP-0, LRP-ฮต, and LRP-ฮณ, sequentially from the output layer to the input layer for production of the relevance for the neurons (Equations 2-4)
LRPโ 0: ๐ ๐ = โ ๐ ๐ ๐ค ๐๐ โ ๐ ๐ ๐ค ๐๐0,๐ ๐ ๐ ๐ (2) LRPโ ฯต: ๐ ๐ = โ ๐ ๐ ๐ค ๐๐ ๐+โ ๐ ๐ ๐ค ๐๐0,๐ ๐ ๐ ๐ (3) LRPโ ฮณ: ๐ ๐ = โ ๐ ๐ (๐ค ๐๐ +๐พ๐ค ๐๐+ )โ ๐ ๐ (๐ค ๐๐ +๐พ๐ค ๐๐+ ) ๐ ๐ ๐ (4) where ๐ and ๐ represent neurons at two consecutive layers, ๐ is the relevance, ๐ denotes lower layer activations, ๐ค + is a positive weight, and ฯต and ฮณ are the parameters used in each LRP rule. Once we obtained the relevance for the atomic features in the K d prediction, it was converted to the atomic contributions by summation of relevance for individual features of the same atom. Finally, we compared the relevance with the knowledge-based analysis data from the information on hydrogen bonds and hydrophobic contacts within the complex (Figure 4, see methods for detail). Three protein-ligand complexes from the test set, PDB codes 1KAV, 3F7H, and 4IVB, were sampled and analyzed (Figures 5-7). Figure 4.
Schematic illustration of atomic contributions obtained by applying layer-wise relevance propagation (LRP) on InteractionNet and its comparison with knowledge-based protein-ligand interactions.
PDB 1KAV: Human tyrosine phosphatase 1B and a phosphotyrosine-mimetic inhibitor (ChEMBL1161222).
As seen in the 3D structure, half of the ligand is surrounded by the protein pocket with substantial hydrogen bonding on one of the phosphate groups. The half with the other phosphate is exposed freely to the exterior (Figure 5a,b). On the knowledge-based protein-ligand interaction analysis, 6 hydrogen bonds were observed on the phosphate group from Ser216, Ile219, Gly220, and Arg221, and 5 hydrophobic contacts were expected on Tyr46, Phe182, and Ala217 with the aliphatic chain in the middle part of the ligand structure. ChEMBL1161222 is structurally symmetric, and it is highly important to examine whether InteractionNet properly distinguishes the two phosphate groups in different surroundings. The heat map for the obtained atomic contributions of ChEMBL1161222 from the trained InteractionNet, arguably, showed a high correlation between human understanding and the machine-provided explanation (Figure 5c). InteractionNet focused on only one phosphate group, which resided inside the protein pocket, and predicted its high contribution to the increase in binding affinity. The influence of hydrophobic contacts was not observed in the heat map of 1KAV.
Figure 5. (a) Three-dimensional structure of the protein-ligand complex, 1KAV. The protein is depicted in a cartoon (green), and the ligand is depicted in color-coded ball-and-stick. Atom colors: gray (carbon), red (oxygen), orange (phosphorus), and light green (fluorine). (b) Knowledge-based estimation of protein-ligand interactions. Hydrogen bonds are depicted in red dashed lines, and hydrophobic contacts are depicted in gray dashed lines. (c) Heat map for the atomic contributions on the K d prediction, obtained from the LRP. The contributions are illustrated with color intensity of red (positive influence), white (zero influence), and blue (negative influence) colors. PDB 3F7H: Baculoviral IAP repeat-containing protein 7 with an azabicyclooctane-based antagonist (ChEMBL479725).
ChEMBL479725 can be divided into two parts by azabicyclooctane, the amide chain with one secondary amine, and the diphenylacetamide group. On the knowledge-based analysis, the amine and amide parts bound to 3F7H by four hydrogen bonds with their carboxyl and amide groups, and the diphenylacetamide group did not have interactions, except for one hydrophobic contact (Figure 6a,b). When compared to the machine-provided heat map, InteractionNet showed a highly positive focus on the terminal amine that participated in two hydrogen bonds (Asp138 and Glu143) and a little positive focus on the azabicyclooctane ring that participated in two hydrophobic contacts with the indole (Try147) and isobutyl (Leu131) groups. However, the diphenylacetamide group was predicted to slightly decrease the K d value (Figure 6c). The amide groups in the azabicyclooctane ring and the diphenylacetamide group had a negligible contribution to K D , which concurred with the knowledge-based observation. Figure 6. (a) Three-dimensional structure of the protein-ligand complex, 3F7H. The protein is depicted in a cartoon (green), and the ligand is depicted in color-coded ball-and-stick. Atom colors: gray (carbon), red (oxygen), and blue (nitrogen). (b) Knowledge-based estimation of protein-ligand interactions. Hydrogen bonds are depicted in red dashed lines, and hydrophobic contacts are depicted in gray dashed lines. (c) Heat map for the atomic contributions on the prediction of K d , obtained from the LRP. The contributions are illustrated with color intensity of red (positive influence), white (zero influence), and blue (negative influence) colors. PDB 4IVB: Tyrosine-protein kinase JAK1 with an imidazopyrrolopyridine-based inhibitor (ChEMBL2386633).
In the 4IVB complex, ChEMBL2386633 resided in between the two lobes of JAK1 and was expected to have four hydrogen bonds, i.e., two in the imidazopyrrolopyridine group and two in the hydroxyl group and three hydrophobic contacts with JAK1 (Figure 7a,b). The ChEMBL2386633 heat map showed a similar contribution pattern, predicting a highly positive contribution from the nitrogen atoms of imidazopyrrolopyridine and one oxygen atom. The most-focused atom was the oxygen of the hydroxyl group, which participated in two hydrogen bonds with Ser963 and Glu966 of JAK1. The four nitrogen atoms in imidazopyrrolopyridine were given a high contribution, but only two nitrogen atoms participated in the hydrogen bond. The prediction on the cyanocyclohexyl group was not influential to the K d , which corresponded with the 3D structure showing the exposure of the group to the exterior. Figure 7. (a) Three-dimensional structure of the protein-ligand complex, 4IVB. The protein is depicted in a cartoon (green), and the ligand is depicted in color-coded ball-and-stick. Atom colors: gray (carbon), red (oxygen), and blue (nitrogen). (b) Knowledge-based estimation of protein-ligand interactions. Hydrogen bonds are depicted in red dashed lines, and hydrophobic contacts are depicted in gray dashed lines. (c) Heat map for the atomic contributions on the prediction of K d , obtained from the LRP. The contributions are illustrated with color intensity of red (positive influence), white (zero influence), and blue (negative influence) colors. Conclusions
In conclusion, we presented a graph neural network (GNN) that modeled the noncovalent (NC) interactions and discussed the in-depth analysis of the model combined with the explainability technique for understanding deep-learning prediction. In the graph-based deep-learning models, there has been less attention to the NC interactions compared with the bonded interactions because of the ambiguity of NC connectivity and its effectiveness over the traditional covalent-bond-based strategies. InteractionNet, presented herein, showed satisfactory predictive-ability for predicting the dissociation constant with RMSE of 1.321 on the PDBbind v2018 dataset. The NC convolution layers enhanced InteractionNetโs prediction accuracy, even without the utilization of the traditional bonded connectivity. We further demonstrated that InteractionNet successfully captured the important NC interactions between a protein and a ligand from a given complex through posthoc LRP analysis. The visualization of the atomic contributions showed a strong correlation with the actual hydrogen bonds in the complex. In the case of the ligand that had multiple hydrogen-bond donors and acceptors, the positive atomic contributions were observed only on the atoms participating in the actual hydrogen bonds. We believe that our model would widen the applicable tasks of the chemical, deep-learning models to the problems beyond the bonded interactions within a single molecule and also provide a meaningful explanation for the prediction, enabling the real-world applications that require prediction evidence and reliability.
Methods
Dataset.
We employed the PDBbind v2018 dataset for the evaluation target of our InteractionNet models.
We used the refined set from the provided dataset, consisting of 4,462 protein-ligand complexes with their experimentally measured K d values. Initially, all protein-ligand data were loaded by RDKit 2019.09.2 and Openbabel 3.0.0 , and inspected for improper conformation. During the inspection process, the molecules that failed for loading or contained atomic collisions (interatomic distance below 1 ร ) were excluded from the dataset. After inspection, 4,186 protein-ligand complexes were obtained. The protein structure was cropped by retrieving the atoms of a protein within the range cutoff (3, 4, 5, or 6 ร ), and the size of the protein-ligand complex structure was reduced for faster training. Only heavy atoms were considered in the entire preparation. Atomic features for building the feature matrix are listed in Table S1. In each cross-validation experiment, the refined set was randomly split into a training set, a validation set, and a test set, on an 8:1:1 ratio. Twenty results were obtained through 20-fold cross-validation, and the averaged results were reported. Network Training and Evaluation.
All models were implemented by using TensorFlow 2.0.0 on Python 3.6.9. The training was controlled by learning-rate scheduling, early-stopping techniques, and gradient norm scaling. The learning rate was initially set to 0.00015 and lessened by a factor of 0.75 when the validation loss did not decrease within the previous 200 epochs, and the termination proceeded when the loss stopped decreasing for the previous 400 epochs. To avoid gradient exploding, a clipping parameter of 0.5 was used for gradient norm scaling. For the loss function, mean-squared-error (MSE) was used and optimized by the Adam optimizer. The list of hyperparameters explored is described in Table S2. All experiments were conducted on an NVIDIA GTX 1080Ti GPU, an NVIDIA RTX 2080Ti GPU, or an NVIDIA RTX Titan GPU.
Layer-Wise Relevance Propagation (LRP).
We performed the LRP as a post-modeling explainability method. Three LRP rules, LRP-0, LR
P-ฮต, and LRP-ฮณ, were used for the calculation of relevance on each layer from the trained model. We adopted the LRP-0 for the output layer, LRP- ฮต for the FC layers, and the LRP-ฮณ for the CV [C] and CV [NC] layers, based on the guideline described elsewhere. Obtained relevance for the atomic feature was reduced to the atomic contribution by summation across features. The graph-embedding layers were omitted for the relevance calculation, because the graph-embedding layers only redistributed the relevance between features, not between atoms, resulting in the same atomic contribution before and after redistribution. For the parameters ฮต and ฮณ, 0.25 was used for all LRP- ฮต, and
100 was used for all LRP-ฮณ layers. The cross-validation trial that was most similar to the average in root-mean-squared-error (RMSE) was used for LRP analysis, and the LRP examples were chosen from the test set of the trial, which were predicted accurately, for comparison with knowledge-based analysis. Three-dimensional visualization of the molecular structure was obtained by Mol*, and the expected hydrogen bonds and hydrophobic contacts were determined by the rules RCSB PDB use. Acknowledgments
This work was supported by the KAIST-funded AI Research Program for 2019.
Author contributions
H.C., E.K.L., and I.S.C. developed the concept, and H.C. constructed the deep-learning architectures and performed the experiments. H.C. wrote the manuscript, and E.K.L. and I.S.C. supervised the work and reviewed the manuscript.
Competing interests
The authors declare no competing interests.
Additional information
Correspondence and requests for materials should be addressed to E.K.L. or I.S.C.
References Butler, K. T. et al . Machine learning for molecular and materials science.
Nature , 547โ555 (2018). 2.
Mater, A. C. & Coote, M. L. Deep learning in chemistry.
J. Chem. Inf. Model. , 2545โ2559 (2019). 3. Peiretti, F. & Brunel, J. M. Artificial intelligence: The future for organic chemistry?
ACS Omega , 13263โ13266 (2018). 4. Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: Generative models for matter engineering.
Science , 360โ365 (2018). 5.
Hamilton, W. L., Ying, R. & Leskovec, J. Representation learning on graphs: Methods and applications. Preprint at https://arxiv.org/abs/1709.05584 (2017). 6.
Faber, F. A. et al . Prediction errors of molecular machine learning models lower than hybrid DFT error.
J. Chem. Theory Comput. , 5255-5264 (2017). 7. Gilmer, J. et al . Neural message passing for quantum chemistry. In
Proceedings of the 34th International Conference on Machine Learning-Volume 70 , 1263โ1272 (2017). 8.
Schรผtt, K. T. et al . Quantum-chemical insights from deep tensor neural networks.
Nat. Commun. , 13890; 10.1038/ncomms13890 (2017). 9. Zhou, J. et al . Graph neural networks: A review of methods and applications. Preprint at https://arxiv.org/abs/1812.08434 (2018) 10.
Bonchev, D. & Rouvray, D. H. Chemical Graph Theory: Introduction and Fundamentals (Abacus Press, 1991).
Duvenaud, D. K. et al . Convolutional networks on graphs for learning molecular fingerprints. In
Advances in neural information processing systems , 2224โ2232 (2015). 12.
Coley, C. W. et al . Convolutional embedding of attributed molecular graphs for physical property prediction.
J. Chem. Inf. Model . , 1757โ1772 (2017). 13. Feinberg, E. N. et al. PotentialNet for molecular property prediction.
ACS Cent. Sci . , 1520โ1530 (2018). 14. Lim, J. et al . Predicting drugโtarget interaction using a novel graph neural network with 3D structure-embedded graph representation.
J. Chem. Inf. Model. , 3981โ3988 (2019). 15. Kearnes, S. et al . Molecular graph convolutions: moving beyond fingerprints.
J. Comput. Aided Mol. Des . , 595โ608 (2016). 16. Altae-Tran, H., Ramsundar, B., Pappu, A. S. & Pande, V. Low data drug discovery with one-shot learning.
ACS Cent. Sci. , 283โ293 (2018). 17. Cho, H. & Choi, I. S. Enhanced Deep-learning prediction of molecular properties via augmentation of bond topology.
ChemMedChem , 1604โ1609 (2019). 18. Cang, Z. & Wei, G. TopologyNet: Topology based deep convolutional and multi-task neural networks for biomolecular property predictions.
PLoS Comput. Biol. , e1005690; 10.1371/journal.pcbi.1005690 (2017) 19. Jimรฉnez, J., Doerr, S., Martรญnez-Rosell, G., Rose, A. S. & De Fabritiis G. DeepSite: protein-binding site predictor using 3D-convolutional neural networks.
Bioinformatics , 3036โ3042 (2017). 20. Ragoza, M., Hochuli, J., Idrobo, E., Sunseri, J. & Koes, D. R. Proteinโligand scoring with convolutional neural networks.
J. Chem. Inf. Model. , 942โ957 (2017). 21. Jimรฉnez, J., ล kaliฤ, M., Martรญnez-Rosell, G. & De Fabritiis G. K
DEEP : Protein-ligand absolute binding affinity prediction via 3D-convolutional neural networks.
J. Chem. Inf. Model. , 287โ296 (2018). 22. Zheng, L., Fan, J. & Mu, Y. OnionNet: a multiple-layer intermolecular-contact-based convolutional neural network for protein-ligand binding affinity prediction.
ACS Omega , 15956โ15965 (2019). 23. Wang, R., Fang, X., Lu, Y., Yang, C.-Y. & Wang, S. The PDBbind Database: methodologies and updates.
J. Med. Chem. , 4111โ4119 (2005). 24. Liu, Z. et al . Forging the basis for developing protein-ligand interaction scoring functions.
Acc. Chem. Res. , 302โ309 (2017). 25. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In
Proceedings of the IEEE conference on computer vision and pattern recognition , 770โ778 (2016). 26.
Huang, G., Liu, Z., van der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. In
Proceedings of the IEEE conference on computer vision and pattern recognition , 4700โ4708 (2017). 27.
Stone, A. J. Intermolecular potentials.
Science , 787โ789 (2008). 28.
Huang, N., Kalyanaraman, C., Bernacki, K. & Jacobson, M. P. Molecular mechanics methods for predicting proteinโligand binding.
Phys. Chem. Chem. Phys. , 5166โ5177 (2006). 29. DiStasio, R. A. Jr., von Lilienfeld, O. A. & Tkatchenko, A. Collective many-body van der
Waals interactions in molecular systems.
Proc. Natl. Acad. Sci.
USA , 14791โ14795 (2012). 30.
Lubbers, N., Smith, J. S. & Barros, K. Hierarchical modeling of molecular energies using a deep neural network.
J. Chem. Phys. , 241715; 10.1063/1.5011181 (2018). 31.
Bach, S. et al . On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation.
PloS one , e0130140; 10.1371/journal.pone.0130140 (2015). 32. Montavon, G. et al . Layer-wise relevance propagation: An overview in
Explainable AI: Interpreting, explaining and visualizing deep learning
Adadi, A. & Berrada, M. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI).
IEEE Access , 52138-52160; 10.1109/access.2018.2870052 (2018). 34. Baldassarre, F. & Azizpour, H. Explainability techniques for graph convolutional networks. Preprint at https://arxiv.org/abs/1905.13686 (2019). 35.
Lapuschkin, S. et al . Unmasking Clever Hans predictors and assessing what machines really learn.
Nat. Commun. , 1096; 10.1038/s41467-019-08987-4 (2019). 36. Open Babel: The Open Source Chemistry Toolbox. http://openbabel.org/wiki/Main_Page (2019). 38.
Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
Sehnal, D., Rose, A. S., Kovca, J., Burley, S. K. & Velankar, S. Mol*: Towards a common library and tools for web molecular graphics. In
Proceedings of the Workshop on Molecular Graphics and Visual Analysis of Molecular Data , 29โ33 (2018). 41.
Sticke, D. F., Presta, L. G., Dill, K. A. & Rose, G. D. Hydrogen bonding in globular proteins.
J. Mol. Biol. , 1143โ1159 (1992). 43.
Zhou, P., Tian, F., Lv, F. & Shang, Z. Geometric characteristics of hydrogen bonds involving sulfur atoms in proteins.
Proteins. , 151โ163 (2009). Freitas, R. F. D. & Schapira, M. A systematic analysis of atomic proteinโligand interactions in the PDB.
MedChemComm , 1970โ1981 (2017). SUPPLIMENTARY INFORMATION
InteractionNet: Modeling and Explaining of Noncovalent Protein-Ligand Interactions with Noncovalent Graph Neural Network and Layer-Wise Relevance Propagation
Hyeoncheol Cho , Eok Kyun Lee , and Insung S. Choi Department of Chemistry, KAIST, Daejeon 34141, Korea. Email: [email protected], [email protected].
CONTENTS โข Experimental Section . โข
Table S1.
Atom features used in the graph representation of molecules. โข
Table S2.
Hyperparameters explored for InteractionNet. โข
Figure S1. (Left) Scatterplots and (right) error distributions of predicted dissociation constants and experimental values included in the test set across the best, averaged, and worst cross-validation trials.
Table S1. Atom features used in the graph representation of molecules.
Feature Description Type Size Atom type atom type one-hot 24 Atomic number atomic number integer 1 Degree the number of heavy atom neighbors (0 to 6) one-hot 7 Number of hydrogens the number of neighboring hydrogens (0 to 4) one-hot 5 Implicit valence the number of implicit hydrogens (0 to 6) one-hot 7 Hybridization sp , sp , sp , sp d , or sp d . one-hot 5 Formal charge atomic formal charge (-3 to +3) one-hot 7 Ring size whether this atom belongs to a ring (ring size: 3 to 8) binary 6 Aromaticity whether this atom is part of an aromatic system. binary 1 Acid/base whether this atom is acidic or basic binary 2 Hydrogen bonding whether this atom is a hydrogen bond donor or acceptor binary 2 Total 67 Table S2. Hyperparameters explored for InteractionNet.
Group Hyperparameter Size Graph Embedding number of output units 128, 256, 512 number of embedding layers 1, 2 Graph Convolution number of output units 128, 256, 512 number of convolution layers each 0, 1, 2, 3 Fully connected number of output units 128, 256, 512 number of fully-connected layers 2, 3 l2 regularization 0.0025, 0.005, 0.01, 0.02, 0.04 Training batch size 32 initial learning rate 0.00015 patience 100, 200, 400 loss MSE gradient descent method Adam
Figure S1. (Left) Scatterplots and (right) error distributions of predicted dissociation constants and experimental values, included in the test set across the best, averaged, and worst cross-validation trials. (Left) The scatterplot for predicted versus experimental constants is depicted with the solid trend line. (Right) The histogram (orange) and distribution (black)(Left) Scatterplots and (right) error distributions of predicted dissociation constants and experimental values, included in the test set across the best, averaged, and worst cross-validation trials. (Left) The scatterplot for predicted versus experimental constants is depicted with the solid trend line. (Right) The histogram (orange) and distribution (black)