Improved Protein-ligand Binding Affinity Prediction with Structure-Based Deep Fusion Inference
Derek Jones, Hyojin Kim, Xiaohua Zhang, Adam Zemla, Garrett Stevenson, William D. Bennett, Dan Kirshner, Sergio Wong, Felice Lightstone, Jonathan E. Allen
Derek Jones†∗, Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, CA
Hyojin Kim†, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA
Xiaohua Zhang, Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA
Adam Zemla, Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA
Garrett Stevenson, Computational Engineering Division, Lawrence Livermore National Laboratory, Livermore, CA
William D. Bennett, Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA
Dan Kirshner, Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA
Sergio Wong, Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA
Felice Lightstone, Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA
Jonathan E. Allen, Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, CA

May 19, 2020

Abstract

Predicting accurate protein-ligand binding affinity is important in drug discovery but remains a challenge even with computationally expensive biophysics-based energy scoring methods and state-of-the-art deep learning approaches. Despite recent advances in deep convolutional and graph neural network based approaches, model performance depends on the input data representation and suffers from distinct limitations. It is natural to combine complementary features, and their inference, from the individual models for better predictions. We present fusion models that combine different feature representations from two neural network models to improve binding affinity prediction. We demonstrate the effectiveness of the proposed approach by performing experiments with the PDBBind 2016 dataset and its docking pose complexes. The results show that the proposed approach improves overall prediction compared to the individual neural network models, with greater computational efficiency than related biophysics-based energy scoring functions. We also discuss the benefit of the proposed fusion inference with several example complexes. The software is made available as open source at https://github.com/llnl/fast.

† denotes authors contributed equally.
Predicting accurate binding affinity between a small molecule and a target protein is one of the fundamental challenges in drug development. Recently, deep learning models have been proposed as an alternative to traditional physics-based free energy scoring functions. The benefit of the deep learning approach is in learning binding interaction rules directly from an atomic representation, without relying on hand-curated features that may not capture the mechanism of binding (Ballester and Mitchell 2010; Ain et al. 2015).
The PDBBind database, a curated subset of the Protein Data Bank (PDB) (wwPDB consortium 2019) initially developed for use in molecular dynamics pipelines, is a popular choice for the development of machine learning based scoring functions (Feinberg et al. 2018). PDBBind is divided into subsets (general and refined) based upon criteria that consider the nature of the complex (e.g., complexes whose ligands have a molecular weight above 1000 Da are excluded from refined), the quality of the binding data (e.g., complexes with an IC50 but no ki or kd measurement are not included in refined), and the quality of the complex structure (e.g., the resolution of the crystal structure must be better than 2.5 Å). From the refined set, a core set is compiled to provide a representative set for validation, using a clustering protocol.

The 2016 edition of PDBBind used in this study consists of 13,308 protein-ligand binding complexes in general, 4,057 complexes in refined, and 290 complexes in core. An example input is shown in Figure 1.

All docking complex data is generated using the in-house developed ConveyorLC toolchain (Zhang et al. 2013; 2014). A high solute dielectric constant (ε = 4) is used in the MM/GBSA rescoring, since previous studies demonstrate remarkable improvement in the pose ranking (Sun et al. 2014).

Before extracting the respective three-dimensional representations for each deep learning model, a common preprocessing protocol was applied to the binding complex structures provided in Protein Data Bank (.pdb) format by the PDBBind database. The process closely mirrors Stepniewska-Dziubinska et al. (2018); the same protocol is applied to the core set to simulate realistic conditions of evaluating new docking poses. This protocol produces a Tripos Mol2 (.mol2) file for each protein pocket.
A common atomic representation, based on that of Stepniewska-Dziubinska et al. (2018), is used for both models:

- Element type: one-hot encoding of B, C, N, O, P, S, Se, halogen, or metal
- Atom hybridization (1, 2, or 3)
- Number of heavy atom bonds (i.e., heavy valence)
- Number of bonds with other heteroatoms
- Structural properties: bit vector (1 where present) encoding of hydrophobic, aromatic, acceptor, donor, ring
- Partial charge
- Molecule type, to indicate protein atom versus ligand atom (-1 for protein, 1 for ligand)
- Van der Waals radius

The OpenBabel cheminformatics tool (version 2.4.1) (O'Boyle et al. 2011) is used in this featurization.
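The feature list above can be made concrete with a short sketch. The helper name and encoding order below are illustrative assumptions, not the authors' exact layout; the van der Waals radius is kept out of the vector here on the assumption that the 3D-CNN consumes it during voxel assignment rather than as one of its 19 channels (the listed items otherwise sum to 20).

```python
# Hypothetical sketch of the per-atom feature vector described above.
# Encoding order is an assumption, not the authors' exact layout.

ELEMENTS = ["B", "C", "N", "O", "P", "S", "Se", "halogen", "metal"]  # one-hot (9)
STRUCT_PROPS = ["hydrophobic", "aromatic", "acceptor", "donor", "ring"]  # bits (5)

def featurize_atom(element, hybridization, heavy_valence, hetero_bonds,
                   props, partial_charge, mol_type):
    """Return a 19-dimensional feature vector for one atom.

    props is the set of structural properties present on the atom;
    mol_type is -1 for a protein atom and +1 for a ligand atom.
    """
    one_hot = [1.0 if element == e else 0.0 for e in ELEMENTS]
    prop_bits = [1.0 if p in props else 0.0 for p in STRUCT_PROPS]
    return (one_hot
            + [float(hybridization), float(heavy_valence), float(hetero_bonds)]
            + prop_bits
            + [float(partial_charge), float(mol_type)])
```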
Figure 2: The proposed mid-level fusion model together with 3D-CNN and SG-CNN.
The refined set was subtracted from the general set, and the core set was subtracted from the refined set, such that there are no overlaps between the three subsets. We hold out the core set to be used as testing data, keeping the remaining general and refined complexes as training data, due to the relatively small size of our datasets compared to those in other domains such as computer vision (Deng et al. 2009).

The 3D-CNN input is a voxel grid of size N × N × N × C, where N is the voxel grid size in each axis (48 in our experiment) and C is the number of atomic features described in Subsection 2.1.2 (19 in our experiment). The volume size in each dimension, with the chosen voxel size, is sufficient to cover the entire pocket region while minimizing collisions between atoms. Each atom is assigned to at least one voxel, depending on its Van der Waals radius or the user-defined size. In the case of collisions between atoms, we apply element-wise addition to the atom features. Once all atoms are voxelized, a Gaussian blur with σ = 1 is applied in order to populate the atom features into neighboring voxels, similar to Kuzminykh et al. (2018) (Figure 2, Top). The residual shortcut connection proposed in ResNet (He et al. 2016) is also used in the 3D-CNN.
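A minimal NumPy sketch of the voxelization and blurring steps above. Grid centering, the collision rule (element-wise addition), and σ = 1 follow the text; the kernel radius and the single-voxel atom assignment are simplifying assumptions (the paper spreads atoms according to their van der Waals radii).

```python
import numpy as np

def voxelize(coords, features, grid_size=48, voxel_size=1.0):
    """Scatter per-atom feature vectors into a grid_size^3 x C grid.

    Colliding atoms are combined by element-wise addition, as in the text.
    Coordinates are assumed to be centered on the binding pocket.
    """
    n_feat = features.shape[1]
    grid = np.zeros((grid_size, grid_size, grid_size, n_feat), dtype=np.float32)
    half = grid_size * voxel_size / 2.0
    idx = np.floor((coords + half) / voxel_size).astype(int)
    for (i, j, k), f in zip(idx, features):
        if 0 <= i < grid_size and 0 <= j < grid_size and 0 <= k < grid_size:
            grid[i, j, k] += f            # element-wise addition on collision
    return grid

def gaussian_blur(grid, sigma=1.0, radius=2):
    """Separable 1-D Gaussian blur along each spatial axis (sigma = 1)."""
    x = np.arange(-radius, radius + 1, dtype=np.float32)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    for axis in range(3):                 # blur the three spatial axes only
        grid = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), axis, grid)
    return grid
```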
Deep learning approaches for modeling chemical graphs have demonstrated viability for learning continuous vector representations of molecular data as well as for property prediction tasks (Duvenaud et al. 2015; Kearnes et al. 2016; Li et al. 2016; Gilmer et al. 2017).
Each atom is treated as a node within a graph representation of the protein-ligand binding complex; our spatial graph convolutional neural network (SG-CNN) draws on the PotentialNet architecture presented by Feinberg et al. (2018). Both covalent and non-covalent bonds are represented through the use of a square N × N adjacency matrix A and an N × M node feature matrix, where A ∈ R^(N×N) and A_ij is equal to the Euclidean distance (in Angstroms, Å) between atom i and atom j. To further expand this representation as a 3D tensor, we define two thresholds for covalent and non-covalent "neighborness", α_c and α_nc respectively, s.t. A_ij,c = 0 if A_ij ≥ α_c and A_ij,nc = 0 if A_ij ≥ α_nc. The threshold values used in this paper were those that led to the best performance on the validation set. We implement the SG-CNN using the PyTorch Geometric (PyG) python library (Fey and Lenssen 2019).

Fusion models that combine multiple input sources or different feature representations have been applied to a number of computer vision applications, especially in the presence of multi-modal images or different image sensors. These fusion models benefit from multiple feature representations that are considered complementary to each other. In addition, fusion-based approaches increase robustness by reducing the uncertainty of each feature representation or modality. Inspired by this, we propose to use a separate fusion neural network to combine feature representations from two independently trained models (3D-CNN and SG-CNN), each of which has its own strengths and weaknesses. The heterogeneous feature representations that the two models capture can enrich the proposed fusion model's features, with strong potential to improve the performance of binding affinity prediction. Among the several ways to fuse models addressed in Roitberg et al. (2019), we adopt a mid-level fusion of the two networks' intermediate features (Figure 2).
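The thresholded adjacency construction described above can be sketched as follows. The default threshold values (1.5 Å covalent, 4.5 Å non-covalent) are placeholders of plausible magnitude, since the exact tuned values are only described here as validation-selected.

```python
import numpy as np

def split_adjacency(coords, alpha_c=1.5, alpha_nc=4.5):
    """Build covalent / non-covalent adjacency matrices from 3-D coordinates.

    A_ij holds the pairwise Euclidean distance in Angstroms; entries at or
    beyond each threshold are zeroed, following A_ij = 0 if A_ij >= alpha.
    The threshold defaults are illustrative placeholders, not the paper's
    tuned values.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))          # full N x N distance matrix
    adj_c = np.where(dist < alpha_c, dist, 0.0)  # covalent "neighborness"
    adj_nc = np.where(dist < alpha_nc, dist, 0.0)  # non-covalent
    np.fill_diagonal(adj_c, 0.0)
    np.fill_diagonal(adj_nc, 0.0)
    return adj_c, adj_nc
```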
Protein-ligand complex binding pockets from the PDBBind database were compared to identify local regions surrounding ligands and perform structure-based clustering for evaluation of machine learning model performance. Clustering of the structures was performed using LGA (Zemla 2003) on a whole-protein level as well as on specific local substructures selected to represent ligand binding site regions. The implemented approach can be briefly described as follows: for each protein-ligand complex, the binding site local environment was delineated using an initial 12.0 Å radius sphere centered at the ligand atoms. The 12.0 Å radius was selected in order to capture as much conformation information around the local environment as possible, to allow detection of similarities between pockets even with different sizes of observed ligands. Previous research indicated that distances of 7.5 Å are an upper limit in capturing informative functional properties for clustering purposes (Yoon et al. 2007).
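The initial 12.0 Å sphere delineation step reduces to a simple geometric selection; the sketch below shows only that selection, under a hypothetical coordinate-array interface, and is not part of the LGA clustering itself.

```python
import numpy as np

def pocket_atoms(protein_coords, ligand_coords, radius=12.0):
    """Indices of protein atoms within `radius` Angstroms of any ligand atom.

    Both inputs are (N, 3) coordinate arrays; a sketch of the binding-site
    delineation step, not the LGA toolchain.
    """
    d = np.linalg.norm(
        protein_coords[:, None, :] - ligand_coords[None, :, :], axis=-1)
    return np.where(d.min(axis=1) <= radius)[0]
```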
Table 1: Performance of binding affinity prediction on the crystal structures of the PDBBind 2016 core set. Top: comparison of the proposed fusion approaches with individual and existing models. R: refined set, G: general set.

Model              R^2    Pearson r   Spearman r   MAE     RMSE
SG-CNN (R)         .424   .666        .647         1.321   1.650
SG-CNN (G)         .519   .747        .746         1.194   1.508
SG-CNN (R + G)     .600   .782        .766         1.084   1.375
3D-CNN (R)         .523   .723        .716         1.164   1.501
3D-CNN (G)         .420   .649        .658         1.294   1.655
3D-CNN (R + G)     .397   .677        .657         1.334   1.688
Late Fusion        .628   .808        .803         1.044   1.326
Mid-level Fusion   .638   .810        .807         1.019   1.308
Pafnucy^a          -      0.78        -            1.13    1.42

^a Stepniewska-Dziubinska et al. (2018)

Table 2: Comparison of the proposed mid-level fusion model with physics-based scoring functions on the crystal structures of the PDBBind 2016 core set. We give the results for the 243 complexes for which it was possible to compute a score across all methods. The correlation coefficients of the Vina and MM/GBSA scoring functions are given as absolute values.

Method             Pearson r   Spearman r   MAE     RMSE
Vina               .599        .605         -       -
MM/GBSA            .647        .649         -       -
Mid-level Fusion   .803        .797         1.035   1.327
The PDBBind 2016 core dataset was used to evaluate the following hypotheses:

- The two CNN models provide complementary information.
- The fusion model learns to integrate the two CNN models and improves prediction over the individual ones.
- The machine learning models retain prediction accuracy when presented with docked poses rather than crystal structures.
- The machine learning models are as accurate as the more computationally costly MM/GBSA re-scoring function.
Prediction Performance on PDBBind-2016 Crystal Structure
Table 1 summarizes the model performance on the crystal structures of the PDBBind 2016 core set. Training on both PDBBind's general and refined data was considered. While training on the larger general dataset could improve performance, it has the drawback of noisier binding affinity measurements and lower-resolution 3D structures (typically larger than 2.5 Å) (Su et al. 2019).
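The four metrics reported in Tables 1-3 can be computed with a NumPy-only helper. This is an illustrative sketch rather than the authors' evaluation code; the simple rank transform ignores ties, which scipy.stats.spearmanr would average.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Pearson r, Spearman r, MAE, and RMSE, as reported in Tables 1-3."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    pearson = np.corrcoef(y_true, y_pred)[0, 1]
    # Spearman = Pearson correlation of the ranks (no tie handling here).
    rank = lambda a: np.argsort(np.argsort(a)).astype(float)
    spearman = np.corrcoef(rank(y_true), rank(y_pred))[0, 1]
    mae = np.abs(y_true - y_pred).mean()
    rmse = np.sqrt(((y_true - y_pred) ** 2).mean())
    return pearson, spearman, mae, rmse
```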
Table 3: Performance on PDBBind 2016 Core Set - Docking Poses.

Model     Pearson r   Spearman r   MAE
SG-CNN    .656        .625         1.343
Fusion    .685        .668         1.34
Prediction Performance on PDBBind-2016 Docking Poses
Scoring the binding affinity of a crystal complex is useful for separating the scoring task from the ligand pose selection problem. In practice, however, the correct ligand pose will not be known, and the scoring function will evaluate noisier, error-prone docking poses. To address this problem, the machine learning models scored the top 10 Vina poses, and the highest binding affinity is reported for each of the 257 test complexes for which the MM/GBSA re-scoring calculations completed. Waters captured in the crystal structure were removed, since this information can artificially constrain the docking poses and inflate performance. A modest increase in performance was observed when water is retained (results not shown). Prediction performance on docking poses is summarized in Table 3 and compared with the original Vina docking score and the more expensive MM/GBSA re-scoring function. As expected, overall performance decreases relative to scoring crystal structures, which can in part be explained by fewer correct poses to re-score. Using a maximum RMSD threshold of 2 Å between a docked pose and the crystal pose, a correct pose is found among the top 10 Vina poses in only 77% of the cases when evaluating the refined dataset. Nonetheless, the fusion model's Pearson correlation coefficient remains higher than the computationally costly MM/GBSA scores and Vina scores (0.685 versus 0.629 and 0.616), motivating use of the Fusion model over the scoring functions.

Classification performance was evaluated for predicting non-binders (threshold set to pKi/pKd < 5) and binders; classifier results are given in the supplemental material.

The complexes found in the evaluation set came from 41 of the 830 structure-based clusters. These clusters were used to assess prediction performance across the different clusters. (The complete listing of clusters is provided as a supplemental file.) The MAE, shown in Figure 3, exhibits a trend of varying error, exceeding 2 logs in some cases, suggesting more accurate predictions for specific protein clusters.

SMILES strings were constructed for 269 of the 290 compounds referenced in the 2016 core set. 51 compounds were found to occur in both the holdout set and the refined training set. The Tanimoto distance between each test ligand and its most similar ligand in the training set from the same cluster was compared with MAE, but no correlation was found. These results suggest that the models are learning the structurally important features, but other chemical and physical information may be needed. There are six clusters (111, 144, 176, 20, 206 and 401) with at least two complexes within the respective cluster where the difference in error between the two models is at least 1 log, and the SG-CNN does consistently better in each case. Similarly, there are four clusters (244, 58, 59 and 64) where the reverse occurs and the 3D-CNN shows consistently lower error. Figure 4 shows the top 8 compounds with maximum prediction discrepancy
Figure 3: MAE (x-axis) with standard deviation for groups (y-axis) based on the pocket and the ligand positioning. MAE is shown for the machine learning models. The number of complexes in the refined training set is shown for each cluster (gray bars).
in the two models. It is still unclear whether there are important structural differences in these clusters that explain the advantage of one model over the other. The first two examples in Figure 4 highlight compounds that interact with the same pocket type, 58 and 111 for the 3D-CNN and SG-CNN respectively. There are many other clusters where neither model has a clear advantage. Nevertheless, the models clearly exhibit distinct performance profiles. While the Fusion model exhibits better overall performance in more clusters than its constituent models, it is not able to give the lowest MAE in every cluster.

Figure 4: Structures of the 8 compounds with the maximum difference in prediction between the two models. The top 4 cases are shown where error is lower for the 3D-CNN or the SG-CNN. Hydrogens are not shown; images are generated with PyMol (Schrödinger, LLC 2015).
The results show that the two CNN models provide complementary predictions for many test complexes. The current SG-CNN implementation does not explicitly capture bond angles, and we speculate that in some cases where the 3D shape of the molecule is important, the 3D-CNN may have an advantage. On the other hand, the SG-CNN likely benefits from a more explicit representation of pairwise interactions, which leads to fewer parameters to learn. The benefit of using both models is supported by the performance of the Fusion models, which yield improved overall performance compared to the individual models. An area for future improvement could be in exploring activity maps such as those introduced in Hochuli et al. (2018).
The machine learning model prediction error appears to be surprisingly robust when predicting on new ligands in recognized pockets. Moreover, accuracy should continue to improve as the amount of experimental data grows. We conclude that the Fusion model will become a more computationally efficient alternative to the MM/GBSA re-scoring function.
Acknowledgements
None.
Funding
This work was supported by American Heart Association Cooperative Research and Development AgreementTC02274. This work was performed under the auspices of the U.S. Department of Energy by Lawrence LivermoreNational Laboratory under Contract DE-AC52-07NA27344. LLNL-JRNL-804162.
References
Ain, Q. U. et al. (2015). Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. Wiley Interdiscip. Rev. Comput. Mol. Sci., (6), 405–424.
Ballester, P. J. and Mitchell, J. B. O. (2010). A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking. Bioinformatics, (9), 1169–1175.
Chéron, G. et al. (2015). P-CNN: Pose-based CNN features for action recognition. In, pages 3218–3226.
Deng, J. et al. (2009). ImageNet: A large-scale hierarchical image database. In, pages 248–255.
Duvenaud, D. K. et al. (2015). Convolutional networks on graphs for learning molecular fingerprints. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 2224–2232. Curran Associates, Inc.
Feinberg, E. N. et al. (2018). PotentialNet for molecular property prediction. ACS Cent. Sci., (11), 1520–1530.
Fey, M. and Lenssen, J. E. (2019). Fast graph representation learning with PyTorch Geometric.
Gilmer, J. et al. (2017). Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212.
Gomes, J. et al. (2017). Atomic convolutional networks for predicting protein-ligand binding affinity. arXiv preprint arXiv:1703.10603.
He, K. et al. (2016). Deep residual learning for image recognition. In, pages 770–778.
Hochuli, J. et al. (2018). Visualizing convolutional neural network protein-ligand scoring. J. Mol. Graph. Model., 96–108.
Huang, Y. et al. (2019). Human action recognition based on temporal pose CNN and multi-dimensional fusion. In Computer Vision – ECCV 2018 Workshops, pages 426–440. Springer International Publishing.
Jakalian, A. et al. (2002). Fast, efficient generation of high-quality atomic charges. AM1-BCC model: II. Parameterization and validation. J. Comput. Chem., (16), 1623–1641.
Jiménez, J. et al. (2017). DeepSite: protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics, (19), 3036–3042.
Jiménez, J. et al. (2018). KDEEP: Protein–ligand absolute binding affinity prediction via 3D-convolutional neural networks. J. Chem. Inf. Model., (2), 287–296.
Kearnes, S. et al. (2016). Molecular graph convolutions: moving beyond fingerprints. J. Comput. Aided Mol. Des., (8), 595–608.
Keedy, D. A. et al. (2009). The other 90% of the protein: assessment beyond the C-alphas for CASP8 template-based and high-accuracy models. Proteins, 77 Suppl 9, 29–49.
Kuzminykh, D. et al. (2018). 3D molecular representations based on the wave transform for convolutional neural networks. Mol. Pharm., (10), 4378–4385.
Li, Y. et al. (2016). Gated graph sequence neural networks. In Y. Bengio and Y. LeCun, editors.
Maier, J. A. et al. (2015). ff14SB: Improving the accuracy of protein side chain and backbone parameters from ff99SB. J. Chem. Theory Comput., (8), 3696–3713.
O'Boyle, N. M. et al. (2011). Open Babel: An open chemical toolbox. J. Cheminform., 33.
Pettersen, E. F. et al. (2004). UCSF Chimera: a visualization system for exploratory research and analysis. J. Comput. Chem., (13), 1605–1612.
Ragoza, M. et al. (2017). Protein–ligand scoring with convolutional neural networks. J. Chem. Inf. Model., (4), 942–957.
Roitberg, A. et al. (2019). Analysis of deep fusion strategies for multi-modal gesture recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.
Schrödinger, LLC (2015). The PyMOL molecular graphics system, version 1.8.
Stepniewska-Dziubinska, M. M. et al. (2018). Development and evaluation of a deep learning model for protein-ligand binding affinity prediction. Bioinformatics, (21), 3666–3674.
Su, M. et al. (2019). Comparative assessment of scoring functions: The CASF-2016 update. J. Chem. Inf. Model., (2), 895–913.
Sun, H. et al. (2014). Assessing the performance of MM/PBSA and MM/GBSA methods. 5. Improved docking performance using high solute dielectric constant MM/GBSA and MM/PBSA rescoring. Phys. Chem. Chem. Phys., (40), 22035–22045.
Trott, O. and Olson, A. J. (2010). AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem., (2), 455–461.
Wallach, I. et al. (2015). AtomNet: A deep convolutional neural network for bioactivity prediction in structure-based drug discovery. arXiv preprint arXiv:1510.02855.
wwPDB consortium (2019). Protein Data Bank: the single global archive for 3D macromolecular structure data. Nucleic Acids Res., (D1), D520–D528.
Yang, F. et al. (2018). A fusion model for road detection based on deep learning and fully connected CRF. In, pages 29–36.
Yoon, S. et al. (2007). Clustering protein environments for function prediction: finding PROSITE motifs in 3D. BMC Bioinformatics, S10.
Zemla, A. (2003). LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Res., (13), 3370–3374.
Zhang, H. et al. (2019). DeepBindRG: a deep learning based method for estimating effective protein-ligand affinity. PeerJ, e7362.
Zhang, X. et al. (2013). Message passing interface and multithreading hybrid for parallel molecular docking of large databases on petascale high performance computing machines. J. Comput. Chem., (11), 915–927.
Zhang, X. et al. (2014). Toward fully automated high performance computing drug discovery: a massively parallel virtual screening pipeline for docking and molecular mechanics/generalized born surface area rescoring to improve enrichment. J. Chem. Inf. Model., (1), 324–337.
Improved Protein-ligand Binding Affinity Prediction with Structure-Based Deep Fusion Inference: Supplemental Information

Derek Jones [email protected]
Hyojin Kim [email protected]
Xiaohua Zhang [email protected]
Adam Zemla [email protected]
Garrett Stevenson [email protected]
William D. Bennett [email protected]
Dan Kirshner [email protected]
Sergio Wong [email protected]
Felice Lightstone [email protected]
Jonathan Allen [email protected]
May 19, 2020
Spatial Graph Convolutional Network Architecture
The Spatial Graph Convolutional Neural Network (SG-CNN) presented here is composed of a number of smaller neural networks and can be described in terms of a propagation layer, a gather (aggregation) step, and, finally, an output fully-connected network. The propagation layer is the portion of the network where messages are passed between the atoms and used to compute new feature values for each node, with a total of k_i rounds of message passing, where i (wlog) corresponds to the i-th propagation layer.

Using this architecture, we define two distinct propagation layers, for covalent and non-covalent interactions. In both cases, we use a single scalar value for the edge feature: the Euclidean distance (measured in Angstroms, Å) between nodes v_i and v_j. Propagation is performed for k = 2 rounds in both cases, then the attention operation is performed on the final propagation output. For covalent propagation, the input node feature size is 20 and the output size is 16 after the attention operation is applied. For non-covalent propagation, the node features resulting from the covalent attention operation are used as the initial feature set; the output node feature size of this layer is 12. The resulting features of the non-covalent propagation are then "gathered" across the ligand nodes in the graph to produce a "flattened" vector representation by taking a node-wise summation of the features. This graph summary feature is then fed through the output fully-connected network to produce the binding affinity prediction ŷ.

h_i^(0) = x_i for all i   (initialization of node features)

h_i^(k) = GRU(h_i^(k-1), Σ_{j ∈ N^(e)(v_i)} f_e(h_j)) for k ∈ {1, ..., K}   (message passing)

where x_i is the feature vector of node v_i.
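As a toy illustration of the propagation recurrence above, the sketch below runs distance-weighted message passing over a thresholded adjacency matrix. The GRU update is replaced by a tanh mix and `theta` stands in for the learned parameters, so this is a simplified analogue of the layer, not the paper's implementation.

```python
import numpy as np

def softsign(x):
    """Softsign activation, x / (1 + |x|), used as the edge nonlinearity."""
    return x / (1.0 + np.abs(x))

def propagate(h, adj, theta, k=2):
    """Toy linear analogue of one SG-CNN propagation layer.

    h: (N, F) node features; adj: (N, N) thresholded distance matrix;
    theta: (F, F) stand-in for the learned parameters. Each round sums
    distance-weighted neighbor messages (the edge network collapsed to a
    softsign of the distance) and mixes them into the node state; the
    paper uses a GRU where tanh appears here.
    """
    for _ in range(k):
        msg = softsign(adj) @ h          # aggregate neighbor messages
        h = np.tanh(h @ theta + msg)     # simplified state update
    return h
```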
f_e(h_i) = Θ h_i + Σ_{j ∈ N(i)} h_j · f_θ(e_{i,j})   (outer edge network)

where Θ is a learned set of parameters and f_θ is a neural network that computes a new edge feature for the edge e_{i,j} between v_i and v_j.

f_θ(e_{i,j}) = φ(f_{θ2}(φ(f_{θ1}(e_{i,j}))))   (inner edge network)

where f_{θ1} and f_{θ2} are neural networks and φ is the Softsign activation (Turian et al. 2009), φ(x) = x / (1 + |x|).

h^(K) = σ(p([h^(K), h^(0)])) ⊙ q(h^(K))   (attention operation)

where σ is the softmax activation function and q and p are neural networks.

h_gather = Σ_{v ∈ G_ligand} h_v   (gather step)

where G_ligand is the ligand subgraph of the binding complex graph.

ŷ = g_3(ReLU(g_2(ReLU(g_1(h_gather)))))   (output network)

Experiment Setup

The 3D-CNN, SG-CNN, and fusion network models used an Adam optimizer with learning rates of 1 × 10^-, 1 × 10^-, and 1 × 10^-, respectively. The mini-batch sizes are 50, 8, and 100, and the approximate numbers of epochs are 200, 200, and 1000, respectively. We observed that smaller mini-batch sizes drastically improved model performance when training the SG-CNN, while the effect of different mini-batch sizes was negligible for the 3D-CNN and the fusion model.

The 3D-CNN and fusion models were developed using the TensorFlow python library (Abadi et al. 2016); the SG-CNN was developed using PyTorch (Paszke et al. 2019) with PyTorch Geometric (Fey and Lenssen 2019).

Visualizing Model Bias and MAE distribution
In Figure 2, the PDBBind 2016 core set is color coded into 3 groups according to the MAE of the experimental log(ki/kd) (left panel) and the predicted log(ki/kd) (right panel): predictions that fall below one standard deviation of the mean MAE value for a given model (red), between 1-2 standard deviations (green), and exceeding 2 standard deviations (blue). Figure 2 shows that the models predict over the full range of binding affinity values; however, predictions are biased toward the mean. As the error increases (higher values on the y-axis), the left panel shows that high-error predictions have trouble predicting the tails of the distribution. The right panel shows that the models predict closer to the mean in cases of high error. The black vertical line gives the location of the ground truth mean and the purple vertical line gives the predicted mean.

Figure 2: Scatter plots for each scoring method and the experimental binding affinity measurement.

Relationship between scoring function output with experimental measurement.
Figure 2 gives scatter plots of the scores from the Mid-level Fusion, Late Fusion, MM/GBSA, and Vina scoring methods versus the experimental binding affinity for the 242 complexes of the 2016 core set for which a score across all methods was possible to obtain. Both Fusion methods show a significant improvement over the physics-based scoring methods (in terms of Pearson correlation coefficient) with the experimental log(ki/kd).

Figure 3: MAE (y-axis) with standard deviation for 57 functional groups (x-axis) defined by CASF-2016. MAE is shown for the machine learning models.

Performance on protein targets (CASF-2016)
To consider the prediction performance of the different models, mean absolute error (MAE) is shown for the 57 functional categories defined by the Comparative Assessment of Scoring Functions (CASF-2016) complexes, which consist of 285 PDBBind core complexes with 5 complexes per category. The results are shown in Figure 3, with the categories sorted on the x-axis by increasing Mid-fusion MAE. The figure shows significantly different Mid-fusion MAE between groups, ranging from approximately 0.5 to over 2 log units. The protein categories are limited by manual curation and were not reported for the complete collection of PDBBind complexes, making it difficult to assess overlap between binding pockets found in the training data and the test set. To address this, a binding pocket oriented, structure-based clustering scheme was applied to the complete collection of PDBBind complexes.

Table 1: Comparison of classifier performance on PDBBind 2016 Core Set - Docking Poses. SD = standard deviation.

Model              Bind ROC AUC   SD     No-bind ROC AUC   SD
SG-CNN             .784           .066   .829              .063
3D-CNN             .747           .081   .774              .063
Vina               .788           .071   .848              .052
MM/GBSA            .828           .064   .833              .057
Late Fusion        .82            .065   .859              .054
Mid-level Fusion   .806           .07    .853              .055
Classification of binders, performance comparison between fusion models and physics-based scoring
For the bind detection screening task, the results are summarized as ROC AUC and reflect randomly partitioning the PDBBind 2016 core set into 5 non-overlapping folds, computing the ROC AUC on each fold, repeating this procedure 100 times, and taking the average. The datasets for both tasks maintain a similar class imbalance of 25% positives and 75% negatives. Table 1 shows that MM/GBSA has slightly better performance than the Fusion model, and both methods show a small improvement (0.04) over Vina. For the no-bind task, while the Fusion model had the highest ROC AUC, the margin of difference was negligible compared to Vina (0.011).

Figure 4: Histogram counting the minimum Tanimoto distance between each compound in the test set and its nearest match in the PDBBind 2016 refined set.
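The repeated five-fold ROC AUC protocol described above can be sketched as follows; the rank-sum AUC computation and the random partitioning are generic, not the authors' code.

```python
import random

def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney) identity."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def repeated_fold_auc(labels, scores, n_folds=5, n_repeats=100, seed=0):
    """Average ROC AUC over repeated random non-overlapping fold partitions."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    aucs = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        for f in range(n_folds):
            fold = idx[f::n_folds]                  # strided non-overlapping fold
            fl = [labels[i] for i in fold]
            fs = [scores[i] for i in fold]
            if 0 < sum(fl) < len(fl):               # AUC needs both classes
                aucs.append(roc_auc(fl, fs))
    return sum(aucs) / len(aucs)
```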
Structure similarity between refined and core sets of PDBBind 2016

In order to gain an understanding of how "similar" our training set (the refined set) was to our testing set (the core set), we consider structural similarity between the ligands in the refined and core sets as measured by the Tanimoto distance metric. Figure 4 illustrates the distribution of the Tanimoto distance between each ligand in the core set and its nearest neighbor in the refined set.
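The nearest-neighbor Tanimoto distance computation above can be sketched directly on fingerprints represented as sets of on-bits; in practice the fingerprints would come from a cheminformatics toolkit, but plain sets keep the sketch self-contained.

```python
def tanimoto_distance(fp_a, fp_b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bits."""
    inter = len(fp_a & fp_b)
    union = len(fp_a | fp_b)
    return 1.0 - (inter / union if union else 1.0)

def nearest_in_training(test_fp, train_fps):
    """Minimum Tanimoto distance from a test ligand to the training set,
    as plotted in the histogram of Figure 4."""
    return min(tanimoto_distance(test_fp, fp) for fp in train_fps)
```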
Disclaimer
This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.

References

Abadi, M. et al. (2016). Tensorflow: A system for large-scale machine learning. In
Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI '16, pages 265–283, USA. USENIX Association.
Fey, M. and Lenssen, J. E. (2019). Fast graph representation learning with PyTorch Geometric.
Paszke, A. et al. (2019). PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc.
Turian, J. et al. (2009). Quadratic features and deep architectures for chunking. In