DeepSurf: A surface-based deep learning approach for the prediction of ligand binding sites on proteins
Stelios K. Mylonas, Apostolos Axenopoulos, and Petros Daras
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece.
* Correspondence: [email protected], [email protected]
Abstract
The knowledge of potentially druggable binding sites on proteins is an important preliminary step towards the discovery of novel drugs. The computational prediction of such areas can be boosted by following the recent major advances in the deep learning field and by exploiting the increasing availability of proper data. In this paper, a novel computational method for the prediction of potential binding sites is proposed, called DeepSurf. DeepSurf combines a surface-based representation, where a number of 3D voxelized grids are placed on the protein's surface, with state-of-the-art deep learning architectures. After being trained on the large database of scPDB, DeepSurf demonstrates superior performance on two diverse testing datasets, by surpassing all its main deep learning-based competitors. The source code of the method along with trained models are freely available at https://github.com/stemylonas/DeepSurf.git.
Structure-based drug discovery relies mostly on knowledge of potential binding sites of small compounds on protein structures. Computational binding site prediction (BSP) makes it possible to predict in silico properties that would require much effort to establish experimentally, and can significantly enhance the drug discovery process.

Through the years, a plethora of methods have been proposed for the structure-based BSP task and, according to [1], they can be roughly separated into three categories: geometry-based, energy-based and template-based ones. Geometry-based methods (ConCavity [2], Fpocket [3], CriticalFinder [4]) predict binding cavities based solely on the geometry of the molecular surface, while energy-based methods (FTSite [5], AutoSite [6], [7]) calculate interaction energies between protein atoms and chemical probes and attempt to locate energy minima on the protein's surface. On the other hand, template-based methods (Findsite [8], LBias [9], LIBRA [10]) aim to extract binding sites on a protein by performing global or local structural alignment between this protein and a set of preexisting templates. Furthermore, consensus algorithms have been proposed that combine the results of numerous standalone methods (metaPocket2.0 [11], COACH [12]).

A new perspective on bioinformatics has been provided by the machine learning (ML) field. Machine learning techniques exploit the available amount of labeled data and, through the automated and iterative process of learning, manage to analyze and extract the underlying patterns that eventually correlate the data with their assigned label. Such methodologies have also recently been introduced to the structure-based BSP task [13], [14]. Specifically, Krivák and Hoksza proposed the P2Rank method [14], which employs a random-forest (RF) classifier to predict a ligandability score for points placed on the solvent accessible surface of a protein.
A set of chemical and geometrical features is calculated on local spherical neighbourhoods around these points and later serves as input to the RF classifier. The points receiving the highest ligandability scores are spatially clustered to finally provide the predicted binding sites.

Over the last few years, the increasing availability of large amounts of data has led to the development of a subfield of ML, namely deep learning (DL). DL has surpassed by far more traditional ML methods in many scientific domains (computer vision, natural language processing, etc.) and has recently been applied to a variety of structural bioinformatics tasks, such as virtual screening [15], [16], binding affinity prediction [17], [18] and protein structure prediction [19], [20]. DeepSite [21] was the first attempt to employ a DL architecture in the structure-based BSP task, using a rather shallow convolutional neural network (CNN) of 4 layers. DeepSite, like P2Rank, treats binding site prediction as a binary classification problem, where "binding" and "non-binding" are the two considered classes. Their main difference is that DeepSite does not utilize any surface information but, on the contrary, operates on a 3D voxelized grid of the protein. For each voxel of the grid, a feature vector is computed based on the physicochemical properties of the neighboring protein atoms. Then, a sliding cuboid window of 16 × 16 × 16 traverses the entire grid, creating subgrids of features, which are then imported to the CNN. Each subgrid is finally assigned a ligandability score by the network. A very similar approach has been proposed in [22], where the main difference is the set of features employed. Two recently proposed methods, called Kalasanty [23] and FRSite [24], resemble DeepSite in protein representation, since they also employ a 3D voxelization of the entire protein, but they differ in how they approach the BSP task. BSP is treated as an object-detection problem by FRSite and as a semantic segmentation problem by Kalasanty, where in both cases the desired object to be extracted is the corresponding binding site. In FRSite, a 3D version of Faster-RCNN is employed, while in Kalasanty a common segmentation architecture, called U-Net, is adapted to the needs of the specific task. According to the results reported in Kalasanty, this alternate representation has achieved higher accuracies than DeepSite.

Among the aforementioned methodologies, both DeepSite and Kalasanty exploit the inherent capabilities of deep learning architectures to learn from large databases and automatically extract features. However, the voxelized representation of the entire protein they adopt has several limitations: it neglects any knowledge of the surface morphology, while its fixed structural discretization of the input space can lead to information loss. On the other hand, P2Rank tries to exploit the binding mechanics in a more efficient way, by employing a surface-based representation of the protein, which resembles more closely the actual binding process. As stated in [14], focusing on grid points or atoms has led experimentally to significantly worse results than focusing on surface points.
Nevertheless, the classification approach employed in P2Rank (an RF classifier) has limited generalisation capabilities compared with approaches based on deep learning, which can benefit from much larger training datasets.

Inspired by the aforementioned approaches, our proposed DeepSurf method effectively combines the learning capabilities of advanced CNN architectures with a surface-based representation of the 3D protein structure. More specifically, DeepSurf employs a 3D-CNN architecture on localized 3D grids, which are appropriately oriented and placed on a set of selected surface points, as detailed in Section 2.1. This approach of localized grids resembles P2Rank, in the sense that both methodologies consider a local neighborhood around each surface point. Their basic differences are that, instead of calculating hand-crafted features, we employ a 3D-CNN for feature extraction, and that we consider a voxelized cuboid neighborhood instead of a spherical one.

The main contributions of our approach are: i) a new representation of the 3D protein surface is introduced, based on local voxel grids centred at sample points of the surface; ii) a novel residual network, LDS-ResNet, that has shown better performance than the baseline ResNet in image analysis tasks, has been extended to three dimensions to be applicable to volumetric data. The proposed method has been evaluated in binding site prediction using different benchmark datasets, demonstrating superior performance among state-of-the-art approaches.

Algorithm 1 DeepSurf
Input: Protein structure
1:  Create the solvent accessible surface of the protein
2:  Reduce the set of surface points
3:  for each point P do
4:      Compute normal vector n on P
5:      Create local grid on P, aligned according to n
6:      Calculate grid features
7:      Import grid to 3D-CNN and get ligandability score for P
8:  end for
9:  Discard points with score less than T
10: Cluster the remaining points
11: for each cluster do
12:     Assign each cluster point to its closest protein atom
13:     Form a binding site from these atoms
14: end for
15: Rank binding sites by average ligandability score
Output: Binding sites
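The surface-point reduction of step 2, which is implemented via K-means clustering (keeping from each cluster the point closest to its centre), can be sketched as follows. This is a minimal illustration using scikit-learn; the helper name `simplify_surface` and the random toy points are our own assumptions, not the released code:

```python
# Sketch of Algorithm 1, step 2: reduce the SAS points by a factor f
# via K-means, keeping from each cluster the point nearest its centre.
import numpy as np
from sklearn.cluster import KMeans

def simplify_surface(points, f=10):
    """Reduce an (n_p, 3) array of surface points to roughly n_p / f points."""
    n_clusters = max(1, len(points) // f)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(points)
    kept = []
    for c, centre in enumerate(km.cluster_centers_):
        members = np.where(km.labels_ == c)[0]
        d = np.linalg.norm(points[members] - centre, axis=1)
        kept.append(members[np.argmin(d)])  # closest point to cluster centre
    return points[np.array(kept)]

pts = np.random.rand(500, 3) * 30.0  # toy stand-in for SAS vertices
reduced = simplify_surface(pts, f=10)
print(reduced.shape)  # (50, 3)
```

Note that, unlike plain decimation, this keeps actual surface vertices (not centroids), so the retained points still lie on the mesh.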
A short outline of our method is given in Algorithm 1. Firstly, the solvent accessible surface (SAS) of the protein is created in a triangular mesh format. The resulting mesh is usually too dense, with unnecessary redundancy of points, which can impose a severe computational burden on our algorithm. For this reason, we apply a subsequent "mesh simplification" step, where the total number of surface points is reduced by a factor of f (e.g. f = 10). For this task, we employ the K-means clustering algorithm, aiming to aggregate adjacent points into one cluster. If n_p is the number of original surface points, the total number of created clusters is equal to n_p/f. As we can see, parameter f controls the density of the points to be preserved and corresponds to the average number of points per cluster. Finally, from each cluster the point closest to the cluster center is kept.

One issue related to the voxelized representation of a protein is the lack of rotation invariance. Specifically, due to lack of symmetry, the employed 3D cuboid grids are always rotation-sensitive and strongly depend on the arbitrary placement of the axes. Most methods attempted to address this issue by augmenting the data with random rotations during training [15], [23]. On the other hand, P2Rank, as a non-voxelized method, bypassed this issue by utilizing symmetric spherical neighborhoods. We aim to alleviate this problem by aligning the local grids with the orientation of the normal vectors of the corresponding surface points. This alignment approach was inspired by a previous work [25], where local spherical regions on a protein surface were aligned according to the orientation of the normal vectors, in order to extract local shape descriptors. An illustration of this step is shown in Fig. 1. A local grid of size 16 × 16 × 16 and resolution 1 Å is centered on surface point P and is oriented such that the z-axis is always parallel to the normal vector n on P, i.e. perpendicular to the surface. With this approach, the rotation issue is not eliminated, since random rotations are still applied during training. However, this selective initial placement of axes, instead of a random one, resulted in a more effective training and evaluation scheme.

Figure 1: Illustration of the 3D grid localized on surface point P and aligned according to normal n.

After the proper localization and orientation of the grid, the next step is to calculate the necessary features that will form the 4D tensor which is then imported to the 3D-CNN. We adopt here the featurization scheme initially introduced by [17] and also used in Kalasanty [23], which consists of 18 chemical features calculated per protein atom. Each grid voxel receives the features of the atoms inside it. The formed 4D tensor is then imported to the CNN, which produces at its output a ligandability score for the specific surface point. Although our approach has been tested using specific deep neural network architectures, the proposed methodology is generic, meaning that any 3D-CNN architecture that receives as input a 4D tensor and returns as output a float value in the range [0,1] can be used instead. The exact network architectures employed in our experiments are elaborated in the next subsection. After obtaining ligandability scores for all surface points, we need to extract distinct binding sites. Points with score less than T are considered unreliable and are discarded, while the remaining ones are clustered using the mean-shift algorithm [26]. The main reason for selecting mean-shift over other clustering algorithms is that with mean-shift we do not need to declare the number of clusters in advance. This property matches our case exactly, since the exact number of binding sites is not known beforehand.
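The score-thresholding and clustering step just described can be sketched with scikit-learn's MeanShift. The helper below and its toy data are illustrative assumptions; in particular, the bandwidth value is not the authors' setting:

```python
# Sketch: keep surface points with ligandability score >= T, then group
# them into putative binding sites with mean-shift, which does not need
# the number of clusters in advance.
import numpy as np
from sklearn.cluster import MeanShift

def cluster_binding_points(points, scores, T=0.9, bandwidth=None):
    keep = scores >= T
    ms = MeanShift(bandwidth=bandwidth).fit(points[keep])
    return points[keep], ms.labels_

rng = np.random.default_rng(0)
# Two well-separated toy "pockets" of high-scoring points
pts = np.vstack([rng.normal(0.0, 1.0, (40, 3)), rng.normal(20.0, 1.0, (40, 3))])
scores = np.full(80, 0.95)
kept, labels = cluster_binding_points(pts, scores, T=0.9, bandwidth=5.0)
print(len(set(labels)))  # 2
```

Each resulting cluster would then be mapped to its nearest protein atoms to form one candidate binding site.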
Finally, the surface points of each cluster are assigned to their nearest protein atoms, which form the desired binding sites.

As previously noted, the 3D-CNN in DeepSurf can be substituted by any 3D convolutional network of the user's choice. In this work we adopted the ResNet [27] architecture, which belongs to the family of residual networks. The main attribute of ResNet is the existence of skip connections between adjacent layers, so as to avoid the vanishing gradient problem. The baseline residual block of 3D-ResNet is depicted in Fig. 2(a). ResNet is formed by stacking a number of these blocks. We employed here an 18-layer ResNet, with the exact structure shown in the original work [27]. Considering the fact that we are employing 3D convolutions, the number of parameters in 3D-ResNet can be dramatically increased compared to 2D-ResNet. In the same work [27], the bottleneck architecture was also presented, which allows more effective training of deeper ResNets with considerably fewer parameters per block. Recently, a novel residual network has been proposed, called LDS-ResNet [28], that has shown better performance than the baseline ResNet in computer vision tasks. Notably, LDS-ResNet acquired its best results when combined with a bottleneck architecture, which significantly surpassed all the non-bottleneck variants. In this work, we implemented a 3D variant of the bottleneck LDS-ResNet, with its main block depicted in Fig. 2(b). The difference to Fig. 2(a) is the addition of a second branch with an LDS module parallel to the original convolutional branch and the subsequent concatenation of these two branches. In the following subsection, the extension of the LDS module to three dimensions, which is proposed in this paper, is presented.
LDS-ResNets were inspired by Linear Dynamical Systems theory, where a dynamical system is modeled through two time-evolving stochastic processes. The first process estimates a hidden state vector h_t and the second one provides the observed output y_t as a function of this hidden state. A similar approach was adopted in the LDS module proposed by [28], with the exclusion of the time-evolution factor. The herein proposed 3D variant of this module is illustrated in Fig. 3.

Figure 2: Baseline blocks for (a) the original 3D-ResNet, (b) the bottleneck 3D-LDS-ResNet.

Let us assume the input X to the module is a 4D tensor of size h × w × d × d_in, where h, w, d are the spatial dimensions and d_in is the number of channels. For clarity, the operation is presented in Fig. 3 for just one channel (d_in = 1). The LDS module operates iteratively over X on 4D patches X_t ∈ R^(n×n×n×d_in) (in our experiments we used n = 3). The calculation of the LDS module's output Y_t involves two main steps.

The first one simulates the hidden state calculation of LDS theory. Each patch X_t is unfolded to a 2D matrix x_t ∈ R^(n³×d_in) and the hidden state h_t ∈ R^(n³×d_in) is obtained by:

    h_t = A x_t    (1)

where A ∈ R^(n³×n³) is the hidden state transition matrix. Its values are randomly initialized for each layer and subsequently optimized during training through backpropagation. Then, h_t is folded back to H_t ∈ R^(n×n×n×d_in) and for every t these subvolumes are stored successively without overlapping, resulting in the intermediate volume H ∈ R^(nh×nw×nd×d_in).

The second step of the module performs the mapping from the hidden state h_t to the output y_t, as in the original LDS theory. Specifically,

    y_t = f(W, h_t)    (2)

where f(·) is a non-linear function with learnable parameters W, implemented here by a convolutional operation.
Volume H is convolved with a set of d_out filters W ∈ R^(n×n×n×d_in) with a stride of k·n in each spatial dimension, in order to align the filters W with the regions corresponding to each of the patches X_t. The factor k controls the downsampling rate of the specific building block. When k = 1, the output Y of the LDS module is an h × w × d × d_out tensor, while for k > 1 the spatial dimensions of the output are downsampled accordingly.

The demonstrated efficiency of deep neural networks in many research fields relies greatly on the exploitation of large amounts of qualitative and properly labeled data that can be used for training. The largest and most suitable database currently available for the BSP task is the scPDB database [29], a continuously updated collection of ligandable binding sites of the Protein Data Bank. These binding sites are defined from complexes between a protein and a pharmacological ligand. One asset of scPDB is that, beyond the atom-based description of the protein and its ligand, it also provides their binding site, being thus suitable for a robust comparison and assessment of the examined methods. We utilized the 2017 release of the database, which comprises 16034 entries corresponding to 4782 proteins with 17594 total binding samples. After removing some entries due to failures in reading or in feature extraction, the final dataset contains 15182 structures. For training and validation purposes, the remaining structures were split into 5 folds according to their Uniprot IDs, so that structures from the same protein are included in the same fold. This separation ensures that the same protein pockets do not coexist in the training and testing sets of a split, allowing a more robust and fair assessment.

Figure 3: Graphical illustration of the LDS module's operation in three dimensions. For presentation reasons, only one input and output channel is considered (d_in = d_out = 1).

For testing purposes, two different datasets were used, namely COACH420 and HOLO4K.
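The patch-level hidden-state computation of the LDS module (Eq. (1) above) can be sketched in plain numpy. The zero-padded, stride-1 patch traversal below is our reading of the text, not the exact implementation; the real module runs as a CUDA op inside TensorFlow:

```python
# Sketch of the first LDS-module step: each 3x3x3xC patch is unfolded to
# an (n^3, C) matrix, multiplied by the learnable transition matrix A, and
# folded back; the per-patch results are tiled (non-overlapping) into the
# intermediate volume H of shape (n*h, n*w, n*d, C).
import numpy as np

def lds_hidden_state(X, A, n=3):
    h, w, d, c = X.shape
    H = np.zeros((n * h, n * w, n * d, c))
    for i in range(h):
        for j in range(w):
            for k in range(d):
                # Zero-padded n x n x n patch centred at voxel (i, j, k)
                patch = np.zeros((n, n, n, c))
                for di in range(n):
                    for dj in range(n):
                        for dk in range(n):
                            ii, jj, kk = i + di - 1, j + dj - 1, k + dk - 1
                            if 0 <= ii < h and 0 <= jj < w and 0 <= kk < d:
                                patch[di, dj, dk] = X[ii, jj, kk]
                x_t = patch.reshape(n ** 3, c)
                h_t = A @ x_t  # Eq. (1): hidden state of this patch
                H[n*i:n*(i+1), n*j:n*(j+1), n*k:n*(k+1)] = h_t.reshape(n, n, n, c)
    return H

X = np.random.rand(4, 4, 4, 2)
A = np.eye(27)  # identity transition leaves each patch unchanged
H = lds_hidden_state(X, A)
print(H.shape)  # (12, 12, 12, 2)
```

The second step (Eq. (2)) then convolves H with stride k·n, so that each filter application lines up with one tiled patch.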
These datasets were employed in the evaluation of the P2Rank method [14] and were freely provided by the corresponding authors. COACH420 has been derived from the COACH test set [12] and consists of 420 single-chain structures containing a mix of drug targets and naturally occurring ligands. HOLO4K is a larger dataset (4009 structures) containing larger multi-chain structures and was initially utilized by [30].

The solvent accessible surface (SAS) of each protein is calculated by the DMS software. Besides the molecular surface, DMS also returns the normal vectors at each surface point. Despite setting DMS to create surfaces with the lowest possible density, the returned set of points is still quite dense, with the average minimum distance between neighboring points being 0.7 Å. Parameter f, which controls the subsequent simplification process, should be set in a way that achieves a compromise between losing valuable surface information and incurring excessive computational cost. In our case, we chose a value of f = 10, which raised the average minimum distance of the remaining surface points to 2.3 Å.

Prior to being imported to DeepSurf, proteins should also be properly pre-processed. Specifically, water, ions and ligands are removed from the PDB structures, and the remaining structure is protonated, if needed. Before the final step of binding site extraction (step 12 in Algorithm 1), hydrogen atoms are removed from the protein so that binding sites retain only heavy atoms.

As previously stated, BSP is treated here as a binary classification problem, where the two considered classes are the "binding" and "non-binding" ones. Therefore, the samples used for training the 3D-CNN should belong to one of these classes. For each protein of scPDB, surface points within 4 Å of any ligand atom are considered binding samples, while all the rest are non-binding samples.
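This labelling rule (binding if within 4 Å of any ligand atom) can be sketched with a KD-tree for efficient nearest-atom queries; the helper name and toy coordinates are illustrative:

```python
# Sketch: label surface points as binding (True) if their distance to the
# closest ligand atom is at most 4 Angstroms, non-binding otherwise.
import numpy as np
from scipy.spatial import cKDTree

def label_surface_points(surface_pts, ligand_atoms, cutoff=4.0):
    tree = cKDTree(ligand_atoms)
    d, _ = tree.query(surface_pts)  # distance to nearest ligand atom
    return d <= cutoff

surf = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
lig = np.array([[2.0, 0.0, 0.0]])
print(label_surface_points(surf, lig))  # [ True False]
```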
In this case, the resulting dataset would be quite imbalanced, since the non-binding samples far outnumber the binding ones. The class imbalance problem is well known in machine learning applications and a number of tactics have been proposed to tackle it [31]. The most common tactic lies at the data level and consists of either undersampling the main class or oversampling the secondary one. Due to the required time efficiency during training, the former technique was followed herein. For each protein, a number of non-binding samples equal to the number of binding samples was randomly chosen, in order to obtain a 50/50 balance between the two classes.

DeepSurf was implemented in Python and the Tensorflow framework was employed for the deep learning operations. As already shown in Fig. 3, the LDS module consists of two layers: a custom layer and a 3D convolutional layer. The custom layer comprises the transition matrix A calculation and the patch-level multiplication (1). Since this is a patch-based iterative operation, like convolution, it can become extremely computationally heavy for larger input sizes. For this reason, the custom layer was implemented in CUDA, to enable a high level of parallelization, and was afterwards integrated in Tensorflow. The source code of the method along with trained models are available at https://github.com/stemylonas/DeepSurf.git.

Regarding the training process, L2 regularization was applied on the weights of all convolutional layers, while batch normalization was applied with its default parameters. All models were trained for 20 epochs, with a batch size of 64 samples, and were optimized by the Adam optimizer [32].

The evaluation criteria used to assess the performance of the proposed method are the following:

• DCC: Success rate (%) when considering the distance between the predicted pocket center and the real binding site center. Distances less than 4 Å are successful.

• DCA: Success rate (%) when considering the distance between the predicted pocket center and the closest ligand atom. Distances less than 4 Å are successful.

• OVR: Overlapping criterion at the atom level, defined as the intersection of the real and predicted binding sites divided by their union.

The DCC and DCA metrics have been widely used in previous works to evaluate the localization quality of extracted binding sites by measuring their distance from either the annotated binding site or the corresponding ligand. On the other hand, OVR differs from the above distance-based metrics by also considering the shape of the binding sites, since it expresses a normalized spatial overlap between the predicted and the actual location of the binding pocket. In the following experiments, the DCC metric is used for evaluating the performance on scPDB, while the DCA and OVR metrics are employed for the comparative assessment on COACH420 and HOLO4K. In all cases, the top-n and top-(n+2) predicted pockets are considered, where n is the number of ligands for the specific protein. Finally, the ligandability threshold T is set to 0.9 in all experiments. The sensitivity of DeepSurf to the selection of T is examined more thoroughly in Section 5.3.

The first stage of our experiments consists of 5-fold cross-validation (CV) on scPDB. The goal of this experiment is twofold: firstly, to evaluate independently some fundamental steps of our method and, secondly, to test the behavior of DeepSurf with residual architectures of different size and type. As described in Section 3, the scPDB dataset was split into five folds and a different model was trained for each fold. Results depicted in both Tables 1 and 2 are average performances over these folds.

Initially, in order to estimate the separate contribution of some core parts of DeepSurf, we conducted two additional experiments: in the first one, the residual network was replaced by the shallower network of DeepSite, while in the second the grid alignment step was omitted.

Table 1: Contribution evaluation of the core parts of DeepSurf. Results depicted are average cross-validation performances on the scPDB dataset using the DCC criterion.

                          Top-n   Top-(n+2)
DeepSite's network        62.1    64
ResNet-18 (w/o align)     66.8    69.1
ResNet-18 (w align)       68.1    70.4

Table 2: Evaluation of DeepSurf with lightweight residual architectures. Results depicted are average cross-validation performances on the scPDB dataset using the DCC criterion.
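The two distance-based success criteria defined above can be sketched as follows; inputs are assumed to be numpy coordinate arrays, and the 4 Å cutoff follows the definitions:

```python
# Sketch of the DCC and DCA success criteria for one predicted pocket.
import numpy as np

def dcc_success(pred_centre, true_centre, cutoff=4.0):
    """DCC: distance between predicted and real binding site centres."""
    return np.linalg.norm(pred_centre - true_centre) < cutoff

def dca_success(pred_centre, ligand_coords, cutoff=4.0):
    """DCA: distance from the predicted centre to the closest ligand atom."""
    d = np.linalg.norm(ligand_coords - pred_centre, axis=1)
    return d.min() < cutoff

centre = np.array([1.0, 0.0, 0.0])
ligand = np.array([[4.5, 0.0, 0.0], [9.0, 0.0, 0.0]])
print(dca_success(centre, ligand))  # True: closest atom is 3.5 A away
```

The reported success rates are simply the fraction of ligands for which the criterion holds over the top-n (or top-(n+2)) predictions.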
After evaluating the individual features of our method through cross-validation, we compare DeepSurf with other competing deep learning methods for the BSP task that are publicly available, namely DeepSite [21], Jiang's method [22] and Kalasanty [23]. From the various architectures of DeepSurf tested in Section 5.1, we keep for comparison the baseline ResNet-18 and the lightweight bottleneck LDS-ResNet-18, which provided the highest accuracies.

Table 3: Performance comparison of DeepSurf with competing deep learning methods using the DCA criterion.

                               COACH420             HOLO4K
                               Top-n   Top-(n+2)    Top-n   Top-(n+2)
DeepSite [21]                  57.5    65.1         45.6    48.2
Jiang et al. [22]              55      58.7         38.2    41.5
Kalasanty [23]                 68      70.4         32.1    32.3
DeepSurf (ResNet-18)           71.9    72.3         50.7    51.1
DeepSurf (Bot-LDS-ResNet-18)   71.3    72.9         50.4    50.9

Table 4: Performance comparison of DeepSurf and Kalasanty using the OVR criterion, computed only for correctly located binding sites (DCA < 4 Å).

                             COACH420   HOLO4K
Kalasanty                    0.21       0.15
DeepSurf (ResNet-18)         0.29       0.17
DeepSurf (Bot-LDS-Res-18)    0.28       0.16

For testing purposes, two different datasets were utilized, namely the COACH420 and HOLO4K datasets (for more details see Section 3). In order to avoid data leakage, all proteins from COACH420 and HOLO4K were removed from scPDB, and the remaining dataset was used to train the two variants of DeepSurf. Although all of the competing methods have been trained on the same database (scPDB), any proteins common to our testing datasets have not been removed from their training sets. This means that these methods have a slight advantage due to this specific data leakage. The provided DeepSite results are those obtained by [14]. In case a method fails to produce any binding site, an adequately large DCA value is assigned to each ligand of this protein, ensuring that this solution will be regarded as erroneous.

The obtained DCA performances are shown in Table 3. Regarding the three competing methods, we notice that Kalasanty clearly surpasses the others on COACH420, while DeepSite is by far superior on the most challenging dataset, HOLO4K. Nevertheless, DeepSurf clearly outperforms all competing methods on both datasets. Specifically, DeepSurf is superior to Kalasanty on COACH420 by 3.7% in top-n accuracy and 2% in top-(n+2), while on HOLO4K, DeepSurf outperforms DeepSite by 5% in top-n accuracy and 3% in top-(n+2). Between the two DeepSurf alternatives, ResNet-18 achieves less than 0.5% higher top-n accuracies on both datasets, while the bottleneck LDS-ResNet-18 prevails in terms of top-(n+2) accuracy on COACH420. This indicates the computational and generalization effectiveness of the LDS-equipped network when applied to unknown structures, since it achieves similar results to ResNet-18 with the benefit of more than 10 times fewer parameters.

For a more comprehensive comparison of the above methods, an overlapping criterion should also be applied that evaluates the shape of the extracted pockets.
According to [33] and [34], binding sites are defined as the non-hydrogen atoms of a residue that are within 4 Å of a non-hydrogen atom of the ligand. Following this principle, we extracted binding sites for all proteins in COACH420 and HOLO4K and computed the OVR values only for the correctly located binding pockets (DCA < 4 Å).

A comparison of the robustness of the examined methods is given in Table 5, which provides the average number of predicted pockets along with the number of proteins where each method failed to produce a result.

Table 5: Number of proteins for which each method failed to produce a result, and average number of predicted pockets per protein.

                               Failures             Avg. predicted pockets
                               COACH420   HOLO4K   COACH420   HOLO4K
Jiang et al. [22]              12         65       1.4        3.4
Kalasanty [23]                 16         475      1.1        1.2
DeepSurf (ResNet-18)           10         9        1.1        1.8
DeepSurf (Bot-LDS-ResNet-18)   11         24       1.1        1.7

As we can see, Kalasanty was unable to extract binding sites for a large number of proteins, even after adjusting its default parameters; for example, in the case of HOLO4K, no binding site was returned for 475 out of 4009 proteins. Between the two DeepSurf variants, ResNet-18 appeared more robust, since it encountered the fewest failures on HOLO4K. Regarding the number of extracted binding sites, DeepSurf and Kalasanty tend to return fewer pockets than DeepSite and Jiang's method on both datasets. In the case of COACH420 this is beneficial, since both methods extract a number of pockets similar to the average number of true ones (1.2). On the other hand, on HOLO4K, DeepSurf, and especially Kalasanty, return on average fewer binding sites than the actual ones (2.8). This can explain the larger differences between top-n and top-(n+2) accuracies observed for DeepSite and Jiang's method compared to the rest of the methods (see Table 3).
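The OVR values reported above reduce to an intersection-over-union on sets of binding site atoms; a minimal sketch (the atom identifiers are hypothetical):

```python
# Sketch of the OVR criterion: atom-level intersection of the real and
# predicted binding sites divided by their union.
def ovr(pred_atoms, true_atoms):
    pred, true = set(pred_atoms), set(true_atoms)
    if not pred and not true:
        return 0.0
    return len(pred & true) / len(pred | true)

print(ovr({1, 2, 3, 4}, {3, 4, 5, 6}))  # 2/6 shared atoms
```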
A key parameter of DeepSurf is the ligandability threshold T above which surface points are considered potential binders. In the above experiments, a high ligandability threshold of T = 0.9 was set. Fig. 4 depicts the binding sites extracted for structure '1lqdB' with thresholds T = 0.5 and T = 0.9, respectively. Although, in both cases, the extracted pockets are considered successful due to low DCA values, we can observe that the extracted pocket in Fig. 4(a) is larger and expands to undesired areas (marked with black circles) away from the ligand. This is entirely reasonable, since a smaller value of T leads to the preservation of more surface points before clustering and, subsequently, to the formation of larger binding sites. Furthermore, the influence of varying T on the obtained results is examined quantitatively in Table 6, where the DCA values for both DeepSurf variants and for various ligandability thresholds are presented. As we can see, lower values of T lead consistently to lower performance on both datasets, especially when T = 0.5. Between the two DeepSurf variants, the lighter one seems to be more affected by variations of T (2% drop on COACH420 and 3% on HOLO4K). From the above, it is concluded that DeepSurf exhibits its optimal performance when a high ligandability threshold is set.

Table 6: Performance of DeepSurf using the DCA criterion for different values of T.

In this paper, a novel method, called DeepSurf, was presented for predicting potential druggable sites on proteins. The identification of promising candidate binding sites is an important preliminary step in the drug discovery process.

Funding
The work has been supported by the ATXN1-MED15 PPI project funded by the GSRT - Hellenic Foundation for Research and Innovation.
Figure 4: Binding site extraction example for structure '1lqdB' with ligandability thresholds (a) T = 0.5 and (b) T = 0.9. Black circles mark the areas where the two results differ.

References

[1] Gabriele Macari, Daniele Toti, and Fabio Polticelli. Computational methods and tools for binding site recognition between proteins and small molecules: from classical geometrical approaches to modern machine learning strategies. Journal of Computer-Aided Molecular Design, 33(10):887-903, 2019.

[2] John A Capra, Roman A Laskowski, Janet M Thornton, Mona Singh, and Thomas A Funkhouser. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3d structure. PLoS Computational Biology, 5(12):e1000585, 2009.

[3] Vincent Le Guilloux, Peter Schmidtke, and Pierre Tuffery. Fpocket: an open source platform for ligand pocket detection. BMC Bioinformatics, 10(1):168, 2009.

[4] Sérgio ED Dias, Quoc T Nguyen, Joaquim A Jorge, and Abel JP Gomes. Multi-GPU-based detection of protein cavities using critical points. Future Generation Computer Systems, 67:430-440, 2017.

[5] Chi-Ho Ngan, David R Hall, Brandon Zerbe, Laurie E Grove, Dima Kozakov, and Sandor Vajda. FTSite: high accuracy detection of ligand binding sites on unbound protein structures. Bioinformatics, 28(2):286-287, 2011.

[6] Pradeep Anand Ravindranath and Michel F Sanner. AutoSite: an automated approach for pseudo-ligands prediction, from ligand-binding sites identification to predicting key ligand atoms. Bioinformatics, 32(20):3142-3149, 2016.

[7] Hiroto Tsujikawa, Kenta Sato, Cao Wei, Gul Saad, Kazuya Sumikoshi, Shugo Nakamura, Tohru Terada, and Kentaro Shimizu. Development of a protein-ligand-binding site prediction method based on interaction energy and sequence conservation. Journal of Structural and Functional Genomics, 17(2-3):39-49, 2016.

[8] Michal Brylinski and Jeffrey Skolnick. A threading-based method (FINDSITE) for ligand-binding site prediction and functional annotation. Proceedings of the National Academy of Sciences, 105(1):129-134, 2008.

[9] Howook Hwang, Fabian Dey, Donald Petrey, and Barry Honig. Structure-based prediction of ligand-protein interactions on a genome-wide scale. Proceedings of the National Academy of Sciences, 114(52):13685-13690, 2017.

[10] Daniele Toti, Le Viet Hung, Valentina Tortosa, Valentina Brandi, and Fabio Polticelli. LIBRA-WA: a web application for ligand binding site detection and protein function recognition. Bioinformatics, 34(5):878-880, 2017.

[11] Zengming Zhang, Yu Li, Biaoyang Lin, Michael Schroeder, and Bingding Huang. Identification of cavities on protein surface using multiple computational approaches for drug binding site prediction. Bioinformatics, 27(15):2083-2088, 2011.

[12] Jianyi Yang, Ambrish Roy, and Yang Zhang. Protein-ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment. Bioinformatics, 29(20):2588-2595, 2013.

[13] Jhih-Wei Jian, Pavadai Elumalai, Thejkiran Pitti, Chih Yuan Wu, Keng-Chang Tsai, Jeng-Yih Chang, Hung-Pin Peng, and An-Suei Yang. Predicting ligand binding sites on protein surfaces by 3-dimensional probability density distributions of interacting atoms. PLoS ONE, 11(8):e0160315, 2016.

[14] Radoslav Krivák and David Hoksza. P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure. Journal of Cheminformatics, 10(1):39, 2018.

[15] Matthew Ragoza, Joshua Hochuli, Elisa Idrobo, Jocelyn Sunseri, and David Ryan Koes. Protein-ligand scoring with convolutional neural networks. Journal of Chemical Information and Modeling, 57(4):942-957, 2017.

[16] Fergus Imrie, Anthony R Bradley, Mihaela van der Schaar, and Charlotte M Deane. Protein family-specific models using deep neural networks and transfer learning improve virtual screening and highlight the need for more data.
Journal of chemical information and modeling ,58(11):2319–2330, 2018.[17] Marta M Stepniewska-Dziubinska, Piotr Zie-lenkiewicz, and Pawel Siedlecki. Develop-ment and evaluation of a deep learning modelfor protein–ligand binding affinity prediction.
Bioinformatics , 34(21):3666–3674, 2018.[18] Jos´e Jim´enez, Miha Skalic, Gerard Martinez-Rosell, and Gianni De Fabritiis. K deep:protein–ligand absolute binding affinity predic-tion via 3d-convolutional neural networks.
Jour-nal of chemical information and modeling , 58(2):287–296, 2018.[19] Sheng Wang, Jian Peng, Jianzhu Ma, and JinboXu. Protein secondary structure prediction us-ing deep convolutional neural fields.
Scientificreports , 6:18962, 2016.[20] Andrew W Senior, Richard Evans, John Jumper,James Kirkpatrick, Laurent Sifre, Tim Green,12hongli Qin, Augustin ˇZ´ıdek, Alexander WRNelson, Alex Bridgland, et al. Protein struc-ture prediction using multiple deep neural net-works in the 13th critical assessment of pro-tein structure prediction (casp13).
Proteins:Structure, Function, and Bioinformatics , 87(12):1141–1148, 2019.[21] Jos´e Jim´enez, Stefan Doerr, Gerard Mart´ınez-Rosell, Alexander S Rose, and Gianni De Fab-ritiis. Deepsite: protein-binding site predictorusing 3d-convolutional neural networks.
Bioin-formatics , 33(19):3036–3042, 2017.[22] Mingjian Jiang, Zhen Li, Yujie Bian, andZhiqiang Wei. A novel protein descriptor for theprediction of drug binding sites.
BMC bioinfor-matics , 20(1):1–13, 2019.[23] Marta M Stepniewska-Dziubinska, Piotr Zie-lenkiewicz, and Pawel Siedlecki. Detection ofprotein-ligand binding sites with 3d segmenta-tion. arXiv preprint arXiv:1904.06517 , 2019.[24] Mingjian Jiang, Zhiqiang Wei, Shugang Zhang,Shuang Wang, Xiaofeng Wang, and Zhen Li. Fr-site: Protein drug binding site prediction basedon faster r–cnn.
Journal of Molecular Graphicsand Modelling , 93:107454, 2019.[25] Apostolos Axenopoulos, Dimitrios Rafailidis,Georgios Papadopoulos, Elias N Houstis, andPetros Daras. Similarity search of flexible 3dmolecules combining local and global shape de-scriptors.
IEEE/ACM transactions on computa-tional biology and bioinformatics , 13(5):954–970,2015.[26] Dorin Comaniciu and Peter Meer. Mean shift:A robust approach toward feature space analy-sis.
IEEE Transactions on Pattern Analysis &Machine Intelligence , 24(5):603–619, 2002.[27] Kaiming He, Xiangyu Zhang, Shaoqing Ren, andJian Sun. Deep residual learning for image recog-nition. In
Proceedings of the IEEE conference oncomputer vision and pattern recognition , pages770–778, 2016. [28] Anastasios Dimou, Dimitrios Ataloglou, Kos-mas Dimitropoulos, Federico Alvarez, and Pet-ros Daras. Lds-inspired residual networks.
IEEETransactions on Circuits and Systems for VideoTechnology , 2018.[29] J´er´emy Desaphy, Guillaume Bret, Didier Rog-nan, and Esther Kellenberger. sc-pdb: a 3d-database of ligandable binding sites10 yearson.
Nucleic acids research , 43(D1):D399–D404,2014.[30] Peter Schmidtke, Catherine Souaille, Fr´ed´ericEstienne, Nicolas Baurin, and Romano T Kroe-mer. Large-scale comparison of four binding sitedetection algorithms.
Journal of chemical infor-mation and modeling , 50(12):2191–2200, 2010.[31] Justin M Johnson and Taghi M Khoshgoftaar.Survey on deep learning with class imbalance.
Journal of Big Data , 6(1):27, 2019.[32] Diederik P Kingma and Jimmy Ba. Adam:A method for stochastic optimization. arXivpreprint arXiv:1412.6980 , 2014.[33] Nicholas M Luscombe, Roman A Laskowski, andJanet M Thornton. Amino acid–base interac-tions: a three-dimensional analysis of protein–dna interactions at an atomic level.
Nucleic acidsresearch , 29(13):2860–2874, 2001.[34] Ke Chen, Marcin J Mizianty, Jianzhao Gao, andLukasz Kurgan. A critical comparative assess-ment of predictions of protein-binding sites forbiologically relevant organic compounds.