[PDF] Coupling enhanced sampling of the apo-receptor with template-based ligand conformers selection: Performance in pose prediction in the D3R-GC4

Abstract

We report the performance of our newly introduced Ensemble Docking with Enhanced sampling of pocket Shape (EDES) protocol coupled to a template-based algorithm to generate near-native ligand conformations in the 2019 iteration of the Grand Challenge organized by the D3R consortium. Using either AutoDock4.2 or HADDOCK2.2 docking programs (each software in two variants of the protocol) our method generated native-like poses among the top 5 submitted for evaluation for most of the 20 targets with similar performances. The protein selected for GC4 was the human beta-site amyloid precursor protein cleaving enzyme 1 (BACE-1), a transmembrane aspartic-acid protease. We identified at least one pose whose heavy-atoms RMSD was less than 2.5 Å from the native conformation for 16 (80%) and 17 (85%) of the twenty targets using AutoDock and HADDOCK, respectively. Dissecting the possible sources of errors revealed that: i) our EDES protocol (with minor modifications) was able to sample sub-ångstrom conformations for all 20 protein targets, reproducing the correct conformation of the binding site within ~1 Å RMSD; ii) as already shown by some of us in GC3, even in the presence of near-native protein structures, a proper selection of ligand conformers is crucial for the success of ensemble-docking calculations. Importantly, our approach performed best among the protocols exploiting only structural information of the apo protein to generate conformations of the receptor for ensemble-docking calculations.

Full PDF

1 Coupling enhanced sampling of the apo-receptor with template-based ligand conformers selection: Performance in pose prediction in the D3R Grand Challenge 4.

Andrea Basciu , Panagiotis I. Koukos , Giuliano Malloci , Alexandre M. J. J. Bonvin and Attilio V. Vargiu Dipartimento di Fisica, Università di Cagliari, Cittadella Universitaria, I-09042 Monserrato (CA), Italy Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands *Correspondence to: A. M. J. J. Bonvin: [email protected]; A. V. Vargiu: [email protected]

ORCIDs

G. Malloci: 0000-0002-5985-257X A. M. J. J. Bonvin: 0000-0001-7369-1322 A. V. Vargiu: 0000-0003-4013-8867

Abstract

The Drug Design Data Resource (D3R) 2019 Grand Challenge is the fourth iteration (GC4) of the major docking competition organized by the D3R consortium [1–3]. The competition has two main goals: i) assessing the ability of docking algorithms to accurately predict the binding poses of a protein against a diverse set of small molecules, and ii) evaluating of the performance of binding affinity predictors. The target in this iteration of the pose prediction assessment is the beta-site amyloid precursor protein cleaving enzyme 1 (BACE-1), a beta-secretase 1 protein [4]. BACE-1 plays an early role in Alzheimer's disease, as it is essential for the generation of the β-amyloid peptides composing the amyloid plaques which are the hallmark neuropathological lesions [5, 6]. Given its role in initiating the formation of β-amyloids, BACE-1 has been a critical target in developing therapies against the progression of Alzheimer’s disease [7], as testified also by the huge number of BACE-1 protein structures deposited in the Protein Data Bank (PDB) [8] at the beginning of the challenge (>300 on September 4 th Materials and Methods

The D3R GC4 is divided into a set of different stages, in which the participants are requested to predict the binding pose of a set of different ligands against the same receptor and to rank them and/or estimate their free energies of binding. We participated in stages 1a and 1b of the pose prediction challenge. Specifically, in stage 1a (cross-docking) we performed ensemble docking calculations on conformations of the protein and of the ligands generated by our methodology (vide infra), while in stage 1b (self-docking) the 10 conformers of each ligand were docked onto the conformations of BACE-1 extracted from the structures of their complexes with the 20 compounds in the dataset. For both challenges the participants were asked to generate a ranked set of maximum 5 poses for each ligand.

Data provided

In stage 1a the only data provided by the organizers consisted of a list of 20 SMILES entries (corresponding to the 20 compounds for which the participants must predict the crystallographic pose) and of the protein sequence in FASTA format. In stage 1b the experimental structures of the receptors for all 20 BACE-1 complexes were provided, to allow the participants to re-dock each ligand on the corresponding holo-conformation of the receptor. Ligand preparation

The 20 ligands were similar in size, each containing about 35 heavy atoms. For the generation of their conformers we employed the methodology featured in [16]. The protocol makes use of ligand similarities in the form of Tanimoto coefficients and can be summarized in the following steps: 1.

Identify existing highly homologous structures of the target protein bound to small molecules; 2.

Discard undesirable structures (e.g. low resolution, split side chains near the binding site, covalently bound ligands in the case where the target ligand is known to bind non-covalently, etc.); 3.

Calculate Tanimoto coefficients between all target and template ligands; 4.

Generate up to 500 ligand conformers for all targets with OpenEye OMEGA [19]; 5.

Select 10 conformers for all targets by comparing the generated conformers to the structure of the template ligand with the highest Tanimoto similarity per ligand, with OpenEye ROCS (shape and colour mode) [23].

Binding site determination

In order to identify the putative binding site of the protein, we used the same approach presented in [16]. Namely, we retrieved in the PDB all the structures featuring at least 95% sequence identity to the amino acid sequence provided by the organizers and having a co-crystallized ligand (other than crystallization buffer molecules); this resulted in 340 entries. We verified that the binding site was well characterized and perfectly conserved in all the structures with no missing residues in the putative pocket. Next, we used the Tanimoto metrics, as implemented in fmcsR [24] and chemmineR [25] packages to evaluate the similarity between the ligands present in the 340 entries and each of the 20 compounds provided, in order to identify a set of receptor templates featuring the most similar ligand to the compounds to be docked. Details on Tanimoto similarity measurement can be found in [16, 26]. The search for the structure featuring the most similar ligand to each of the 20 compounds resulted in 9 complex structures selected as templates (see Table S1). In addition to providing a metric to select ligand’s conformers for docking calculations, these structures were used to identify the residues lining the binding site to enhance its conformational sampling. This list was built by merging all the residues within 3.5 Å from the ligand in each of the 9 complex structures selected. Since this approach generated a very large number of residues (more than 30), we kept only the most conserved ones (appearing at least in 2 structures) and, among those present only in one structure, the most buried ones (likely to interfere with ligand binding). The resulting selection is shown in Fig. 1, which clearly highlights that the chosen list of 20 residues (see Table S2) surrounds all the 20 congeneric ligands provided for this challenge.

Unbiased and enhanced-sampling Molecular Dynamics (MD) simulations

In this work we applied essentially the same workflow presented in the original EDES publication (see [13] for details), with some modifications that are described in the following sections.

Starting protein structure.

We used the package BLASTP 2.7.1+ [27] to search for protein structures homologous to the sequence provided by the organizers. We searched the PDB, setting the number of alignments (- num_alignements ) to 1000 and the number of scoring evaluations (- evalue ) to 10, while using the default values otherwise. We also requested the structures to have the word "BACE" in their name. With these search criteria, we identified around 300 structures, which included only 8 apo proteins. Among these, we identified as template the structure with PDB ID 1SGZ [17], which was resolved at good resolution (2 Å), did not feature any missing residue, and displayed a full overlap with the BACE-1 sequence provided by the organizers. Prior to setting up the system, the structure was further refined through the MolProbity webserver [28].

Unbiased MD . Standard all-atom MD simulations of the apo protein (herafter MD apo ) embedded in a 0.15 KCl water solution (~60.000 atoms in total) and under periodic boundary conditions were carried out using the pmemd module of the AMBER18 package [29]. The initial distance between the protein and the edge of the box was set to be at least 16 Å in each direction. Topology files were created for each system using the

LEaP module of AmberTools18 starting from the apo structure with PDB ID 1SGZ. The AMBER-FB15 [30, 31] force field was used for the protein, the TIP3P-FB model was used for water, and the parameters for the ions were obtained from [32]. Long-range electrostatics was evaluated through the particle-mesh Ewald algorithm using a real-space cutoff of 12 Å and a grid spacing of 1 Å in each dimension. The van der Waals interactions were treated by a Lennard-Jones potential using a smooth cutoff (switching radius 10 Å, cutoff radius 12 Å). Multistep energy minimization with a combination of the steepest-descent and conjugate-gradient methods was carried out to relax internal constraints of the systems by gradually releasing positional restraints. Following this, the system was heated from 0 to 310 K in 10 ns of constant-pressure heating (

NPT ) using the Langevin thermostat (collision frequency of 1 ps –1 ) and the Berendsen barostat. After equilibration, a production run of 1  s was performed. A time step of 2 fs was used for pre-production runs, while equilibrium MD simulations were carried out with a time step of 4 fs in the NPT ensemble (using a MC barostat) after hydrogen mass repartitioning [33]. Coordinates from production trajectory were saved every 100 ps.

Enhanced sampling MD.

EDES aims to generate holo-like conformations of a protein exploiting only structural information on the apo counterpart. This is achieved by means of bias-exchange well-tempered metadynamics simulations [34, 35] on a set of generic CVs effectively biasing both the shape and the volume of the binding pocket. Ultimately, the method mimics induced fit rearrangements of the receptor due to ligand/protein interactions. Metadynamics simulations were performed on the apo protein using the GROMACS 2016.5 package [36] and the PLUMED 2.3.5 plugin [37]. Simulations were started from the last conformation sampled along the pre-production step of the unbiased MD. AMBER parameters were ported to GROMACS using the acpype parser [38]. Following the original implementation, four CVs defined considering only the residues within the binding site were used: i) the radius of gyration of the binding site (hereafter RoG BS ) calculated using the gyration built-in function of PLUMED; ii) the numbers of (pseudo)contacts across three orthogonal “inertia planes” (CIPs), calculated through a switching function implemented in the coordination keyword of PLUMED. The inertia planes are defined as the planes orthogonal to the three principal inertia axes of the binding site and passing through its geometrical center. All non-hydrogenous atoms were considered to define the CIPs, while only backbone atoms were used to estimate RoG BS , on which we also implemented a “windows” approach aimed to sample in a controlled manner different shapes of the binding site. Namely, we applied soft restraints at RoG BS values that are 7.5% higher and lower than the RoG BS of the X-ray apo structure (RoG X-rayapo ) and from that trajectory, corresponding to the first window, we randomly selected a conformation associated with a RoG BS value 5% lower than RoG X-rayapo . This structure was used as starting point for another MD simulation (corresponding to window 2) with walls centered at ±7.5% RoG

X-rayapo from this new center. We repeated the procedure until we generated 3 windows including the first one, centered at 9.91, 9.41, and 8.92 Å respectively (corresponding to a ~10% decrease of RoG

X-rayapo ). Each replica was simulated for 100 ns, leading to 400 ns of metadynamics simulations per window; coordinates were saved every 10 ps. Note that in [13] we demonstrated that EDES is not sensitive to the exact choice of the windows parameters. The height w of the Gaussian hills was set to 0.6 kcal/mol, while the widths s i of the Gaussian hills were set to 0.06, 2.6, 1.7 and 3.0 respectively for RoG BS and CIP , respectively. The bias factor for well-tempered metadynamics was set to 10. Hills were added every 2 ps, while the bias-exchange frequency was set to 20 ps. The force constants for the restraints on the RoG BS were set to 50 and 10 kcal  mol –1  Å –2 for the upper and lower walls respectively. Hereafter, we will refer to these simulations as EDES . Docking

Ensemble docking calculations were performed using either AutoDock4.2 [20] or the HADDOCK2.2 webserver [21, 22], following the same procedures described in [13]. However, at odd with the original implementation, calculations were here performed on 200 protein conformations obtained by merging structures extracted from both MD apo and EDES simulations. This approach appeared to be reasonable, as the extent of the conformational changes occurring at the binding site was unknown a priori . In particular, since we expected to observe significant oscillations of this region including the flap (Fig. 1) also along MD apo , including structures extracted from that trajectory should encompass a non-negligible fraction of structures with RoG BS larger than the value calculated on 1SGZ. In addition, the number of protein conformations was lowered to 200 to cope with the time constraints of the challenge, also considering that for each compound, 10 ligand conformations were employed in ensemble-docking runs. Protein conformations were extracted by means of a multi-step cluster analysis as described in [13], with the additional requirement to extract at least 10 cluster representatives from each of the 10 slices in which the RoG BS distributions were binned, so as to include a certain number of structures also from poorly sampled regions. The multi-step cluster analysis was applied separately to MD apo and EDES , extracting 500 clusters from each trajectory. Next, an additional cluster analysis using the same approach was performed on the pool of 1000 cluster representatives in order to generate the final ensemble of 200 conformations. For AutoDock, the docking was repeated with each selected conformer, while for HADDOCK the ensemble of conformers was submitted and used in a single docking run. We submitted four sets of protein-ligand binding pose predictions generated through the following docking protocols: 1. Autodock (receipt ID pe6zg) : Docking calculations and pose selection were performed following prescriptions detailed for Autodock in [13]. Briefly, for each ligand, 10 different conformers were docked on each of the 200 protein structures using the Lamarckian Genetic Algorithm (LGA). The grid density and number of energy evaluations were both increased from default values (respectively by decreasing the spacing parameter from 0.375 Å to 0.25 and by increasing the ga_num_evals parameter by a factor of 10) in order to avoid repeating each calculation several times to obtain converged results. An adaptive grid was used, enclosing all of the residues belonging to the binding site in each different protein conformation. Next, the top poses (in total 200, one for each docking run) were clustered using the cpptraj module of AmberTools18 with a hierarchical agglomerative algorithm. Namely, after structural alignment of the protein binding site conformations, the poses were clustered using a distance RMSD (dRMSD) cutoff d c =0.075∙N nh , where N nh is the number of non-hydrogenous atoms of the ligand. This choice was made in order to tune the cutoff to the molecular size of each compound. Finally, clusters were ordered according to the top score (lowest binding free energy) within each cluster. 2. HADDOCK (receipt ID kmtri) : Docking calculations and pose selection were performed following prescriptions detailed for HADDOCK in [13]. A single docking run was performed per case, starting from the various ensembles of 200 conformations, with increased sampling (10000/400/400 models for it0, it1 and wat steps, respectively referring to rigid-body docking, semiflexible and final refinement in explicit solvent). The weight of the intermolecular van der Waals energy used in it0 was increased to 1.0 (from the default value of 0.01), and RMSD-based clustering was selected with a cutoff of 1 Å. Docking was guided by ambiguous distance restraints defined for the residues of the binding site and the ligand. During it0 the protein binding site residues were defined as “active”, effectively drawing the ligand into the binding site without restraining its orientation. For the subsequent stages only the ligand was active, improving its exploration of the binding site while maintaining at least one contact with its residues. 3. Autodock with pose refinement and rescoring (herafter Autodock rr ; receipt ID nstab) : This approach is the same as in 1, with an additional step consisting in the relaxation of the top 10 docking poses by means of a multi-step structural relaxation performed with AMBER18 [29]. Systems were optimized in vacuum through three consecutive cycles of restrained structural relaxation (1000 cycles of steepest descent followed by up to 24000 cycles of conjugate gradients) followed by an unrestrained optimization (2000 cycles of steepest descent followed by up to 8000 cycles of conjugate gradients). During restrained relaxation harmonic forces of 0.3, 0.2, and 0.1 kcal  mol -1  Å -1 respectively for the first, second and third cycles were applied on all non-hydrogenous atoms of the system. Long-range electrostatics was evaluated directly using a cutoff of 99 Å, as for the Lennard-Jones potential. The AMBER-FB15 [30, 31] force field was used for the protein, while the parameters of the ligands were derived from the GAFF force field [39] using the antechamber module of AmberTools18. In particular, bond-charge corrections (bcc) charges were assigned to ligand atoms following structural relaxation under the Austin Model 1 (AM1) approximation. Finally, the poses were rescored using the same scoring function of AutoDock employed to rank the original docking poses. 4. HADDOCK all-Hs (receipt ID apue7) : this approach is the same as in 2, except for the inclusion of all hydrogens (and not only the polar ones) in atomic models. In addition to ensemble-docking calculations using receptor structures generated in silico , during stage 1b we also performed self-docking calculations (only with AutoDock), that is using for each ligand the conformation of the receptor extracted from the corresponding holo experimental structure released by the organizers at the beginning of stage 1b (protocol

Autodock self , receipt ID qb4hg ). All the remaining parameters were identical to those employed within the

Autodock protocol. Note that we also submitted predictions using the same template-based protocol described in [16] (

HADDOCK tb ; receipt ID nwm5a). Those led to the best performance from all our submissions but will not be discussed here. Druggability calculations

Following our previous study [13] we used the package f-pocket [40] to assess the druggability of the binding site within the ensembles of BACE-1 conformations generated by MD simulations. For each conformation, we evaluated the druggability score D [41] which ranges between 0 and 1 with higher values identifying more druggable geometries. It is customary to associate scores >0.5 to putative binding sites [41]. Results and discussion

In the following we report the performance of our protocols in predicting near-native binding poses of BACE-1 ligands (stage 1a). Next we report the performance of Autodock in self-docking calculations of the same ligands onto the experimental structure of the receptor released at stage 1b. Evaluations were performed according to the data downloaded from the D3R website (https://drugdesigndata.org/about/grand-challenge-4-evaluation-results). The accuracy of the poses was evaluated by calculating the RMSD of each ligand with respect to its experimental reference structure (considering only heavy atoms), after superposition of the binding interface areas. First, we discuss the performance in terms of both global and per ligand descriptors. Next, to identify possible sources of errors, we analyze the accuracy of our EDES-like approach in sampling holo-like and druggable conformations of BACE-1, and the performance of the template-based algorithm to generate ligand conformers [16]. We conclude by summarizing the possible drawbacks of the methodology and its unique features, and by listing possible routes of future development.

Stage 1a

Table 1 provides an overview of the performance of our methodologies in finding near-native poses of the 20 BACE-1 ligands. The best results were obtained with AutoDock [20] coupled to a multi-step structural optimization and rescoring of the top 10 poses (

Autodock rr ). Using this approach, we found median and average values of the RMSD calculated on the top pose of each ligand (hereafter RMSD med1 and 〈 RMSD 〉 ) lower than 2 Å and 3 Å respectively. Moreover, these values were lower than 1.5 Å and 2 Å when calculated on the nearest-native poses (hereafter RMSD medmin and 〈 RMSD min 〉 , respectively). The AMBER-based refinement led to significant improvements with respect to the “standard” Autodock protocol, for which we obtained values of RMSD medmin and 〈 RMSD min 〉 lower than 2.5 Å. A very similar performance was achieved by HADDOCK when using standard settings, while the explicit consideration of non-polar hydrogen atoms of the ligand during docking led to an appreciable drop in the accuracy. The 〈 RMSD min 〉 and 〈 RMSD 〉 metrics place our methods in the middle-left and middle-right regions of the histogram plot summarizing performances of all applicants (Fig. 2). An inspection of the protocols employed in this competition that perform better than any of ours (in terms of 〈 RMSD 〉 ) revealed that our methodology is (among those for which these details were disclosed) the only one based on ensemble-docking from protein conformations generated in silico from an apo experimental structure of BACE-1. Thus, no information on the structures of its complexes with similar ligands was exploited to bias the conformation of the binding site towards holo-like geometries. In order to investigate more in detail the performance of our method, and possibly to correlate the accuracy of our results to one or more relevant parameters, we report the results for each of the 20 BACE ligands in Table 2, as well as in Figs. 3a, 3b, 3c, and 3d for Autodock , Autodock rr , HADDOCK , and

HADDOCK all-Hs , respectively. The Table reveals that the aforementioned approaches gave at least one pose with RMSD lig < 2.5 Å respectively in 15, 16, 17, and 10 out of twenty cases, corresponding to success rates of 75%, 80%, 85%, and 50%. In the following, we will discuss in more detail only the top three approaches:

Autodock , Autodock rr , and HADDOCK . Inspection of Table 2 reveals that the most challenging ligand was BACE , the only one for which we obtained poses featuring an RMSD > 3 Å from the native conformation with all approaches. Additional challenging ligands include BACE , for which the best RMSD value was 2.8 Å (obtained with Autodock rr ), and to a minor extent BACE , BACE , BACE , BACE , and BACE , for which 1 out of the three protocols was unable to find poses with RMSD values lower than 2.5 Å. Inspection of Figs. 3a-b reveals the large variability in the orientation of docking poses for almost all of the ligands investigated. Such behavior, resulting in several poses displaying large RMSD values and in high standard deviations of the averages, was somewhat expected because of the (desired) tendency of our protocol to maximize the conformational diversity of the binding site among the protein structures used in docking calculations ( vide infra ). Clearly, considering the average RMSD over all poses would intrinsically penalize approaches like ours. Nonetheless, the best overall performing method (

Autodock rr ) demonstrated its ability to reproduce at least one near-native pose among the top 5 for virtually all ligands but BACE . Allowing some degree of flexibility of the ligands (e.g. by activating torsional angles in AutoDock) could in principle improve results for this and the most challenging cases, although improvements were reported to be system-dependent [42, 43]. This is confirmed by the comparison with results obtained using HADDOCK, which includes by default flexibility of both docking partners by means of short MD runs in the space of the torsional angles. Finally, we compared our predictions with docking calculations performed with Autodock on the experimental apo structure of BACE-1 (hereafter Autodock apo ). Table 1 confirms that, as expected, the lack of inclusion of protein flexibility has a major impact on the accuracy of near-native pose predictions [42–44]. In a further effort to identify most likely sources of errors we assessed the accuracies of our protocols in sampling near-native conformations of the protein and the ligands prior to docking calculations.

Sampling of holo-like (and druggable) conformations of BACE-1

The ability of our enhanced-sampling protocol to generate holo-like conformations of BACE-1 was evaluated in terms of the RMSD distributions calculated for the non-hydrogenous atoms of the binding site (hereafter RMSD BS ) with respect to each of the 20 BACE-1 experimental structures (provided at stage 1b) for MD apo , EDES and the ensemble of 200 cluster structures used in docking calculations. Fig. 4 shows that both MD apo and, to a larger extent, EDES , were able to generate a significant fraction of receptor conformations displaying RMSD BS values lower than 2 Å with respect to every ligand/BACE-1 experimental structure. The good performance of MD apo is consistent with the relatively small conformational rearrangements undergone by BACE-1 upon binding of all ligands (Fig. 1c), corresponding to a decrease of RoG BS in the range 9.10-9.44 Å from the initial value of 9.79 Å found in 1SGZ. In agreement with previous results [13], EDES performs better than MD apo as testified by the sizeable shoulder seen in Fig. 4b, raising the percentage of structures with RMSD BS < 1.5 Å. Moreover, our multi-step cluster analysis confirmed its tendency to select a large (even larger than that sampled along the MD trajectories) fraction of low-RMSD BS geometries with respect to all the experimental reference structures (Fig. 4 and Table 3). It is worth pointing out that in all cases we obtained some conformations of the binding site that are virtually identical to the experimental structures, as testified by the lowest RMSD BS values, all around 1 Å. We also evaluated the performance of our method in generating druggable conformations of BACE-1 [41]. Table 4 shows the results of this analysis on the 200 receptor conformations used for ensemble-docking calculations, compared with the values obtained for the targets investigated in [13]. It clearly appears that also the approach used in this work can generate a consistent fraction of structures associated with a large druggability score D . Generation of near-native ligand conformers

The performance of our template-based similarity protocol in generating near-native conformations of the 20 BACE-1 ligands is summarized in Table 3. This Table reports the statistics of the RMSD calculated on heavy atoms of each ligand after structural alignment on the reference conformation extracted from the experimental structure of the corresponding complex (hereafter indicated as RMSD lig-fit , to be distinguished from the same value calculated in the complex after alignment of the protein interface region). In all cases the minimum RMSD lig-fit values are lower than 2 Å, confirming the accuracy of the approach in reproducing near-native conformations [16]. Overall, these values are slightly larger than those obtained for the sampling of holo-like conformations of the receptor (2 nd and 3 rd columns in Table 3). Moreover, in 4 (1) out of twenty cases we obtained an average RMSD lig-fit > 2 (2.5) Å, and in 6 out of twenty cases we obtained values of RMSD lig-fitmin > 1.5 Å, and these ligands are (except for BACE and BACE ) exactly those for which we obtained the less accurate docking results. This further confirms previous findings on the importance of sampling not only the correct conformations of the protein but also that of the ligands, as well as the exponential impact on accuracy arising from the combination of even minimal displacements from the correct structures in both partners [16]. Stage1b

This is a self-docking stage for which the bound protein structures – but not those of the compounds – are known. The comparison between the data obtained at stages 1a and 1b is instructive to evaluate the performance of the docking protocol in the presence of the correct structures of the receptor. This enables to highlight drawbacks likely unrelated to the protein flexibility problem, as well as to confirm the importance of using a relevant fraction of correct conformers of the ligand in ensemble-docking calculations. We performed this exercise using AutoDock; namely, we docked each of the 20 ligands on the corresponding experimental structure of the receptor. Table 5 reports the overall performance of

Autodock self , while Fig. 3e shows a more detailed analysis of the top 5 poses for each BACE-1 ligand. Interestingly, a marginal improvement was seen with respect to the results obtained with the

Autodock protocol at stage 1a, while the success rate evaluated as the number of ligands for which at least one pose featured a value of the RMSD  〈 RMSD lig-ref 〉 and/or RMSD lig-refmin in Table 3. In particular, wrong poses were found for BACE , BACE , BACE , BACE , and BACE , while for BACE all poses have RMSD values close to 2.5 Å. While further relaxation and rescoring of these poses is expected to increase the success rate, we note that a very minor conformational change towards the correct geometry of the binding site was sufficient to find at least one pose with RMSD lower than 2 Å for BACE . Conclusions and Perspectives

We report the performance of our hybrid ensemble-docking approach in its first participation to a D3R Grand Challenge competition. The approach is founded on a template-based algorithm to select proper ligand conformers, and on our recently published EDES protocol (implemented here with small modifications with respect to the original version) to sample holo-like protein conformations starting from the apo one. EDES confirmed its excellent performance in sampling holo-like conformations of the protein (particularly at the binding site) for all of the twenty complexes formed between BACE-1 and the congeneric ligands subject of this study. A very good accuracy in reproducing near-native ligand conformers was achieved also by our template-based approach. These performances reflected in the relatively high accuracy in the prediction of near-native binding poses. Independently of the docking program used, our method was able to find near-native poses among the top 5 ones for at least 75% of the twenty complexes subject of the pose prediction sub-challenge. While HADDOCK found near-native poses for more targets than AutoDock, the latter featured the best overall performance when coupled to a computationally cheap post-docking relaxation of the poses. Performing docking calculations on the apo experimental structure of BACE-1 resulted in significantly less accurate predictions, due to the unaccounted rearrangements of the protein flap occurring upon ligand binding. Finally, the good performance of our approach was testified by the only slight overall improvement obtained when performing self-docking calculations on the twenty experimental holo receptor structures. Note that in this Grand Challenge we also submitted pose predictions following the template-based protocol described in [16] (

HADDOCK tb ) in which an ideal choice of both ligand and receptor conformations is made based on ligand similarity to known PDB entries. As in GC3, this approach led to an excellent performance with 〈 RMSD min 〉 , 〈 RMSD 〉 and 〈 RMSD 〉 respectively of 1.4 Å, 1.66 Å and 1.95 Å, which demonstrates that a template-based approach remains the best strategy when 3D structures of related complexes are available in the PDB. However, when this is not the case, our EDES approach appears to be an attractive alternative. Moreover, it has been proposed that including MD-generated receptor conformations (such as those obtained from EDES) in the virtual screening protocol could promote the discover of new active chemotypes, especially for flexible receptors, in which entirely new pocket conformations may be revealed for potential ligand binding [45–47]. Further developments of the method will include improved identification of putative binding sites and of their key residues, coupling of EDES with the use of co-solvents, and exploitation of different cluster methodologies. Acknowledgments

A.B. gratefully acknowledges the Sardinia Regional Government for the financial support of his Ph.D. scholarship (P.O.R. Sardegna F.SE., Operational Programme of the Autonomous Region of Sardinia, European Social Fund 2014– the Dutch Foundation for Scientific Research (NWO) (TOP-PUNT grant 718.015.001. Tables

Table 1.

Overall performance of our protocols in retrieving near-native ligands conformations of BACE-1 ligands (rows 4 to 8) during stage 1a, and performance of the

Autodock self protocol in stage 1b (last row; data from https://drugdesigndata.org). The values in parentheses are standard deviations of the averages.

Averages Median Protocol 〈 RMSD min 〉 〈 RMSD 〉 〈 RMSD 〉 RMSD medmin

RMSD med1

RMSD med

Stage 1a

Autodock rr Autodock

HADDOCK

HADDOCK all-Hs

Autodock apo

Stage 1b

Autodock self

Table 2.

Summary the docking results obtained with the four methods described in this work for each of the 20 BACE-1 ligands (data from https://drugdesigndata.org). RMSD min values larger than 2.5 Å are bolded.

Autodock Autodock rr HADDOCK HADDOCK all-Hs

Target Ligand RMSD min

RMSD 〈 RMSD 〉 RMSD min

RMSD 〈 RMSD 〉 BACE Table 3.

Performances of our methodology evaluated separately for the generation of protein and ligand conformations similar to those found in the ligand/BACE-1 experimental structures. The 2 nd column reports the lowest RMSD BS calculated across the 200 receptor conformations with respect to each experimental structure. The 3 rd column reports the percentage of conformations displaying an RMSD BS lower than 1.5 Å. The last column reports the minimum and maximum RMSD values (calculated on the non-hydrogenous atoms with respect to the structure of each ligand in the experimental structure), as well as the average and standard deviation within parentheses. Values of RMSD lig-refmin larger than 1.5 Å and average values of RMSD lig-ref larger than 2 Å are underlined and bolded, respectively. Protein Ligands System RMSD

BSmin [Å] % RMSD BS < 1.5 Å RMSD lig -ref [Å] BACE BACE BACE BACE BACE Table 4.

Performance of our approach in generating druggable conformations of the binding site. The percentages of structures featuring druggability scores D larger than 0.5 to 0.9 are reported in columns 2 to 6, respectively. % structures with D greater than 0.5 0.6 0.7 0.8 0.9 BACE BGT * RIC * ABP * *From [13]. BGT: T4 phage β-glucosyltransferase; RIC: recombinant ricin; ABP: allose binding protein. Table 5.

Summary of the self-docking results obtained with the

Autodock self protocol for each of the 20 BACE-1 ligands. RMSD min values larger than 2.5 Å are bolded.

Target Ligand RMSD min

RMSD 〈 RMSD 〉 BACE Figures

Fig. 1

Putative binding site identified on the BACE-1 apo protein (PDB ID 1SGZ [17]) for implementation of the EDES approach. a) Structure of the protein (grey ribbons) showing the sidechains of the 20 residues in Table 2 as sticks colored by type (polar, apolar, acidic and glycines in light green, magenta, red and white respectively); b) zoom on the putative binding site in a), showing in transparent sticks the experimental poses of the 20 ligands provided by the organizers (after superposition of common C  atoms on all proteins to 1SGZ); c) comparison between the apo structure of BACE-1 (grey ribbons) and the 20 ligand/BACE-1 complex structures (blue ribbons) released at stage 1b of the challenge. The magenta arrow highlights the major displacement undergone by the protein flap upon ligand binding. Fig. 2

Overall performance of the protocols employed in this study, as measured by the values of 〈 RMSD min 〉 and 〈 RMSD 〉 Fig. 3

Performance of the Autodock (a), Autodock rr (b), HADDOCK (c) and HADDOCK all-Hs (d) protocols in reproducing the near-native conformations of the 20 BACE-1 ligands. Green and grey panels refer to targets for which we obtained at least one pose within the top 5 featuring a value of the ligand RMSD ≤ 2.5 Å and ≤ 3 Å respectively, while orange boxes indicate cases for which no such poses were found among the top 5 ones Fig. 4

Normalized distributions (bin size = 0.1 Å) of RMSD BS calculated with respect to the 20 experimental structures of ligands in complex with BACE-1 for MD apo (upper graph) and EDES (middle panel) trajectories, as well as for the ensemble of 200 BACE-1 structures used in ensemble docking calculations (lower panel) References

1. Gaieb Z, Parks CD, Chiu M, et al (2019) D3R Grand Challenge 3: blind prediction of protein–ligand poses and affinity rankings. J Comput Aided Mol Des 33:1–18. https://doi.org/10.1007/s10822-018-0180-4 2. Gaieb Z, Liu S, Gathiaka S, et al (2018) D3R Grand Challenge 2: blind prediction of protein–ligand poses, affinity rankings, and relative binding free energies. J Comput Aided Mol Des 32:1–20. https://doi.org/10.1007/s10822-017-0088-4 3. Gathiaka S, Liu S, Chiu M, et al (2016) D3R grand challenge 2015: Evaluation of protein–ligand pose and affinity predictions. J Comput Aided Mol Des 30:651–668. https://doi.org/10.1007/s10822-016-9946-8 4. Venugopal C, Demos C, Jagannatha Rao K, et al (2008) Beta-Secretase: Structure, Function, and Evolution. CNS Neurol Disord - Drug Targets 7:278–294. https://doi.org/10.2174/187152708784936626 5. Cole SL, Vassar R (2007) The Alzheimer’s disease Beta-secretase enzyme, BACE1. Mol Neurodegener 2:22. https://doi.org/10.1186/1750-1326-2-22 6. Nalivaeva NN, Turner AJ (2013) The amyloid precursor protein: A biochemical enigma in brain development, function and disease. FEBS Lett 587:2046–2054. https://doi.org/10.1016/j.febslet.2013.05.010 7. Murphy MP, LeVine H (2010) Alzheimer’s Disease and the Amyloid-β Peptide. J Alzheimers Dis 19:311–323. https://doi.org/10.3233/JAD-2010-1221 8. Berman HM, Westbrook, John, Feng, Zukang, et al (2000) The Protein Data Bank. Nucleic Acids Res 28:235–242. https://doi.org/10.1093/nar/28.1.235 9. Prati F, Bottegoni G, Bolognesi ML, Cavalli A (2018) BACE-1 Inhibitors: From Recent Single-Target Molecules to Multitarget Compounds for Alzheimer’s Disease: Miniperspective. J Med Chem 61:619–637. https://doi.org/10.1021/acs.jmedchem.7b00393 10. Moussa CE-H (2017) Beta-secretase inhibitors in phase I and phase II clinical trials for Alzheimer’s disease. Expert Opin Investig Drugs 26:1131–1136. https://doi.org/10.1080/13543784.2017.1369527 11. Amaro RE, Baudry J, Chodera J, et al (2018) Ensemble Docking in Drug Discovery. Biophys J 114:2271–2278. https://doi.org/10.1016/j.bpj.2018.02.038 12. Huang S-Y, Zou X (2006) Ensemble docking of multiple protein structures: Considering protein structural variations in molecular docking. Proteins Struct Funct Bioinforma 66:399–421. https://doi.org/10.1002/prot.21214 13. Basciu A, Malloci G, Pietrucci F, et al (2019) Holo-like and Druggable Protein Conformations from Enhanced Sampling of Binding Pocket Volume and Shape. J Chem Inf Model 59:1515–1528. https://doi.org/10.1021/acs.jcim.8b00730 14. Zhao H, Caflisch A (2015) Molecular dynamics in drug design. Eur J Med Chem 91:4–14. https://doi.org/10.1016/j.ejmech.2014.08.004 15. De Vivo M, Masetti M, Bottegoni G, Cavalli A (2016) Role of Molecular Dynamics and Related Methods in Drug Discovery. J Med Chem 59:4035–4061. https://doi.org/10.1021/acs.jmedchem.5b01684 16. Koukos PI, Xue LC, Bonvin AMJJ (2019) Protein–ligand pose and affinity prediction: Lessons from D3R Grand Challenge 3. J Comput Aided Mol Des 33:83–91. https://doi.org/10.1007/s10822-018-0148-4 17. Hong L, Tang J (2004) Flap Position of Free Memapsin 2 (β-Secretase), a Model for Flap Opening in Aspartic Protease Catalysis † , ‡ . Biochemistry 43:4689–4695. https://doi.org/10.1021/bi0498252 18. Laio A, Parrinello M (2002) Escaping free-energy minima. Proc Natl Acad Sci 99:12562–12566. https://doi.org/10.1073/pnas.202427399

19. Hawkins PCD, Skillman AG, Warren GL, et al (2010) Conformer Generation with OMEGA: Algorithm and Validation Using High Quality Structures from the Protein Databank and Cambridge Structural Database. J Chem Inf Model 50:572–584. https://doi.org/10.1021/ci100031x 20. Morris GM, Huey R, Lindstrom W, et al (2009) AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. J Comput Chem 30:2785–2791. https://doi.org/doi:10.1002/jcc.21256 21. Dominguez C, Boelens R, Bonvin AMJJ (2003) HADDOCK: A Protein−Protein Docking Approach Based on Biochemical or Biophysical Information. J Am Chem Soc 125:1731–1737. https://doi.org/10.1021/ja026939x 22. van Zundert GCP, Rodrigues JPGLM, Trellet M, et al (2016) The HADDOCK2.2 Web Server: User-Friendly Integrative Modeling of Biomolecular Complexes. Comput Resour Mol Biol 428:720–725. https://doi.org/10.1016/j.jmb.2015.09.014 23. Hawkins PCD, Skillman AG, Nicholls A (2007) Comparison of Shape-Matching and Docking as Virtual Screening Tools. J Med Chem 50:74–82. https://doi.org/10.1021/jm0603365 24. Wang Y, Backman TWH, Horan K, Girke T (2013) fmcsR: mismatch tolerant maximum common substructure searching in R. Bioinformatics 29:2792–2794. https://doi.org/10.1093/bioinformatics/btt475 25. Cao Y, Charisi A, Cheng L-C, et al (2008) ChemmineR: a compound mining framework for R. Bioinformatics 24:1733–1734. https://doi.org/10.1093/bioinformatics/btn307 26. Kurkcuoglu Z, Koukos PI, Citro N, et al (2018) Performance of HADDOCK and a simple contact-based protein–ligand binding affinity predictor in the D3R Grand Challenge 2. J Comput Aided Mol Des 32:175–185. https://doi.org/10.1007/s10822-017-0049-y 27. Altschul SF, Gish W, Miller W, et al (1990) Basic local alignment search tool. J Mol Biol 215:403–410. https://doi.org/10.1016/S0022-2836(05)80360-2 28. Williams CJ, Headd JJ, Moriarty NW, et al (2018) MolProbity: More and better reference data for improved all-atom structure validation: PROTEIN SCIENCE.ORG. Protein Sci 27:293–315. https://doi.org/10.1002/pro.3330 29. Case, D.A., Ben-Shalom, I.Y., Brozell, S.R., et al AMBER18. University of California, San Francisco 30. Wang L-P, Martinez TJ, Pande VS (2014) Building Force Fields: An Automatic, Systematic, and Reproducible Approach. J Phys Chem Lett 5:1885–1891. https://doi.org/10.1021/jz500737m 31. Wang L-P, McKiernan KA, Gomes J, et al (2017) Building a More Predictive Protein Force Field: A Systematic and Reproducible Route to AMBER-FB15. J Phys Chem B 121:4023–4039. https://doi.org/10.1021/acs.jpcb.7b02320 32. Joung IS, Cheatham TE (2008) Determination of Alkali and Halide Monovalent Ion Parameters for Use in Explicitly Solvated Biomolecular Simulations. J Phys Chem B 112:9020–9041. https://doi.org/10.1021/jp8001614 33. Hopkins CW, Le Grand S, Walker RC, Roitberg AE (2015) Long-Time-Step Molecular Dynamics through Hydrogen Mass Repartitioning. J Chem Theory Comput 11:1864–1874. https://doi.org/10.1021/ct5010406 34. Barducci A, Bussi G, Parrinello M (2008) Well-Tempered Metadynamics: A Smoothly Converging and Tunable Free-Energy Method. Phys Rev Lett 100:020603. https://doi.org/10.1103/PhysRevLett.100.020603 35. Piana S, Laio A (2007) A Bias-Exchange Approach to Protein Folding. J Phys Chem B 111:4553–4559. https://doi.org/10.1021/jp067873l 36. Abraham MJ, Murtola T, Schulz R, et al (2015) GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1–2:19–25. https://doi.org/10.1016/j.softx.2015.06.001 37. Tribello GA, Bonomi M, Branduardi D, et al (2014) PLUMED 2: New feathers for an old bird. Comput Phys Commun 185:604–613. https://doi.org/10.1016/j.cpc.2013.09.018

38. Sousa da Silva AW, Vranken WF (2012) ACPYPE - AnteChamber PYthon Parser interfacE. BMC Res Notes 5:367. https://doi.org/10.1186/1756-0500-5-367 39. Wang J, Wolf RM, Caldwell JW, et al (2004) Development and testing of a general amber force field. J Comput Chem 25:1157–1174 40. Le Guilloux V, Schmidtke P, Tuffery P (2009) Fpocket: An open source platform for ligand pocket detection. BMC Bioinformatics 10:168. https://doi.org/10.1186/1471-2105-10-168 41. Schmidtke P, Barril X (2010) Understanding and Predicting Druggability. A High-Throughput Method for Detection of Drug Binding Sites. J Med Chem 53:5858–5867. https://doi.org/10.1021/jm100574m 42. Antunes DA, Devaurs D, Kavraki LE (2015) Understanding the challenges of protein flexibility in drug design. Expert Opin Drug Discov 10:1301–1313. https://doi.org/10.1517/17460441.2015.1094458 43. Du X, Li Y, Xia Y-L, et al (2016) Insights into Protein–Ligand Interactions: Mechanisms, Models, and Methods. Int J Mol Sci 17:144. https://doi.org/10.3390/ijms17020144 44. Buonfiglio R, Recanatini M, Masetti M (2015) Protein Flexibility in Drug Discovery: From Theory to Computation. ChemMedChem 10:1141–1148. https://doi.org/doi:10.1002/cmdc.201500086 45. Tarcsay Á, Paragi G, Vass M, et al (2013) The Impact of Molecular Dynamics Sampling on the Performance of Virtual Screening against GPCRs. J Chem Inf Model 53:2990–2999. https://doi.org/10.1021/ci400087b 46. Li YY, An J, Jones SJ (2011) A computational approach to finding novel targets for existing drugs. PLoS Comput Biol 7:e1002139 47. Amaro RE, Li WW (2010) Emerging methods for ensemble-based virtual screening. Curr Top Med Chem 10:3–13 Supplementary Tables

Table S1

Ligand templates structures. For each target compound (first column), we report the template PDB ID (second column) and the Tanimoto similarity coefficient between the two ligands (third column). The Tanimoto coefficient ranges from 0 (lowest similarity) to 1 (highest similarity).

Ligand target Template PDB ID Tanimoto similarity

BACE Table S2

List of residues defining the putative binding site of BACE-1 ligands investigated in this work, along with their occurrence frequencies in the list of residues within 3.5 Å of the ligands in the experimental structures with PDB IDs 3DV1, 4DPI, 2IQG, 3DV5, 3K5C, 3VEU, 4KE1, 2B8L, 4R92, 6BFD.

Residue Occurrence

Q121 T329