Antimalarial Artefenomel Inhibits Human SARS-CoV-2 Replication in Cells while Suppressing the Receptor ACE2
Tania Massignan, Alberto Boldrini, Luca Terruzzi, Giovanni Spagnolli, Andrea Astolfi, Valerio Bonaldo, Francesca Pischedda, Massimo Pizzato, Graziano Lolli, Maria Letizia Barreca, Emiliano Biasini, Pietro Faccioli, Lidia Pieri
IIdentification of a Druggable Intermediate along the
Folding Pathway of the SARS-CoV-2 Receptor ACE2
Alberto Boldrini , Luca Terruzzi , Giovanni Spagnolli , Andrea Astolfi , Tania Massignan ,Graziano Lolli , Maria Letizia Barreca , Emiliano Biasini , Pietro Faccioli & Lidia Pieri Sibylla Biotech SRL, Verona VR, Italy Department of Cellular, Computational and Integrative Biology (CIBIO) and Dulbecco TelethonInstitute, University of Trento, Trento TN, Italy Department of Pharmaceutical Sciences, University of Perugia, Perugia PG, Italy Department of Physics, University of Trento and INFN-TIFPA, Trento, Italy*Correspondence: [email protected]
Abstract
The steep climbing of victims caused by the new coronavirus disease 2019 (COVID-19) throughoutthe planet is sparking an unprecedented effort to identify effective therapeutic regimens to tackle thepandemic. The SARS-CoV-2 virus is known to gain entry into various cell types through the bindingof one of its surface proteins (spike) to the host Angiotensin Converting Enzyme 2 (ACE2). Thus,the spike-ACE2 interaction represents a major target for vaccines and antiviral drugs. We recentlydescribed a novel method to pharmacologically down-regulate the expression of target proteins at thepost-translational level. This technology builds on computational advancements in the simulation offolding mechanisms to rationally block protein expression by targeting folding intermediates, hencehampering the folding process. Here, we report the all-atom simulation of the entire sequence of eventsunderlying the folding pathway of ACE2. Our data reveal the existence of a folding intermediate showingdruggable pockets hidden in the native conformation. Both pockets were targeted by a virtual screeningrepurposing campaign aimed at quickly identifying drugs capable to decrease the expression of ACE2.Importantly, among the different virtual hits, we identified mefloquine, a quinoline-derivative belongingto a class of antimalaria agents (e.g. chloroquine and hydroxychloroquine) recently described for theireffects on ACE2 maturation. Thus, our results suggest that these drugs could act against SARS-CoV-2by altering the folding pathway of its receptor ACE2.
Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of the viralpneumonia outbreak named Coronavirus Disease 2019 (COVID-19), a pandemic started at the end of2019 in the Hubei province of China and later spread to the rest of the planet [1]. The disease showsunprecedented transmission, morbidity and mortality rates [2, 3]. With vaccines still far from beingavailable, effective therapeutics to control the disease are in urgent need. SARS-CoV-2 belongs tothe broad family of Coronaviridae, positive-sense, single-stranded RNA viruses of humans and animalscausing pathologies ranging from a common cold to severe respiratory diseases, such as the Severe AcuteRespiratory Syndrome (SARS) and the Middle East Respiratory Syndrome (MERS) [4]. Similarlyto other coronaviruses, SARS-CoV-2 has four main structural proteins, known as the nucleocapsid(N), membrane (M), envelope (E) and spike (S) [5]. The latter promotes the first adhesion of thevirus to host cells through direct interaction with extracellular domains of the angiotensin-convertingenzyme 2 (ACE2) [6-8]. Subsequently, another protein of the host, the transmembrane protease serine 2(TMPRSS2), cleaves the spike protein, exposing a fusion peptide that directly promotes virus entry intothe cell [9]. ACE2 belongs to the protein family of angiotensin-converting enzymes [10]. The protein is1 a r X i v : . [ q - b i o . B M ] A p r eported to be expressed in cells of the kidney, intestine, arteries, heart, and lung, although discrepanciesabout its expression profiles in different organs recently emerged [11, 12]. The main function of ACE2is connected to its ability to lower blood pressure by catalyzing the conversion of angiotensin II intothe vasodilator angiotensin [13]. The protein is a promising drug target for treating cardiovasculardiseases [14, 15]. Consistent with the mode of entry of SARS-CoV-2 into human cells, blocking theinitial interaction of Spike with ACE2 should effectively counteract virus replication. This objectivecould be achieved by different strategies, including antibodies against Spike, compounds interferingwith the Spike-ACE2 interaction, or molecules capable of lowering the exposure of ACE2 at the cellsurface [16]. The latter strategy could overcome possible evolutionary variations of the virus conferringresistance to treatments directed against viral proteins. However, the function of ACE2 on vascularregulation has been shown to play protective roles against virus-induced lung injury, and the completesilencing of the protein would likely result in severe secondary effects [6, 17]. For this reason, anypharmacological approach aimed at targeting ACE2 should be able to down-regulate its expression ratherthan abrogating it. We have recently described an entirely novel method for selectively reducing thelevel of target proteins, called Pharmacological Protein Inactivation by Folding Intermediate Targeting(PPI-FIT) [18]. The technology is based on the concept of targeting folding intermediates of proteinsrather than native conformations and is made possible by computational algorithms allowing the all-atom reconstruction of folding pathways [18, 19]. The rationale underlying the PPI-FIT method is thatstabilizing a folding intermediate of a protein with small ligands should promote its degradation bythe cellular quality control machinery, which could recognize such artificially stabilized intermediates asimproperly folded species. Here, we report the full atomistic reconstruction of the folding pathway ofACE2, which predicts the existence of a folding intermediate presenting two unique druggable pocketshidden into the native state. These pockets were employed as novel target sites for virtual screeningcampaigns aimed at repurposing compounds capable of modulating the expression of ACE2. Results
All-Atom Reconstruction of the ACE2 Folding Pathway
To obtain the all-atom reconstruction of the folding pathway of apo-ACE2, we employed the so-calledBias Functional approach (see Methods) [20]. This enhanced path sampling scheme requires to providethe protein native conformation, which was retrieved from PDB 1R42 (Figure 1).
Figure 1:
Structure of the catalytic domain of ACE2.
A. Cartoon representation of the ACE2 catalytic domainretrieved from PDB 1R42. α -helices are depicted in blue, while β -sheets are represented in red. B. ACE2 cartoonrepresentation from two different perspectives. Folding regions are color-coded as follows: N-terminal region (residues 19-82), blue; central core (residues 149-284 and 430-587), cyan; globular domain coordinating the zinc ion (residues 295-425),magenta; C-terminal tail (residues 589-615), red. R defined as a normalized linear combination of the fraction ofnative contacts (Q) and Root Mean Squared Deviation (RMSD) of atomic positions from the nativestructure. According to our definition, R = 0 corresponds to the fully denatured state of the protein,while R = 1 to the native configuration. The 18 resulting least biased trajectories were employed togenerate the transition path energy function G ( R ) defined in Methods. This quantity provides a lowerbound to reaction limiting free-energy barriers that are overcome along the folding process. Therefore,it can be used to infer the existence of folding intermediates. Since the bias functional approach cannotbe used to explore the dynamics within the reactant state, we restricted our analysis to the transitionregion R > 0.55, where significant tertiary-structure content begins to appear. This reactive section ofthe pathway contains two non-native metastable states, referred to as "early" and "late" intermediates(Figure 2). Figure 2:
Transition path energy profile of the ACE2 folding pathway.
Transition path energy of the ACE2folding free energy profile, computed as G ( R ) = − ln P ( R ) . The reaction coordinate R is a normalized linear combinationof the collective variables Q and RMSD. The filled curve represents the standard deviation of the profile, computed usingjackknife resampling. The graph shows the presence of two intermediate states (early and late) in addition to the nativestate. The analysis of protein conformations along the trajectories led to the description of the complete foldingmechanism for ACE2 (Figure 3). The first event along the ACE2 folding pathway is the formation ofthree distinct regions (foldons) encompassing: (i) the two N-terminal α -helices (residues 19-82); (ii) theglobular domain (residues 295-425) containing the zinc coordinating amino acids; (iii) the central core(residues 149-284 and 430-587), followed by the docking of the C-terminal tail (residues 589-615). Theseregions then sequentially fold onto each other, giving rise to the native state.3 igure 3: Simplified representation of ACE2 folding pathway.
The folding pathway of ACE2 begins with theindependent formation of the 3 main foldons (1): the N-terminal region (residues 19-82), depicted in blue; the central core(residues 149-284 and 430-587), represented in cyan; the globular domain coordinating the zinc ion (residues 295-425).Subsequently, the globular domain docks onto the central core (2), followed by the docking of the N-terminal region andthe formation of the C-terminal helix (residues 589-598) (3), which then dock on the central core (4), giving rise to thenative structure.
Indentification of ACE2 Folding Intermediates
A k-mean clustering was performed to obtain representative protein structures from the two non-nativefree energy wells (i.e. early and late intermediates) appearing along the folding pathways (Supp. Figure1). Four cluster centers per well were selected as representative conformations (Figure 4). The earlyintermediate is characterized by the lack of docking of the three foldons, which may also appear as notcompletely structured in some conformations. Conversely, the late ACE2 folding intermediate differsfrom the native conformation mainly for the topology of the two N-terminal α -helices, which are notdocked to the rest of the structure in all the conformations.4 igure 4 : Representative folding intermediates conformations.
Representative protein conformations corre-sponding to the cluster centers of the early (A) and late (B) intermediates. The frequency of the least biased trajectories(T) corresponding to the cluster represented by each protein conformation is shown below each structure. Domains arecolored as follows: N-terminal region (residues 19-82), blue; central core (residues 149-284 and 430-587), cyan; globulardomain coordinating the zinc ion (residues 295-425), magenta; C-terminal tail (residues 589-615), red.
Such a high degree of structural organization and low conformational variability represent ideal featuresof a folding intermediate, allowing the reliable identification of druggable sites. Thus, the late foldingintermediate of ACE2 was further scouted for the presence of pockets suitable for virtual screeningcampaigns.
Identification of Druggable Pockets on the ACE2 Late Intermediate.
To account for minor variations among the structures belonging to the same cluster, we selected anadditional protein conformation for each cluster of the late intermediate (Supp. Figure 2). This processyielded a final number of 8 protein conformations, which were then used for the subsequent pocketdetection step. These structures were subjected to 50 ns of MD simulations at T = 310 K with positionrestrain on C α , to explore the conformational variability of the residues side-chains. Then, for eachMD simulation, 200 structures, equispaced in time, were extracted and analyzed with the SiteMap [22]and DogSiteScorer [23] software. Pockets were then ranked based on a series of druggability descriptors(Supp. Table 1). This analysis identified two predicted druggable pockets, exclusively present in theACE2 late intermediate state but not in the native conformation (Figure 5).5 igure 5 : Druggable pockets identified in the intermediate state.
A. Druggable pocket 1. Panel I shows thesurface of the ACE2 folding intermediate (blue), while the surface of the druggable pocket is shown in magenta. Thesuperimposed structure of the native state is depicted as a green cartoon. In the native state, the structure composed byresidues 475-492 covers the binding pocket. In panel II, the intermediate structure is represented as a light blue cartoon,while the volume of the druggable site is colored in magenta. In panel III, the residues forming the identified site aredepicted in magenta: 127, 130, 144, 152, 159, 160, 161, 163, 167, 168, 171, 172, 173, 174, 176, 230, 237, 241, 261, 262,264, 265, 266, 267, 268, 269, 270, 271, 272, 275, 448, 451, 452, 454, 455, 456, 459, 464, 497, 498, 499, 500, 502, 503. B.Druggable Pocket 2. Panel I represents the surface of the intermediate state in blue, while the surface of the druggablepocket is shown in magenta. The superimposed structure of the native state is depicted as a green cartoon. In the nativestate, the structure composed by residues 598-603 covers the binding pocket. In panel II, the intermediate structure isrepresented as a light blue cartoon, while the volume of the druggable site is colored in magenta. In panel III, residuesforming the identified site are depicted in magenta: 236, 239, 240, 242, 243, 247, 248, 281, 282, 283, 284, 285, 286, 436,440, 443, 591, 592, 593, 594, 596, 597, 600.
In particular, pocket 1 results from a local structural variation into the core region (residues 468-498),while pocket 2 results from the missing docking of the C-terminal tail onto the core region. To determinethe reliability of each druggable site, we measured the number of folding pathways presenting the relevantstructural variation underlying each pocket (Figure 6).
Figure 6 : Analysis of folding mechanism yielding to pockets formation.
A. The RMSD of the region composedby residues 468-498, which hides pocket 1 in the native structure, is plotted as a function of the RMSD of its docking regionon the central core (residues 144-171, 221-283, 442-467, 499-519). In 1/18 trajectories these two regions are distant enoughto allow the formation of pocket 1 (RMSD res: 468-498 > 15 Å, RMSD docking region < 4 Å), while in the remainingtrajectories they fold cooperatively (15/18) or non-cooperatively (2/18), in both cases not leading to the formation ofpocket 1. B. The RMSD of the C-terminal tail (residues 589-615) is plotted as a function of the RMSD of its docking onthe central core (residues 221-265, 432-465, 513-533). Pocket 2 appears in all the trajectories (9/18) in which the dockingregion is structured before the attachment of the C-terminal tail (RMSD CTT > 20 Å, RMSD docking region < 4 Å).
In silico identification of potential binders of ACE2 intermediate
The identification of potential ACE2 folding intermediate ligands was pursued by employing a drugrepositioning strategy. We built a unique collection of 9187 compounds by combining libraries of drugsapproved by the U.S. Food and Drug Administration (FDA) and molecules at different stages of currentlyongoing clinical trials (see Material and Methods). The chemical collection was screened against thetwo identified pockets by following a consensus virtual screening workflow (Figure 7A). Two differentdocking software, Glide [22] and LeadIT [24], were employed in parallel to predict the binding affinity ofeach compound to the ACE2 folding intermediate pockets.Two different docking software, Glide [22] andLeadIT [24], were employed in parallel to predict the binding affinity of each compound to the ACE2folding intermediate pockets. Only compounds showing promising predicted affinity (i.e. Glide ds ≤ -6kcal/mol; LeadIT HYDE aff ≤ µ M) in both docking protocols were submitted to a third docking roundbased on AutoDock [25]. This process identified two consensus sets (AD
LBE ≤ -6 kcal/mol, AD NiC ≥ ds ≤ -9 kcal/mol) and LeadIT (HYDE aff ≤ µ M) were also added to these sets. Finally, a visualinspection of binding mode and chemical similarity annotation for each ligand allowed the selection of14 virtual hits for pocket 1 and additional 21 for pocket 2 (Supp. Table 2).
Figure 7 : Virtual Screening.
A. Schematic of the virtual screening workflow employed for drug repositioning. B. Two-dimensional ligand interaction scheme (B) and three-dimensional binding pose (C) for the interaction of mefloquine withthe pocket 2 of the ACE2 folding intermediate. Purple arrows indicate H-bonds; blue-red line indicates the salt-bridge.
Discussion
Multiple pieces of evidence indicate that down-regulating the expression of ACE2 in the early stagesof the SARS-CoV2 infection should effectively inhibit virus replication. However, selectively decreasingthe expression of a target protein could be a difficult task. RNA silencing or CRISPR-based strategiesrepresent valid options, but their use could be limited by delivery issues [26, 27]. These problemscould be overcome by emerging pharmacological technologies like the proteolysis targeting chimeras(PROTACs), which build on the principle of designing bi-functional compounds capable of interactingwith the target protein with one side and engaging the E3 ubiquitin ligase with the other, leading to thedegradation of the polypeptide by the proteasome [28]. Similarly, the PPI-FIT method capitalizes onthe cellular quality control machinery to promote the degradation of the target polypeptide, althoughit does not require the development of bi-functional molecules. PPI-FIT-derived compounds aim atstimulating the removal of the target protein by directly blocking its folding pathway [18]. In thismanuscript, we described the first step along with the application of the PPI-FIT paradigm to ACE2by reporting the all-atom reconstruction of the folding pathway of the protein. Our analyses predictthe existence of a folding intermediate showing two unique druggable pockets not present in the nativeACE2 conformation. In order to respond to the urgent need for an effective therapy against SARS-CoV-2, we targeted both pockets by a in silico virtual screening approach aimed at repurposing compoundscurrently in clinical trials or already approved by the FDA. Interestingly, among the predicted ligands forpocket 2 we found the quinoline-derivative mefloquine. This drug belongs to a class of antimalaria agentsrecently described for their potential effect against SARS-CoV-2 [29, 30]. While the precise mechanismby which chloroquine and its more active derivative hydroxychloroquine inhibit virus replication is notknown, reports suggest that the compounds may act by reducing the glycosylation of ACE2 [31]. Thisevidence is highly consistent with a PPI-FIT-like effect of these drugs, which could alter the correctmaturation of ACE2 glycosidic moieties by acting on the folding pathway of the protein. Based onour previous experience with the PPI-FIT technology, we anticipate that the possible down-regulatingeffects of our in silico predicted compounds targeting the ACE2 folding intermediate will be easily testedin cell-based assays by mean of standard biochemical techniques like western blotting.
Methods
Software & Resources
The simulations were performed on the high-performance-computing facilities of the Italian Institute forNuclear Physics (INFN) and on the computer cluster of Sibylla Biotech. A modified version of Gromacs2019 [32] was employed to run the calculations. Data analysis was carried out using Python and itslibraries: NumPy, SciPy, Matplotlib and MDAnalysis.
Structure & Topology
The structure of ACE2 was retrieved from PDB 1R42. The file contains the catalytic domain (residues19-615) solved by X-ray crystallography at 2.2 Å of resolution. Protein topology was generated usingCharmm36m after removal of the catalytic zinc ion and sugar moieties. Cysteines were treated in thereduced state. Water molecules were modeled using the Charmm-modified TIP3P.8 eneration and Selection of Unfolded Initial Conditions
The ACE2 structure was positioned in a cubic box with minimum wall distance equal to 60 Å. Thesystem was filled with water molecules, neutralized with 28 Na + ions and brought to a final 150 mMNaCl concentration. This setup was followed by an energy minimization using the steepest descentalgorithm. An NVT equilibration at 800 K was carried out using the V-rescale thermostat. Then,so-called ratchet and pawl (rMD) molecular dynamics simulations [33, 34] were employed to obtain thedenatured states. In this method, an external biasing force is added, acting along a single collectivevariable, denoted as z ( X ) : z ( X ) = N (cid:88) | i − j | > (cid:2) C ij ( X ) − C ij ( X R ) (cid:3) (1)where C ij ( X ) is an entry of the instantaneous contact map and C ij ( X R ) is an entry of the referencecontact map, both defined as: C ij ( X ) = 1 − ( | x i − x j | /r ) − ( | x i − x j | /r ) (2)where x i and x j are the coordinates of the i th and j th atoms of the protein; while r is the referencecontact-distance that is set to 7.5 Å. In applications of rMD to protein denaturation, all the entriesof the reference contact map were set to 0. In applications of rMD to protein folding simulations,the reference contact map C ij ( X R ) is calculated from Eq. (2) using the protein’s native conformation,obtained by performing energy minimization starting from the PDB structure. In any rMD simulation,the system evolves according to plain MD as long as the value of z ( x ) spontaneously decreases duringtime. Instead, system fluctuations leading to an increase in z ( X ) result in the activation of the biasingforce defined as: F i ( X, z m ) = (cid:40) − k R ∇ i z ( X )[ z ( X ) − z m ( t )] if z ( X ) > z m ( t )0 if z ( X ) ≤ z m ( t ) (3)where z m ( t ) is the minimum value assumed by z ( X ) up to time t , and the index i indicates the atomon which the force is acting. To obtain the denatured protein conformations, 32 rMD trajectories of 1ns each with a force constant k R = 8 · − kJ/mol. These simulations were followed by 4 ns of plainMD at 800 K in the NVT ensemble. For each plain MD unfolding trajectory, the protein conformationminimizing the following function was selected: S ( X ) = (cid:80) n − mi =0 (cid:80) nj = i + m ( j − i ) max (1 − r − (cid:107) x i − x j (cid:107) , (cid:80) n − mi =0 (cid:80) nj = i + m ( j − i ) (4)where n is the total number of atoms, m is the number of ignored subsequent atoms and r is aparameter, set to 40 Å. This metrics is a norm of the contact maps, where atoms with distant indexesweight more in the calculation. Conformations with low value of S ( X ) are characterized by high degreeof denaturation, vice-versa, conformations with high value of S ( X ) are less denatured. Generation of the Folding Trajectories
Selected initial conditions (N = 32) were positioned in a cubic box with minimum wall-distance of10 Å. Each system was filled with water molecules, neutralized with 28 Na + ions and brought to afinal 150 mM NaCl concentration. Energy minimization was subsequently performed using the steepestdescent algorithm. Then, each system was subjected to 500 ps of NVT equilibration at 350 K (usingthe V-rescale thermostat [35]), followed by 500 ps of NPT equilibration at 350 K and 1 Bar (using theV-rescale thermostat and the Parrinello-Rahman barostat). During equilibration, position restraintswith force constant 1000 kJ · mol − nm − where introduced on heavy atoms. For each initial condition,40 rMD trajectories were generated, each one consisting in 5 · steps with integration time-step of2 fs. In this set of rMD simulations, reference contact map entering Eq. 1 was calculated using the9ative structure of ACE2. Cutoff for Coulomb interactions was set to 12 Å, cutoff for Van der Waalsinteractions was set to 12 Å with force-switch having 10 Å radius. Long range electrostatics was treatedwith particle mesh Ewald. Trajectories reaching a configuration with an RMSD lower than 4 Å wereconsidered productive folding events. For each set of folding trajectories (starting from the same initialcondition) the pathway with the highest probability to occur in absence of external bias was identifiedby selecting the one minimizing the Bias Functional [20], defined as: T = N (cid:88) i =1 m i γ i (cid:90) t dτ | F rMDi ( X, τ ) | (5)Where m i and γ i are the mass and the friction coefficient of the i th atom, F rMDi is the force acting onit, while t is the trajectory folding time. Reconstruction of the Transition Path Energy Profile
The collective variables Q and RMSD were linearly combined and normalized to obtain a one-dimensionalreaction coordinate, R : R = 11 . (cid:104) . (cid:16) − RM SD − RM SD min
RM SD max (cid:17) + 0 . (cid:16) Q − Q min Q max − Q min (cid:17)(cid:105) (6)From the frequency histogram of the least biased trajectories we estimated the probability P ( R ) toobserve a given value of the collective variable. We refer to the quantity G ( R ) = − ln [ P ( R )] as to the“transition path energy”. In a biased dynamics, G ( R ) yields a lower-bound to the rate limiting free-energy barriers, thus provides a useful tool to identify metastable states. The standard deviation σ wasestimated through a jackknifing procedure, in particular, the free energy profile was computed for 18times by leaving a different trajectory out in each calculation. For each bin, the standard deviation σ was calculated as: σ = (cid:118)(cid:117)(cid:117)(cid:116) n − n n (cid:88) i =1 (cid:0) ˆ θ i − ˆ θ ( · ) (cid:1) (7)where n is the total number of leas biased trajectories, ˆ θ i is the mean value of the jackknife replicate(where the ith trajectory was removed) and ˆ θ ( · ) is the mean of the jackknife replicates. Clustering Analysis
Protein conformations were sampled from the two identified energy wells, in particular, the early-intermediate was defined with 0.59 < R < 0.68, while the late-intermediate was defined with 0.797 < R < 0.865 (Supp. Figure 1). The two sets of structures were then subjected to k-mean clustering ( k = 4), using the Frobenius norm of the contact maps as a distance metrics. Then, for each cluster, asingle representative conformation was selected by choosing the structure minimizing its distance fromthe relative cluster center. Identification of the Binding Pockets
The 4 cluster centers of the late intermediate and 4 additional conformations (each one extracted as acluster variant) were positioned in a cubic box with minimum wall-distance of 10 Å, that was filled withwater molecules. The system was neutralized with 28 Na + ions and brought to a final 150 mM NaClconcentration. Energy minimization was subsequently performed, using the steepest descent algorithm.Then, each system was equilibrated for 500 ps in the NVT ensemble at 310 K (using the V-rescalethermostat), followed by 500 ps of NPT equilibration at 310 K and 1 Bar (using the V-rescale thermostatand the Parrinello-Rahman barostat). During equilibration, position restraints with force constant 1000kJ · mol − nm − where introduced on heavy atoms. Subsequently, 50 ns of MD were performed for eachstructure by retaining the positional restraints on the C α of the protein backbone. For each trajectory,1000 frames, equally spaced in time, were extracted. Each frame was analyzed by means of SiteMap [36]and DogSiteScorer [23] software in order to identify druggable pockets. We considered as druggable apocket falling within the following thresholds: volume ≥
300 Å ; depth ≥
10 Å; balance ≥ ≤ ≥ ≥ ≥ ≥ ≥ Preparation of the Virtual Library
The FDA-approved drug library included compounds from the following chemical collections: Selleck(accession date 21/03/2020, 2684 compounds), Prestwick (accession date 21/03/2020, 1520 compounds)and eDrug-3D (accession date 21/03/2020, 1930 compounds). Compounds from the DrugBank collec-tion (DrugBank Release Version 5.1.5; 1784 molecules), which includes approved, experimental andinvestigational drugs, were added. The four chemical libraries were merged using an in-house developedKNIME workflow to obtain a unique library of non-redundant entries (total of 9187 compounds). Thefinal collection was prepared with the LigPrep tool [37]. Ionization/tautomeric states were generatedat pH range 7-8 using Epik [38]. Furthermore, at most 32 stereoisomers per ligand and three lowestenergy conformations per ligand ring were produced. Where not defined, all the chiral form of eachstereocenter was produced. In total, 12771 docking clients were generated.
Virtual Screening Workflow
For the Glide docking [22], the N- and C- terminal residues of the ACE2 intermediate were cappedwith acetyl (ACE) and N-methylamine (NMA) groups, respectively, using the Schrödinger ProteinPreparation Wizard. The capped protein structure was used to generate the receptor grid, with noscaling of Van der Waals radii for non-polar receptor atoms. The docking space was centered on thecentroid of the residues defining the pockets (x: 75.4, y: 58.2, z: 67.6 for pocket 1; x: 50.8, y: 57.0, z:69.1 for pocket 2). The docking space was defined as a 35 Å cubic box, while the diameter midpoint ofdocked ligands was required to remain within a smaller, nested 15 Å cubic box. Docking experimentswere performed in the Glide standard precision mode using a 0.8 factor to scale the Van der Waals radii ofthe ligand atoms with partial atomic charge less than 0.15. For the BioSolveIT docking, LeadIT (version3.2.0) was used for protein preparation and docking parameters definition. The binding site was definedon the basis of the residues composing the identified druggable pocket. The residue protonation states,as well as the tautomeric forms, were automatically assessed in LeadIT using the ProToss method,that generates the most probable hydrogen positions on the basis of an optimal hydrogen bondingnetwork using an empirical scoring function. The virtual screening workflow was developed by using theKNIME analytic platform and the BioSolveIT KNIME nodes. Specifically, the workflow was organizedas follows: (i) the Compute LeadIT Docking node was selected to perform the docking simulations ofthe ∼ · docking clients by using the FlexX algorithm [39]. Ten poses for each ligand were produced;(ii) the Assess Affinity with HYDE in SeeSAR node generated refined binding free energy and HYDE[40] predicted activity (HYDE aff ) for each ligand pose using the HYDE rescoring function; and (iii)for each ligand, the pose with the lowest HYDE aff was extracted. For AutoDock docking, ligands andreceptor structures were converted to the AD4 format using AutoDockTools and the Gasteiger-Marsiliempirical atomic partial charges were assigned. The dimensions of the grid were 60 X 60 X 60 points,with grid points separated by a 0.375 Å. The grid was centered on the centroids of pockets 1 and2. The Lamarckian genetic algorithm was used, and for each compound, the docking simulation wascomposed of 100 runs. Clustering of docked conformations was performed on the basis of their RMSD(tolerance = 2.0 Å) and the results were ranked based on the estimated free energy of binding. Theobtained poses were filtered based on the Glide docking score (Glide ds ), the HYDE aff and AutoDockligand binding energy (AD LBE ) as well as on the number of clusters (AD
NiC ). The consensus virtual11creening workflow was applied for both pocket 1 and pocket 2 (Figure 7A). The best scored structuresfor the three software were visually inspected and selected based on their binding mode and predictedinteractions. In addition, the top scoring compounds for Glide docking and SeeSAR rescoring [41] werealso evaluated. In particular, the Glide set was generated following these criteria: Glide ds ≤ -9 kcal/mol,giving rise to 97 molecules for pocket 1 and 30 for pocket 2. The SeeSAR set was instead defined byapplying the following filters: (i) HYDE aff ≤ µ M; (ii) Ligand efficiency (LE) and lipophilic ligandefficiency (LLE) category from 1 to 4 (where 1 corresponds to favorable LE/LLE, 5 corresponds tounfavorable LE/LLE); (iii) Good ligand conformer quality, judged on the basis of torsional quality ofthe rescored pose rotamers and intramolecular clashes; (iv) no protein-ligand intermolecular clashes,giving rise to 89 molecules for pocket 1 and 205 molecules for pocket 2.
Supplementary Materials
Atomic resolution structures of the folding intermediates together with the corresponding druggablepockets are freely available upon request.
Acknowledgements
The computational reconstruction of ACE2 folding pathway was performed thanks to the strong supportof the high performance computational infrastructures made available by Italian National Instituteof Nuclear Physics (INFN). This work was supported by a grant from Fondazione Telethon (Italy,TCP14009). GS is a recipient of a fellowship from Fondazione Telethon. EB is an Assistant TelethonScientist at the Dulbecco Telethon Institute.
Author Contribution
Conceived and designed the analyses: all authorsPerformed the analyses: AB, LT, GS, AAAnalyzed the data: AB, LT, GS, AA, MLB and PFWrote the paper: GS, PF, AA, MLB and EBAll authors edited the manuscript
Competing Interests eferences upplementary Figures
Supp. Figure 1 : Representation of the selected regions defined by the energy wells.
The selected regionsdefining the two energy wells are depicted in orange. In particular, the early-intermediate region is defined with R between0.59 and 0.68, while the late-intermediate region is defined with R between 0.797 and 0.865.
Supp. Figure 2 : Selected structure variants for each cluster.
Representative conformations of the additionalstructures selected for each cluster. upplementary Tables Supp. Table 1 : Druggability descriptors of the identified pockets.
The table shows the values of the druggabilitydescriptors computed using SiteMap and DogSiteScorer for the two pockets. The threshold describes the value requiredby a descriptor for the definition of a good druggable site. upp. Table 2 : Predicted ligands for pockets 1 and 2 of the ACE2 folding intermediate.
The table showsthe different hits emerging from the virtual screening as well as the mefloquine analogues analyzed subsequently. ColumnSet indicates whether the molecule was identified by consensus (C), Glide (G) or SeeSAR (S) schemes; while (Q) indicatesthe quinoline derivatives. Affinity predictors calculated with the different software are also reported: HYDE aff (LeadIT),Glide ds (Glide), AD LBE and AD
NiC (AutoDock).(AutoDock).