[PDF] Antimalarial Artefenomel Inhibits Human SARS-CoV-2 Replication in Cells while Suppressing the Receptor ACE2

Abstract

The steep climbing of victims caused by the new coronavirus disease 2019 (COVID-19) throughout the planet is sparking an unprecedented effort to identify effective therapeutic regimens to tackle the pandemic. The SARS-CoV-2 virus is known to gain entry into various cell types through the binding of one of its surface proteins (spike) to the host Angiotensin-Converting Enzyme 2 (ACE2). Thus, spike-ACE2 interaction represents a major target for vaccines and antiviral drugs. A novel method has been recently described by some of the authors to pharmacologically downregulate the expression of target proteins at the post-translational level. This technology builds on computational advancements in the simulation of folding mechanisms to rationally block protein expression by targeting folding intermediates, hence hampering the folding process. Here, we report the all-atom simulations of the entire sequence of events underlying the folding pathway of ACE2. Our data revealed the existence of a folding intermediate showing two druggable pockets hidden in the native conformation. Both pockets were targeted by a virtual screening repurposing campaign aimed at quickly identifying drugs capable to decrease the expression of ACE2. We identified four compounds capable of lowering ACE2 expression in Vero cells in a dose-dependent fashion. All these molecules were found to inhibit the entry into cells of a pseudotyped retrovirus exposing the SARS-CoV-2 spike protein. Importantly, the antiviral activity has been tested against live SARS-CoV-2 (MEX-BC2/2020 strain). One of the selected drugs (Artefenomel) could completely prevent cytopathic effects induced by the presence of the virus, thus showing antiviral activity against SARS-CoV-2. Ongoing studies are further evaluating the possibility of repurposing these drugs for the treatment of COVID-19.

Full PDF

IIdentiﬁcation of a Druggable Intermediate along the

Folding Pathway of the SARS-CoV-2 Receptor ACE2

Alberto Boldrini , Luca Terruzzi , Giovanni Spagnolli , Andrea Astolﬁ , Tania Massignan ,Graziano Lolli , Maria Letizia Barreca , Emiliano Biasini , Pietro Faccioli & Lidia Pieri Sibylla Biotech SRL, Verona VR, Italy Department of Cellular, Computational and Integrative Biology (CIBIO) and Dulbecco TelethonInstitute, University of Trento, Trento TN, Italy Department of Pharmaceutical Sciences, University of Perugia, Perugia PG, Italy Department of Physics, University of Trento and INFN-TIFPA, Trento, Italy*Correspondence: [email protected]

Abstract

The steep climbing of victims caused by the new coronavirus disease 2019 (COVID-19) throughoutthe planet is sparking an unprecedented eﬀort to identify eﬀective therapeutic regimens to tackle thepandemic. The SARS-CoV-2 virus is known to gain entry into various cell types through the bindingof one of its surface proteins (spike) to the host Angiotensin Converting Enzyme 2 (ACE2). Thus,the spike-ACE2 interaction represents a major target for vaccines and antiviral drugs. We recentlydescribed a novel method to pharmacologically down-regulate the expression of target proteins at thepost-translational level. This technology builds on computational advancements in the simulation offolding mechanisms to rationally block protein expression by targeting folding intermediates, hencehampering the folding process. Here, we report the all-atom simulation of the entire sequence of eventsunderlying the folding pathway of ACE2. Our data reveal the existence of a folding intermediate showingdruggable pockets hidden in the native conformation. Both pockets were targeted by a virtual screeningrepurposing campaign aimed at quickly identifying drugs capable to decrease the expression of ACE2.Importantly, among the diﬀerent virtual hits, we identiﬁed meﬂoquine, a quinoline-derivative belongingto a class of antimalaria agents (e.g. chloroquine and hydroxychloroquine) recently described for theireﬀects on ACE2 maturation. Thus, our results suggest that these drugs could act against SARS-CoV-2by altering the folding pathway of its receptor ACE2.

Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of the viralpneumonia outbreak named Coronavirus Disease 2019 (COVID-19), a pandemic started at the end of2019 in the Hubei province of China and later spread to the rest of the planet [1]. The disease showsunprecedented transmission, morbidity and mortality rates [2, 3]. With vaccines still far from beingavailable, eﬀective therapeutics to control the disease are in urgent need. SARS-CoV-2 belongs tothe broad family of Coronaviridae, positive-sense, single-stranded RNA viruses of humans and animalscausing pathologies ranging from a common cold to severe respiratory diseases, such as the Severe AcuteRespiratory Syndrome (SARS) and the Middle East Respiratory Syndrome (MERS) [4]. Similarlyto other coronaviruses, SARS-CoV-2 has four main structural proteins, known as the nucleocapsid(N), membrane (M), envelope (E) and spike (S) [5]. The latter promotes the ﬁrst adhesion of thevirus to host cells through direct interaction with extracellular domains of the angiotensin-convertingenzyme 2 (ACE2) [6-8]. Subsequently, another protein of the host, the transmembrane protease serine 2(TMPRSS2), cleaves the spike protein, exposing a fusion peptide that directly promotes virus entry intothe cell [9]. ACE2 belongs to the protein family of angiotensin-converting enzymes [10]. The protein is1 a r X i v : . [ q - b i o . B M ] A p r eported to be expressed in cells of the kidney, intestine, arteries, heart, and lung, although discrepanciesabout its expression proﬁles in diﬀerent organs recently emerged [11, 12]. The main function of ACE2is connected to its ability to lower blood pressure by catalyzing the conversion of angiotensin II intothe vasodilator angiotensin [13]. The protein is a promising drug target for treating cardiovasculardiseases [14, 15]. Consistent with the mode of entry of SARS-CoV-2 into human cells, blocking theinitial interaction of Spike with ACE2 should eﬀectively counteract virus replication. This objectivecould be achieved by diﬀerent strategies, including antibodies against Spike, compounds interferingwith the Spike-ACE2 interaction, or molecules capable of lowering the exposure of ACE2 at the cellsurface [16]. The latter strategy could overcome possible evolutionary variations of the virus conferringresistance to treatments directed against viral proteins. However, the function of ACE2 on vascularregulation has been shown to play protective roles against virus-induced lung injury, and the completesilencing of the protein would likely result in severe secondary eﬀects [6, 17]. For this reason, anypharmacological approach aimed at targeting ACE2 should be able to down-regulate its expression ratherthan abrogating it. We have recently described an entirely novel method for selectively reducing thelevel of target proteins, called Pharmacological Protein Inactivation by Folding Intermediate Targeting(PPI-FIT) [18]. The technology is based on the concept of targeting folding intermediates of proteinsrather than native conformations and is made possible by computational algorithms allowing the all-atom reconstruction of folding pathways [18, 19]. The rationale underlying the PPI-FIT method is thatstabilizing a folding intermediate of a protein with small ligands should promote its degradation bythe cellular quality control machinery, which could recognize such artiﬁcially stabilized intermediates asimproperly folded species. Here, we report the full atomistic reconstruction of the folding pathway ofACE2, which predicts the existence of a folding intermediate presenting two unique druggable pocketshidden into the native state. These pockets were employed as novel target sites for virtual screeningcampaigns aimed at repurposing compounds capable of modulating the expression of ACE2. Results

All-Atom Reconstruction of the ACE2 Folding Pathway

To obtain the all-atom reconstruction of the folding pathway of apo-ACE2, we employed the so-calledBias Functional approach (see Methods) [20]. This enhanced path sampling scheme requires to providethe protein native conformation, which was retrieved from PDB 1R42 (Figure 1).

Figure 1:

Structure of the catalytic domain of ACE2.

A. Cartoon representation of the ACE2 catalytic domainretrieved from PDB 1R42. α -helices are depicted in blue, while β -sheets are represented in red. B. ACE2 cartoonrepresentation from two diﬀerent perspectives. Folding regions are color-coded as follows: N-terminal region (residues 19-82), blue; central core (residues 149-284 and 430-587), cyan; globular domain coordinating the zinc ion (residues 295-425),magenta; C-terminal tail (residues 589-615), red. R deﬁned as a normalized linear combination of the fraction ofnative contacts (Q) and Root Mean Squared Deviation (RMSD) of atomic positions from the nativestructure. According to our deﬁnition, R = 0 corresponds to the fully denatured state of the protein,while R = 1 to the native conﬁguration. The 18 resulting least biased trajectories were employed togenerate the transition path energy function G ( R ) deﬁned in Methods. This quantity provides a lowerbound to reaction limiting free-energy barriers that are overcome along the folding process. Therefore,it can be used to infer the existence of folding intermediates. Since the bias functional approach cannotbe used to explore the dynamics within the reactant state, we restricted our analysis to the transitionregion R > 0.55, where signiﬁcant tertiary-structure content begins to appear. This reactive section ofthe pathway contains two non-native metastable states, referred to as "early" and "late" intermediates(Figure 2). Figure 2:

Transition path energy proﬁle of the ACE2 folding pathway.

Transition path energy of the ACE2folding free energy proﬁle, computed as G ( R ) = − ln P ( R ) . The reaction coordinate R is a normalized linear combinationof the collective variables Q and RMSD. The ﬁlled curve represents the standard deviation of the proﬁle, computed usingjackknife resampling. The graph shows the presence of two intermediate states (early and late) in addition to the nativestate. The analysis of protein conformations along the trajectories led to the description of the complete foldingmechanism for ACE2 (Figure 3). The ﬁrst event along the ACE2 folding pathway is the formation ofthree distinct regions (foldons) encompassing: (i) the two N-terminal α -helices (residues 19-82); (ii) theglobular domain (residues 295-425) containing the zinc coordinating amino acids; (iii) the central core(residues 149-284 and 430-587), followed by the docking of the C-terminal tail (residues 589-615). Theseregions then sequentially fold onto each other, giving rise to the native state.3 igure 3: Simpliﬁed representation of ACE2 folding pathway.

The folding pathway of ACE2 begins with theindependent formation of the 3 main foldons (1): the N-terminal region (residues 19-82), depicted in blue; the central core(residues 149-284 and 430-587), represented in cyan; the globular domain coordinating the zinc ion (residues 295-425).Subsequently, the globular domain docks onto the central core (2), followed by the docking of the N-terminal region andthe formation of the C-terminal helix (residues 589-598) (3), which then dock on the central core (4), giving rise to thenative structure.

Indentiﬁcation of ACE2 Folding Intermediates

A k-mean clustering was performed to obtain representative protein structures from the two non-nativefree energy wells (i.e. early and late intermediates) appearing along the folding pathways (Supp. Figure1). Four cluster centers per well were selected as representative conformations (Figure 4). The earlyintermediate is characterized by the lack of docking of the three foldons, which may also appear as notcompletely structured in some conformations. Conversely, the late ACE2 folding intermediate diﬀersfrom the native conformation mainly for the topology of the two N-terminal α -helices, which are notdocked to the rest of the structure in all the conformations.4 igure 4 : Representative folding intermediates conformations.

Representative protein conformations corre-sponding to the cluster centers of the early (A) and late (B) intermediates. The frequency of the least biased trajectories(T) corresponding to the cluster represented by each protein conformation is shown below each structure. Domains arecolored as follows: N-terminal region (residues 19-82), blue; central core (residues 149-284 and 430-587), cyan; globulardomain coordinating the zinc ion (residues 295-425), magenta; C-terminal tail (residues 589-615), red.

Such a high degree of structural organization and low conformational variability represent ideal featuresof a folding intermediate, allowing the reliable identiﬁcation of druggable sites. Thus, the late foldingintermediate of ACE2 was further scouted for the presence of pockets suitable for virtual screeningcampaigns.

Identiﬁcation of Druggable Pockets on the ACE2 Late Intermediate.

To account for minor variations among the structures belonging to the same cluster, we selected anadditional protein conformation for each cluster of the late intermediate (Supp. Figure 2). This processyielded a ﬁnal number of 8 protein conformations, which were then used for the subsequent pocketdetection step. These structures were subjected to 50 ns of MD simulations at T = 310 K with positionrestrain on C α , to explore the conformational variability of the residues side-chains. Then, for eachMD simulation, 200 structures, equispaced in time, were extracted and analyzed with the SiteMap [22]and DogSiteScorer [23] software. Pockets were then ranked based on a series of druggability descriptors(Supp. Table 1). This analysis identiﬁed two predicted druggable pockets, exclusively present in theACE2 late intermediate state but not in the native conformation (Figure 5).5 igure 5 : Druggable pockets identiﬁed in the intermediate state.

A. Druggable pocket 1. Panel I shows thesurface of the ACE2 folding intermediate (blue), while the surface of the druggable pocket is shown in magenta. Thesuperimposed structure of the native state is depicted as a green cartoon. In the native state, the structure composed byresidues 475-492 covers the binding pocket. In panel II, the intermediate structure is represented as a light blue cartoon,while the volume of the druggable site is colored in magenta. In panel III, the residues forming the identiﬁed site aredepicted in magenta: 127, 130, 144, 152, 159, 160, 161, 163, 167, 168, 171, 172, 173, 174, 176, 230, 237, 241, 261, 262,264, 265, 266, 267, 268, 269, 270, 271, 272, 275, 448, 451, 452, 454, 455, 456, 459, 464, 497, 498, 499, 500, 502, 503. B.Druggable Pocket 2. Panel I represents the surface of the intermediate state in blue, while the surface of the druggablepocket is shown in magenta. The superimposed structure of the native state is depicted as a green cartoon. In the nativestate, the structure composed by residues 598-603 covers the binding pocket. In panel II, the intermediate structure isrepresented as a light blue cartoon, while the volume of the druggable site is colored in magenta. In panel III, residuesforming the identiﬁed site are depicted in magenta: 236, 239, 240, 242, 243, 247, 248, 281, 282, 283, 284, 285, 286, 436,440, 443, 591, 592, 593, 594, 596, 597, 600.

In particular, pocket 1 results from a local structural variation into the core region (residues 468-498),while pocket 2 results from the missing docking of the C-terminal tail onto the core region. To determinethe reliability of each druggable site, we measured the number of folding pathways presenting the relevantstructural variation underlying each pocket (Figure 6).

Figure 6 : Analysis of folding mechanism yielding to pockets formation.

A. The RMSD of the region composedby residues 468-498, which hides pocket 1 in the native structure, is plotted as a function of the RMSD of its docking regionon the central core (residues 144-171, 221-283, 442-467, 499-519). In 1/18 trajectories these two regions are distant enoughto allow the formation of pocket 1 (RMSD res: 468-498 > 15 Å, RMSD docking region < 4 Å), while in the remainingtrajectories they fold cooperatively (15/18) or non-cooperatively (2/18), in both cases not leading to the formation ofpocket 1. B. The RMSD of the C-terminal tail (residues 589-615) is plotted as a function of the RMSD of its docking onthe central core (residues 221-265, 432-465, 513-533). Pocket 2 appears in all the trajectories (9/18) in which the dockingregion is structured before the attachment of the C-terminal tail (RMSD CTT > 20 Å, RMSD docking region < 4 Å).

In silico identiﬁcation of potential binders of ACE2 intermediate

The identiﬁcation of potential ACE2 folding intermediate ligands was pursued by employing a drugrepositioning strategy. We built a unique collection of 9187 compounds by combining libraries of drugsapproved by the U.S. Food and Drug Administration (FDA) and molecules at diﬀerent stages of currentlyongoing clinical trials (see Material and Methods). The chemical collection was screened against thetwo identiﬁed pockets by following a consensus virtual screening workﬂow (Figure 7A). Two diﬀerentdocking software, Glide [22] and LeadIT [24], were employed in parallel to predict the binding aﬃnity ofeach compound to the ACE2 folding intermediate pockets.Two diﬀerent docking software, Glide [22] andLeadIT [24], were employed in parallel to predict the binding aﬃnity of each compound to the ACE2folding intermediate pockets. Only compounds showing promising predicted aﬃnity (i.e. Glide ds ≤ -6kcal/mol; LeadIT HYDE aﬀ ≤ µ M) in both docking protocols were submitted to a third docking roundbased on AutoDock [25]. This process identiﬁed two consensus sets (AD

LBE ≤ -6 kcal/mol, AD NiC ≥ ds ≤ -9 kcal/mol) and LeadIT (HYDE aﬀ ≤ µ M) were also added to these sets. Finally, a visualinspection of binding mode and chemical similarity annotation for each ligand allowed the selection of14 virtual hits for pocket 1 and additional 21 for pocket 2 (Supp. Table 2).

Figure 7 : Virtual Screening.

A. Schematic of the virtual screening workﬂow employed for drug repositioning. B. Two-dimensional ligand interaction scheme (B) and three-dimensional binding pose (C) for the interaction of meﬂoquine withthe pocket 2 of the ACE2 folding intermediate. Purple arrows indicate H-bonds; blue-red line indicates the salt-bridge.

Discussion

Multiple pieces of evidence indicate that down-regulating the expression of ACE2 in the early stagesof the SARS-CoV2 infection should eﬀectively inhibit virus replication. However, selectively decreasingthe expression of a target protein could be a diﬃcult task. RNA silencing or CRISPR-based strategiesrepresent valid options, but their use could be limited by delivery issues [26, 27]. These problemscould be overcome by emerging pharmacological technologies like the proteolysis targeting chimeras(PROTACs), which build on the principle of designing bi-functional compounds capable of interactingwith the target protein with one side and engaging the E3 ubiquitin ligase with the other, leading to thedegradation of the polypeptide by the proteasome [28]. Similarly, the PPI-FIT method capitalizes onthe cellular quality control machinery to promote the degradation of the target polypeptide, althoughit does not require the development of bi-functional molecules. PPI-FIT-derived compounds aim atstimulating the removal of the target protein by directly blocking its folding pathway [18]. In thismanuscript, we described the ﬁrst step along with the application of the PPI-FIT paradigm to ACE2by reporting the all-atom reconstruction of the folding pathway of the protein. Our analyses predictthe existence of a folding intermediate showing two unique druggable pockets not present in the nativeACE2 conformation. In order to respond to the urgent need for an eﬀective therapy against SARS-CoV-2, we targeted both pockets by a in silico virtual screening approach aimed at repurposing compoundscurrently in clinical trials or already approved by the FDA. Interestingly, among the predicted ligands forpocket 2 we found the quinoline-derivative meﬂoquine. This drug belongs to a class of antimalaria agentsrecently described for their potential eﬀect against SARS-CoV-2 [29, 30]. While the precise mechanismby which chloroquine and its more active derivative hydroxychloroquine inhibit virus replication is notknown, reports suggest that the compounds may act by reducing the glycosylation of ACE2 [31]. Thisevidence is highly consistent with a PPI-FIT-like eﬀect of these drugs, which could alter the correctmaturation of ACE2 glycosidic moieties by acting on the folding pathway of the protein. Based onour previous experience with the PPI-FIT technology, we anticipate that the possible down-regulatingeﬀects of our in silico predicted compounds targeting the ACE2 folding intermediate will be easily testedin cell-based assays by mean of standard biochemical techniques like western blotting.

Methods

Software & Resources

The simulations were performed on the high-performance-computing facilities of the Italian Institute forNuclear Physics (INFN) and on the computer cluster of Sibylla Biotech. A modiﬁed version of Gromacs2019 [32] was employed to run the calculations. Data analysis was carried out using Python and itslibraries: NumPy, SciPy, Matplotlib and MDAnalysis.

Structure & Topology

The structure of ACE2 was retrieved from PDB 1R42. The ﬁle contains the catalytic domain (residues19-615) solved by X-ray crystallography at 2.2 Å of resolution. Protein topology was generated usingCharmm36m after removal of the catalytic zinc ion and sugar moieties. Cysteines were treated in thereduced state. Water molecules were modeled using the Charmm-modiﬁed TIP3P.8 eneration and Selection of Unfolded Initial Conditions

The ACE2 structure was positioned in a cubic box with minimum wall distance equal to 60 Å. Thesystem was ﬁlled with water molecules, neutralized with 28 Na + ions and brought to a ﬁnal 150 mMNaCl concentration. This setup was followed by an energy minimization using the steepest descentalgorithm. An NVT equilibration at 800 K was carried out using the V-rescale thermostat. Then,so-called ratchet and pawl (rMD) molecular dynamics simulations [33, 34] were employed to obtain thedenatured states. In this method, an external biasing force is added, acting along a single collectivevariable, denoted as z ( X ) : z ( X ) = N (cid:88) | i − j | > (cid:2) C ij ( X ) − C ij ( X R ) (cid:3) (1)where C ij ( X ) is an entry of the instantaneous contact map and C ij ( X R ) is an entry of the referencecontact map, both deﬁned as: C ij ( X ) = 1 − ( | x i − x j | /r ) − ( | x i − x j | /r ) (2)where x i and x j are the coordinates of the i th and j th atoms of the protein; while r is the referencecontact-distance that is set to 7.5 Å. In applications of rMD to protein denaturation, all the entriesof the reference contact map were set to 0. In applications of rMD to protein folding simulations,the reference contact map C ij ( X R ) is calculated from Eq. (2) using the protein’s native conformation,obtained by performing energy minimization starting from the PDB structure. In any rMD simulation,the system evolves according to plain MD as long as the value of z ( x ) spontaneously decreases duringtime. Instead, system ﬂuctuations leading to an increase in z ( X ) result in the activation of the biasingforce deﬁned as: F i ( X, z m ) = (cid:40) − k R ∇ i z ( X )[ z ( X ) − z m ( t )] if z ( X ) > z m ( t )0 if z ( X ) ≤ z m ( t ) (3)where z m ( t ) is the minimum value assumed by z ( X ) up to time t , and the index i indicates the atomon which the force is acting. To obtain the denatured protein conformations, 32 rMD trajectories of 1ns each with a force constant k R = 8 · − kJ/mol. These simulations were followed by 4 ns of plainMD at 800 K in the NVT ensemble. For each plain MD unfolding trajectory, the protein conformationminimizing the following function was selected: S ( X ) = (cid:80) n − mi =0 (cid:80) nj = i + m ( j − i ) max (1 − r − (cid:107) x i − x j (cid:107) , (cid:80) n − mi =0 (cid:80) nj = i + m ( j − i ) (4)where n is the total number of atoms, m is the number of ignored subsequent atoms and r is aparameter, set to 40 Å. This metrics is a norm of the contact maps, where atoms with distant indexesweight more in the calculation. Conformations with low value of S ( X ) are characterized by high degreeof denaturation, vice-versa, conformations with high value of S ( X ) are less denatured. Generation of the Folding Trajectories

Selected initial conditions (N = 32) were positioned in a cubic box with minimum wall-distance of10 Å. Each system was ﬁlled with water molecules, neutralized with 28 Na + ions and brought to aﬁnal 150 mM NaCl concentration. Energy minimization was subsequently performed using the steepestdescent algorithm. Then, each system was subjected to 500 ps of NVT equilibration at 350 K (usingthe V-rescale thermostat [35]), followed by 500 ps of NPT equilibration at 350 K and 1 Bar (using theV-rescale thermostat and the Parrinello-Rahman barostat). During equilibration, position restraintswith force constant 1000 kJ · mol − nm − where introduced on heavy atoms. For each initial condition,40 rMD trajectories were generated, each one consisting in 5 · steps with integration time-step of2 fs. In this set of rMD simulations, reference contact map entering Eq. 1 was calculated using the9ative structure of ACE2. Cutoﬀ for Coulomb interactions was set to 12 Å, cutoﬀ for Van der Waalsinteractions was set to 12 Å with force-switch having 10 Å radius. Long range electrostatics was treatedwith particle mesh Ewald. Trajectories reaching a conﬁguration with an RMSD lower than 4 Å wereconsidered productive folding events. For each set of folding trajectories (starting from the same initialcondition) the pathway with the highest probability to occur in absence of external bias was identiﬁedby selecting the one minimizing the Bias Functional [20], deﬁned as: T = N (cid:88) i =1 m i γ i (cid:90) t dτ | F rMDi ( X, τ ) | (5)Where m i and γ i are the mass and the friction coeﬃcient of the i th atom, F rMDi is the force acting onit, while t is the trajectory folding time. Reconstruction of the Transition Path Energy Proﬁle

The collective variables Q and RMSD were linearly combined and normalized to obtain a one-dimensionalreaction coordinate, R : R = 11 . (cid:104) . (cid:16) − RM SD − RM SD min

RM SD max (cid:17) + 0 . (cid:16) Q − Q min Q max − Q min (cid:17)(cid:105) (6)From the frequency histogram of the least biased trajectories we estimated the probability P ( R ) toobserve a given value of the collective variable. We refer to the quantity G ( R ) = − ln [ P ( R )] as to the“transition path energy”. In a biased dynamics, G ( R ) yields a lower-bound to the rate limiting free-energy barriers, thus provides a useful tool to identify metastable states. The standard deviation σ wasestimated through a jackkniﬁng procedure, in particular, the free energy proﬁle was computed for 18times by leaving a diﬀerent trajectory out in each calculation. For each bin, the standard deviation σ was calculated as: σ = (cid:118)(cid:117)(cid:117)(cid:116) n − n n (cid:88) i =1 (cid:0) ˆ θ i − ˆ θ ( · ) (cid:1) (7)where n is the total number of leas biased trajectories, ˆ θ i is the mean value of the jackknife replicate(where the ith trajectory was removed) and ˆ θ ( · ) is the mean of the jackknife replicates. Clustering Analysis

Protein conformations were sampled from the two identiﬁed energy wells, in particular, the early-intermediate was deﬁned with 0.59 < R < 0.68, while the late-intermediate was deﬁned with 0.797 < R < 0.865 (Supp. Figure 1). The two sets of structures were then subjected to k-mean clustering ( k = 4), using the Frobenius norm of the contact maps as a distance metrics. Then, for each cluster, asingle representative conformation was selected by choosing the structure minimizing its distance fromthe relative cluster center. Identiﬁcation of the Binding Pockets

The 4 cluster centers of the late intermediate and 4 additional conformations (each one extracted as acluster variant) were positioned in a cubic box with minimum wall-distance of 10 Å, that was ﬁlled withwater molecules. The system was neutralized with 28 Na + ions and brought to a ﬁnal 150 mM NaClconcentration. Energy minimization was subsequently performed, using the steepest descent algorithm.Then, each system was equilibrated for 500 ps in the NVT ensemble at 310 K (using the V-rescalethermostat), followed by 500 ps of NPT equilibration at 310 K and 1 Bar (using the V-rescale thermostatand the Parrinello-Rahman barostat). During equilibration, position restraints with force constant 1000kJ · mol − nm − where introduced on heavy atoms. Subsequently, 50 ns of MD were performed for eachstructure by retaining the positional restraints on the C α of the protein backbone. For each trajectory,1000 frames, equally spaced in time, were extracted. Each frame was analyzed by means of SiteMap [36]and DogSiteScorer [23] software in order to identify druggable pockets. We considered as druggable apocket falling within the following thresholds: volume ≥

300 Å ; depth ≥

10 Å; balance ≥ ≤ ≥ ≥ ≥ ≥ ≥ Preparation of the Virtual Library

The FDA-approved drug library included compounds from the following chemical collections: Selleck(accession date 21/03/2020, 2684 compounds), Prestwick (accession date 21/03/2020, 1520 compounds)and eDrug-3D (accession date 21/03/2020, 1930 compounds). Compounds from the DrugBank collec-tion (DrugBank Release Version 5.1.5; 1784 molecules), which includes approved, experimental andinvestigational drugs, were added. The four chemical libraries were merged using an in-house developedKNIME workﬂow to obtain a unique library of non-redundant entries (total of 9187 compounds). Theﬁnal collection was prepared with the LigPrep tool [37]. Ionization/tautomeric states were generatedat pH range 7-8 using Epik [38]. Furthermore, at most 32 stereoisomers per ligand and three lowestenergy conformations per ligand ring were produced. Where not deﬁned, all the chiral form of eachstereocenter was produced. In total, 12771 docking clients were generated.

Virtual Screening Workﬂow

For the Glide docking [22], the N- and C- terminal residues of the ACE2 intermediate were cappedwith acetyl (ACE) and N-methylamine (NMA) groups, respectively, using the Schrödinger ProteinPreparation Wizard. The capped protein structure was used to generate the receptor grid, with noscaling of Van der Waals radii for non-polar receptor atoms. The docking space was centered on thecentroid of the residues deﬁning the pockets (x: 75.4, y: 58.2, z: 67.6 for pocket 1; x: 50.8, y: 57.0, z:69.1 for pocket 2). The docking space was deﬁned as a 35 Å cubic box, while the diameter midpoint ofdocked ligands was required to remain within a smaller, nested 15 Å cubic box. Docking experimentswere performed in the Glide standard precision mode using a 0.8 factor to scale the Van der Waals radii ofthe ligand atoms with partial atomic charge less than 0.15. For the BioSolveIT docking, LeadIT (version3.2.0) was used for protein preparation and docking parameters deﬁnition. The binding site was deﬁnedon the basis of the residues composing the identiﬁed druggable pocket. The residue protonation states,as well as the tautomeric forms, were automatically assessed in LeadIT using the ProToss method,that generates the most probable hydrogen positions on the basis of an optimal hydrogen bondingnetwork using an empirical scoring function. The virtual screening workﬂow was developed by using theKNIME analytic platform and the BioSolveIT KNIME nodes. Speciﬁcally, the workﬂow was organizedas follows: (i) the Compute LeadIT Docking node was selected to perform the docking simulations ofthe ∼ · docking clients by using the FlexX algorithm [39]. Ten poses for each ligand were produced;(ii) the Assess Aﬃnity with HYDE in SeeSAR node generated reﬁned binding free energy and HYDE[40] predicted activity (HYDE aﬀ ) for each ligand pose using the HYDE rescoring function; and (iii)for each ligand, the pose with the lowest HYDE aﬀ was extracted. For AutoDock docking, ligands andreceptor structures were converted to the AD4 format using AutoDockTools and the Gasteiger-Marsiliempirical atomic partial charges were assigned. The dimensions of the grid were 60 X 60 X 60 points,with grid points separated by a 0.375 Å. The grid was centered on the centroids of pockets 1 and2. The Lamarckian genetic algorithm was used, and for each compound, the docking simulation wascomposed of 100 runs. Clustering of docked conformations was performed on the basis of their RMSD(tolerance = 2.0 Å) and the results were ranked based on the estimated free energy of binding. Theobtained poses were ﬁltered based on the Glide docking score (Glide ds ), the HYDE aﬀ and AutoDockligand binding energy (AD LBE ) as well as on the number of clusters (AD

NiC ). The consensus virtual11creening workﬂow was applied for both pocket 1 and pocket 2 (Figure 7A). The best scored structuresfor the three software were visually inspected and selected based on their binding mode and predictedinteractions. In addition, the top scoring compounds for Glide docking and SeeSAR rescoring [41] werealso evaluated. In particular, the Glide set was generated following these criteria: Glide ds ≤ -9 kcal/mol,giving rise to 97 molecules for pocket 1 and 30 for pocket 2. The SeeSAR set was instead deﬁned byapplying the following ﬁlters: (i) HYDE aﬀ ≤ µ M; (ii) Ligand eﬃciency (LE) and lipophilic ligandeﬃciency (LLE) category from 1 to 4 (where 1 corresponds to favorable LE/LLE, 5 corresponds tounfavorable LE/LLE); (iii) Good ligand conformer quality, judged on the basis of torsional quality ofthe rescored pose rotamers and intramolecular clashes; (iv) no protein-ligand intermolecular clashes,giving rise to 89 molecules for pocket 1 and 205 molecules for pocket 2.

Supplementary Materials

Atomic resolution structures of the folding intermediates together with the corresponding druggablepockets are freely available upon request.

Acknowledgements

The computational reconstruction of ACE2 folding pathway was performed thanks to the strong supportof the high performance computational infrastructures made available by Italian National Instituteof Nuclear Physics (INFN). This work was supported by a grant from Fondazione Telethon (Italy,TCP14009). GS is a recipient of a fellowship from Fondazione Telethon. EB is an Assistant TelethonScientist at the Dulbecco Telethon Institute.

Author Contribution

Conceived and designed the analyses: all authorsPerformed the analyses: AB, LT, GS, AAAnalyzed the data: AB, LT, GS, AA, MLB and PFWrote the paper: GS, PF, AA, MLB and EBAll authors edited the manuscript

Competing Interests eferences upplementary Figures

Supp. Figure 1 : Representation of the selected regions deﬁned by the energy wells.

The selected regionsdeﬁning the two energy wells are depicted in orange. In particular, the early-intermediate region is deﬁned with R between0.59 and 0.68, while the late-intermediate region is deﬁned with R between 0.797 and 0.865.

Supp. Figure 2 : Selected structure variants for each cluster.

Representative conformations of the additionalstructures selected for each cluster. upplementary Tables Supp. Table 1 : Druggability descriptors of the identiﬁed pockets.

The table shows the values of the druggabilitydescriptors computed using SiteMap and DogSiteScorer for the two pockets. The threshold describes the value requiredby a descriptor for the deﬁnition of a good druggable site. upp. Table 2 : Predicted ligands for pockets 1 and 2 of the ACE2 folding intermediate.

The table showsthe diﬀerent hits emerging from the virtual screening as well as the meﬂoquine analogues analyzed subsequently. ColumnSet indicates whether the molecule was identiﬁed by consensus (C), Glide (G) or SeeSAR (S) schemes; while (Q) indicatesthe quinoline derivatives. Aﬃnity predictors calculated with the diﬀerent software are also reported: HYDE aﬀ (LeadIT),Glide ds (Glide), AD LBE and AD

NiC (AutoDock).(AutoDock).