Accelerating the identification of informative reduced representations of proteins with deep learning for graphs
Federico Errica,¹ Marco Giulini,²,³ Davide Bacciu,¹ Roberto Menichetti,²,³ Alessio Micheli,¹ and Raffaello Potestio²,³

¹Department of Computer Science, University of Pisa, Italy
²Physics Department, University of Trento, via Sommarive 14, I-38123 Trento, Italy
³INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, I-38123 Trento, Italy

(Dated: July 20, 2020)

The limits of molecular dynamics (MD) simulations of macromolecules are steadily pushed forward by the relentless development of computer architectures and algorithms. This explosion in the number and extent (in size and time) of MD trajectories induces the need for automated and transferable methods to rationalise the raw data and make quantitative sense of them. Recently, an algorithmic approach was developed by some of us to identify the subset of a protein's atoms, or mapping, that enables the most informative description of the system. This method relies on the computation, for a given reduced representation, of the associated mapping entropy, that is, a measure of the information loss due to the simplification. Albeit relatively straightforward, this calculation can be time consuming. Here, we describe the implementation of a deep learning approach aimed at accelerating the calculation of the mapping entropy. The method relies on deep graph networks, which provide extreme flexibility in the input format. We show that deep graph networks are accurate and remarkably efficient, with a speedup factor as large as 10⁵ with respect to the algorithmic computation of the mapping entropy. Applications of this method, which entails a great potential in the study of biomolecules when used to reconstruct the mapping entropy landscape, reach much farther than this, as the scheme is easily transferable to the computation of arbitrary functions of a molecule's structure.

I. INTRODUCTION
Molecular dynamics (MD) simulations [1, 2] are an essential and extremely powerful tool in the computer-aided investigation of matter. The usage of classical, all-atom simulations has boosted our understanding of a boundless variety of physical systems, ranging from materials (metals, alloys, fluids, etc.) to biological macromolecules such as proteins. As of today, the latest software and hardware developments have pushed the size of systems that MD simulations can address to the millions of atoms [3], while the time scale that a single run can cover can approach the millisecond for relatively small molecules [4].

In general, a traditional molecular dynamics-based study proceeds in four steps, schematically summarised in Figure 1. First, the system of interest has to be identified; this apparently obvious problem can actually require a substantial effort per se, e.g. in the case of dataset-wide investigations. Second, the simulation has to be set up, which is another rather nontrivial step [5]. Then it has to be run, typically on a high performance computing cluster. Finally, the simulation output has to be analysed and rationalised in order to extract information from the data.

FIG. 1. Schematic representation of the typical workflow of a molecular dynamics study. On the right we report the typical time scales required for each step of the process.

This last step is particularly delicate, and it is acquiring an ever growing prominence as large and long simulations can be more and more effortlessly performed. The necessity thus emerges to devise a parameter-free, automated "filter" procedure to describe the system of interest in simpler, intelligible terms and make sense of the immense amount of data we can produce, but not necessarily understand.

In physics, the most prominent example of a systematic, algorithmic modelling procedure is the renormalisation group [6, 7], a technique, developed for the study of critical phenomena, that relies on the iterative simplification of a detailed model, or coarse-graining (CG). As the simplification progresses, the relevant properties of the system emerge while the irrelevant ones fade out. Coarse-graining methods have found vast application also outside the realm of critical phenomena, specifically in the field of soft and biological matter [8-11]. Here, they are employed to construct simplified representations of a given macromolecular system that have fewer degrees of freedom with respect to a reference, more detailed model, while retaining key features and properties of interest. In biophysical applications this amounts to describing a biomolecule, such as a protein, through a number of constituent units, called CG sites, lower than the number of atoms composing the original, all-atom system.

The coarse-graining process in soft matter requires two main ingredients, namely the representation and the CG potential. The former consists in the definition of a mapping [12-14], that is, the transformation M(r) = R that connects a high-resolution description r of the system's configuration to a low-resolution one R. The latter ingredient is the set of interactions the user introduces among the sites to reproduce the properties of the all-atom system.

During the last decades, substantial effort has been invested in the correct parameterisation of the CG potential [12, 15]; on the other hand, the definition of the mapping has received much less attention, being selected by intuition in the vast majority of cases. A notable example of this natural choice for proteins is the Cα mapping,
in which only the α carbon atoms of the polypeptide chain are retained. Although this choice can make the tuning of CG interactions easier, it is not always appropriate to account for the specific physico-chemical features of a protein.

Most methods developed in the field of soft matter do not make use of a system-specific, algorithmic procedure for the selection of the effective sites, but rather rely on general criteria, based on physical and chemical properties, to group atoms together into coarse-grained "beads" irrespective of their local environment and global thermodynamics. While acceptable in most practical applications, this approach entails substantial limitations: in fact, the coarse-graining process implies a loss of information and, through the application of universal mapping strategies, system-specific properties, albeit relevant, might be "lost in translation" from a higher- to a lower-resolution representation. Hence, a method is required that enables the automated identification of the subset of retained degrees of freedom that preserves the largest amount of important detail from the reference, while at the same time reducing the complexity of the problem.

The challenge of identifying a maximally informative mapping has been recently tackled by some of us [16]; specifically, we developed an algorithmic procedure to find a CG mapping that minimises the amount of information that is lost when the number of degrees of freedom with which one observes a system is decreased. The quantity that measures this loss is called mapping entropy [13-15, 17]:

S_{map} = k_B \int dr \, p_r(r) \ln\left[ \frac{V^n}{V^N} \frac{p_r(r)}{p_R(M(r))} \right] \geq 0 .    (1)

Here, p_r(r) is the probability of sampling a configuration r in the high-resolution system, namely the Boltzmann distribution p_r(r) ∝ exp(−βu(r)), where u(r) is the atomistic potential and β = 1/k_B T is the inverse temperature. p_R(R) is the probability of sampling the configuration R = M(r) in the low-resolution description:

p_R(R) = \frac{1}{Z} \int dr \, e^{-\beta u(r)} \, \delta(M(r) - R) ,    (2)

where Z is the canonical partition function of the system.

In Ref. [16] and in this work we restrict our analysis to a peculiar functional form of the CG mapping, called decimation, which consists in the selection of a subset of the system's n atoms. These can thus be in one of two possible states, i.e. retained or not. This choice is particularly appealing as it enables a rather straightforward bridge between molecular modelling and deep learning. In fact, a protein structure can be represented as a graph in which each atom corresponds to a vertex, and edges connect pairs of atoms closer than a selected threshold. At odds with other definitions of a CG site, such as the centre of mass, for which the exact knowledge of the real-space coordinates is required, the information about the decimation mapping can be directly encoded in the vertices of the protein's graph by using a mark, e.g. a binary value, as is done in this work.

In the case of a decimation of the all-atom system, S_map can be written using the following notation [13]:

S_{map} = k_B \, D_{KL}\!\left( p_r(r) \,\|\, \bar{p}_r(r) \right) = k_B \int dr \, p_r(r) \ln\left[ \frac{p_r(r)}{\bar{p}_r(r)} \right] ,    (3)

that is, a Kullback-Leibler divergence D_KL [18] between p_r(r) and the distribution \bar{p}_r(r) obtained by observing the former through the "coarse-graining grid", i.e., in terms of the selected CG mapping.
\bar{p}_r(r) is defined as [13]

\bar{p}_r(r) = \frac{p_R(M(r))}{\Omega(M(r))} ,    (4)

where

\Omega(R) = \int dr \, \delta(M(r) - R)    (5)

is the number of microstates r that map onto the CG configuration R.

Different choices of the CG mapping lead to different \bar{p}_r(r) and, consequently, to different values of S_map. Unfortunately, the definitions in Eqs. 1 and 3 do not allow one, given a CG representation, to directly calculate the associated mapping entropy.

In Ref. [16] Giulini et al. tackled this problem by establishing a connection between Eq. 3 and the energetics of the system:

S_{map} \simeq \frac{k_B \beta^2}{2} \int dR \, p_R(R) \left\langle \left( U - \langle U \rangle_{\beta|R} \right)^2 \right\rangle_{\beta|R} ,    (6)

where \langle U \rangle_{\beta|R} is the average microscopic energy restricted to macrostate R,

\langle U \rangle_{\beta|R} = \int dU \, P_\beta(U|R) \, U ,    (7)

and

P_\beta(U|R) = \frac{p_R(U, R)}{p_R(R)} = \frac{1}{p_R(R)} \int dr \, p_r(r) \, \delta(M(r) - R) \, \delta(u(r) - U)    (8)

is the conditional probability for the system to have energy U provided that it is in macrostate R.

Eq. 6 highlights that the mapping entropy can be calculated as a weighted average, over all CG macrostates R, of the variances of the atomistic potential energies of all configurations r that map onto a specific macrostate. Importantly, this allows one to measure S_map given a set of all-atom configurations and a decimation mapping M. Specifically, in Ref. [16] this was achieved by lumping together configurations based on a distance matrix, the latter obtained by considering only the atoms retained in the mapping; the energy variances over these clusters were then computed, each one weighted with the cluster's probability.
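To make the estimator of Eq. 6 concrete, the following Python sketch clusters the sampled configurations into macrostates using only the retained atoms, then accumulates the probability-weighted energy variances. This is a minimal illustration of the idea, not the optimised implementation of Ref. [16]; the function name, the clustering method, and the number of clusters are our own choices.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def smap_estimate(configs, energies, mapping, beta, n_clusters=50):
    """Rough estimate of S_map via Eq. 6 (in units of k_B).

    configs  : (T, n, 3) array of atomistic configurations from an MD trajectory
    energies : (T,) array of atomistic potential energies u(r)
    mapping  : (n,) boolean array, True for the atoms retained by the decimation
    """
    # Project every configuration onto the retained atoms only
    reduced = configs[:, mapping, :].reshape(len(configs), -1)
    # Lump configurations into CG macrostates by clustering the reduced coordinates
    labels = fcluster(linkage(reduced, method="average"),
                      n_clusters, criterion="maxclust")
    smap = 0.0
    for c in np.unique(labels):
        members = labels == c
        p_R = members.mean()              # macrostate probability p_R(R)
        var_U = energies[members].var()   # energy variance within the macrostate
        smap += p_R * var_U
    return 0.5 * beta**2 * smap           # prefactor from Eq. 6; k_B omitted
```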
The following, natural step in this analysis is to identify the reduced representations of a system that are able to preserve the maximum amount of information from the all-atom reference, i.e., that minimise the mapping entropy. However, for a molecule with n atoms the number of possible decimation mappings is 2^n, an astronomical amount even for the smallest proteins (with n ≃ 50 one has more than 10^15 distinct mappings). Even restricting the analysis to a fixed number N of retained degrees of freedom (atoms), the number is still huge, namely n!/(N!(n−N)!). Hence, in Ref. [16] we resorted to a stochastic exploration of this immense space. In particular, starting from a CG mapping with randomly chosen atoms, we added/removed atoms with an acceptance criterion based on a Metropolis rule, using S_map as the loss function (a schematic sketch of this procedure is given at the end of this section). By repeating this procedure over 48 independent runs, we built a set of optimised CG mappings whose values of S_map are local minima. Noteworthy, these mappings were found to preferentially retain atoms directly related to the biological function of the proteins of interest, thus linking the described information-theoretical approach to the properties of biological systems. Hence, the protocol developed in Ref. [16] represents not only a practical way to select the most informative mapping of a molecule, but also a promising paradigm to employ coarse-graining as a controllable filtering procedure that can highlight relevant regions of a biological structure.

FIG. 2. High-level overview of typical deep learning methodologies for graphs. A graph g is given as input to a Deep Graph Network, which outputs one vector, also called embedding or state, for each vertex v of the graph. In this work, we aggregate all these embeddings to obtain a single one that encodes the whole graph structure. Then, this is used by a standard machine learning regression algorithm to output the S_map value associated with g.

The downside of this approach is its non-negligible computational cost, which is due to two factors:

1. Eq. 6 requires a set of configurations of the high-resolution system that are sampled through an MD simulation, which can be expensive from a computational perspective;

2. the stochastic exploration of the set of possible CG mappings is limited and time-consuming.

Table I shows a few values of the computational time required to perform these steps. It is worth stressing that the proteins studied here are small; therefore, these values would dramatically increase in the case of bigger biomolecules.

The ultimate aim of this work is the development of a machine learning approach to predict the value of S_map for a CG mapping associated with a protein. The first nontrivial requirement one can ask of the model is to reproduce the values of S_map for a single biological structure for which we already have the ground truth. Such a model would not eliminate the need for steps 1 and 2, but it would make an extensive exploration of a protein's mapping space feasible. To remove these steps altogether, the model should also be able to transfer its knowledge to unknown proteins for which we have neither an all-atom simulation nor data on CG mappings. We do not address this latter issue in the present work; rather, we focus on the development and assessment of a machine learning model for protein-specific S_map prediction.

With their long and successful history both in the field of coarse-graining [20, 21] and in the prediction of protein properties [22-24], graph-based algorithms represent the most natural choice to tackle such challenges. Here, we do not use graphs to develop a novel coarse-graining strategy; rather, we show that a graph-based machine learning algorithm reproduces the results of a coarse-graining procedure obtained by means of a lengthy and non-trivial optimisation process. Importantly, our approach makes use of a negligible fraction of the huge amount of information employed in the original method, i.e., a single protein structure viewed as a graph.
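As anticipated above, the stochastic optimisation of Ref. [16] can be summarised as a Metropolis search over decimation mappings. The Python sketch below is our schematic rendition, with a fixed number of retained sites, a simple swap move, and a fictitious effective temperature; these are illustrative assumptions, not the settings of Ref. [16].

```python
import numpy as np


def optimise_mapping(smap_fn, n_atoms, n_sites, n_steps=10000, temperature=1.0, seed=0):
    """Metropolis minimisation of S_map over decimation mappings (schematic).

    smap_fn : callable returning S_map for a boolean mapping of shape (n_atoms,)
    """
    rng = np.random.default_rng(seed)
    mapping = np.zeros(n_atoms, dtype=bool)
    mapping[rng.choice(n_atoms, size=n_sites, replace=False)] = True
    s = smap_fn(mapping)
    for _ in range(n_steps):
        trial = mapping.copy()
        # Swap one retained atom with one discarded atom (keeps N fixed)
        out_atom = rng.choice(np.flatnonzero(trial))
        in_atom = rng.choice(np.flatnonzero(~trial))
        trial[out_atom], trial[in_atom] = False, True
        s_trial = smap_fn(trial)
        # Metropolis acceptance rule with S_map as the loss function
        if s_trial < s or rng.random() < np.exp((s - s_trial) / temperature):
            mapping, s = trial, s_trial
    return mapping, s
```

Repeating the search from independent random initialisations (48 runs in Ref. [16]) yields a set of locally optimal mappings.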
TABLE I. Computational cost of the main steps of the pipeline of Ref. [16] for tamapin (PDB code 6d93) and adenylate kinase (PDB code 4ake), in terms of MD CPU time, MD walltime, the time for a single S_map measure, and the optimisation time. MD CPU time (MD walltime) represents the core time (user time) necessary to run 200 ns of molecular simulation with the GROMACS 2018 package [19].
Single measure is the amount of time required to compute the S_map of a mapping. Opt. time quantifies the computational burden of a single stochastic optimisation; this value is not Single measure multiplied by the number of optimisation steps, since in [16] we resort to substantial approximations to speed up the overall process.

To predict S_map values given a protein and a CG mapping, we leverage Deep Graph Networks (DGNs) [25-27], a family of neural networks that naturally deal with relational data. The main advantages of DGNs are their efficiency and their ability to learn from graphs of different size and shape. This is possible for two reasons: first, DGNs focus on local processing of vertex neighborhoods, so computation can be easily parallelised on every graph; secondly, similarly to Convolutional Neural Networks [28] for images, DGNs stack multiple layers of graph convolutions to propagate information. At each layer ℓ, a DGN computes the state of each vertex v as a vector h_v^ℓ of fixed dimension. Eventually, the DGN produces a final embedding h_v for each vertex, as well as a global graph embedding h_g. The latter is obtained by simple aggregation of all vertex embeddings, as sketched in Figure 2. Being in vectorial form, h_g can be fed into standard machine learning algorithms to solve graph regression/classification tasks.

II. MATERIALS AND METHODS

A. Proteins: Tamapin and Adenylate Kinase
Here we give a brief description of the two proteins employed in the current work.

6d93 is a 31-residue-long mutant of tamapin [29], a toxin of the Indian red scorpion. Its outstanding selectivity towards the calcium-activated potassium channel SK2 makes it an extremely interesting protein in the field of pharmacology.

4ake is the open conformation of adenylate kinase [30]. This 214-residue enzyme is responsible for the inter-conversion between adenosine triphosphate (ATP) and adenosine diphosphate + adenosine monophosphate (ADP + AMP) inside the cell.

For a concise but more detailed description of these two proteins, please refer to Sections II.B and II.D of Ref. [16].
FIG. 3. The two protein structures employed in this work: the tamapin mutant (PDB code: 6d93) and the open conformation of adenylate kinase (PDB code: 4ake). The former, although small, possesses all the elements of proteins' secondary structure, while the latter is bigger in size and has a much wider structural variability.

FIG. 4. Two different mappings M and M′ associated with the same (schematic) protein structure. To train our machine learning model, we treat each protein as a graph where vertices are atoms, and edges are placed among atoms closer than a given threshold. The selected CG sites of the two mappings are marked in red and encoded as a vertex feature. Our goal is to automatically learn to map both mappings to the proper values S_map and S_map′ of the mapping entropy.

B. Data representation
To evaluate the efficacy of machine learning methods for graphs in predicting the S_map of a mapping, we consider two datasets, namely 6d93 and 4ake. For each of the proteins described in the previous section we fix the number of CG sites to be equal to the number of amino acids.

Overall, each dataset consists of a single protein structure and many mappings. To be fed into a machine learning model, the structure is represented as a graph where atoms are vertices and edges are inserted whenever two vertices are closer than 1 nm. The only difference between samples in each dataset is the selection of CG sites, i.e., of vertices of the graph, which are marked in red in Figure 4. We treat the selection of an atom as a binary input feature attached to the corresponding vertex of the graph; different selections correspond to different values of S_map. In addition, we enrich each vertex with 10 features that describe its atomistic physico-chemical properties (see Table II); similarly, we consider the inverse atomic distance e_uv between vertices u and v as an edge feature. Dataset statistics are summarised in Table III, whereas the distribution of the target values is shown in Figure 5.

Note that different proteins will correspond to graphs with different topology and vertex/edge information, whereas different conformations of the same structure will only differ by their network of edges, since the chemical nature of their atoms remains unaltered.

We build the datasets using the data retrieved in Ref. [16]. Specifically, for both proteins we include 500 randomly selected CG mappings, 48 optimised solutions, and 576 intermediate mappings obtained by periodically saving the mappings visited along the first stages of the optimisation procedure.
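As an illustration of this data representation, the sketch below assembles the vertex feature matrix, the edge list, and the inverse-distance edge features from the atomic coordinates. The helper name and array layout are our own; in particular, atom_features stands for the nine physico-chemical flags of Table II, to which the binary Site flag is appended.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def protein_graph(coords, atom_features, site_mask, cutoff=1.0):
    """Build the graph described above from one protein structure.

    coords        : (n, 3) atomic coordinates, in nm
    atom_features : (n, 9) physico-chemical flags (Table II, without Site)
    site_mask     : (n,) binary array flagging the atoms selected as CG sites
    cutoff        : edge-insertion threshold in nm (1 nm in this work)
    """
    dist = squareform(pdist(coords))        # pairwise atomic distances
    src, dst = np.nonzero((dist < cutoff) & (dist > 0.0))
    edge_index = np.stack([src, dst])       # each pair appears in both directions
    edge_attr = 1.0 / dist[src, dst]        # inverse distance as edge feature e_uv
    # The CG selection enters as one extra binary vertex feature (the Site flag)
    x = np.concatenate([atom_features, site_mask[:, None]], axis=1)
    return x, edge_index, edge_attr
```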
Feature name  Value  Description
C             0/1    Carbon atom
N             0/1    Nitrogen atom
O             0/1    Oxygen atom
S             0/1    Sulphur atom
HPhob         0/1    Part of a hydrophobic residue
Amph          0/1    Part of an amphipathic residue
Pol           0/1    Part of a polar residue
Ch            0/1    Part of a charged residue
Bkb           0/1    Part of the protein backbone
Site          0/1    Atom selected as a CG site

TABLE II. Features used to describe atomistic physico-chemical properties. In this simple model, we only provide the DGN with the chemical nature of the atom and of its residue, together with the flag Bkb that specifies if the atom is part of the main polypeptide chain.

TABLE III. Dataset statistics: number of graphs, number of vertices, number of edges, and average vertex degree for the 6d93 and 4ake datasets.

C. Machine Learning Model
As mentioned in Section I, the great potential of DGNs lies in their ability to efficiently handle graphs of arbitrary structure and size. This is especially important when we want to approximate complex processes, such as the S_map computation on different proteins, in a fraction of the time originally required. The main building block of a DGN is the graph convolution mechanism; in this work, we employ a restricted version of the Gated-GIN model [31] that allows us to consider edge values while keeping the computational burden low. Moreover, we take into account the importance of the selected sites by implementing a weighted sum of neighbors:

h_v^{\ell+1} = \mathrm{MLP}^{\ell+1}\!\left( \left( 1 + \epsilon^{\ell+1} \right) * h_v^{\ell} + \sum_{u \in \mathcal{N}_v} h_u^{\ell} * e_{uv} \right)    (9)

\hat{S}_{map} = w_{out}^{T} \, \mathrm{ReLU}\!\left( \sum_{u \in \mathcal{V}_g^s} W^{T} [h_u^1, \dots, h_u^L] * w_s + \sum_{u \in \mathcal{V}_g \setminus \mathcal{V}_g^s} W^{T} [h_u^1, \dots, h_u^L] * w_n \right)    (10)

where \mathcal{N}_v represents the neighborhood of v, * is scalar multiplication, e_uv is a scalar edge feature holding the inverse atomic distance between atoms u and v, and ε, W, w_out, w_n and w_s are adaptive weights, the last two related to unselected and selected CG sites, respectively. MLP is a multi-layer perceptron of two linear layers interleaved by a Rectified Linear Unit (ReLU) activation function [32], and square brackets denote concatenation of the vertex embeddings computed at the different layers. Depending on whether a vertex has been selected as a site, i.e., u ∈ \mathcal{V}_g^s, or not, i.e., u ∈ \mathcal{V}_g \setminus \mathcal{V}_g^s, different weights are learned. This way, the model performs a "site-aware" aggregation of vertex embeddings before predicting the mapping entropy of the whole graph.

III. RESULTS AND DISCUSSION
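A minimal PyTorch sketch of Eqs. 9 and 10 is reported below; the hyperparameters follow Section III (5 layers, 64 hidden units, 16 readout units). Class names, the input embedding layer, and all initialisation details are our own assumptions rather than the authors' code, and graph mini-batching is omitted for brevity.

```python
import torch
import torch.nn as nn


class GraphConv(nn.Module):
    """One layer of Eq. 9: edge-weighted neighborhood sum followed by an MLP."""

    def __init__(self, dim):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, h, edge_index, edge_attr):
        src, dst = edge_index                 # directed edges u -> v
        msg = h[src] * edge_attr[:, None]     # neighbors scaled by e_uv = 1/distance
        agg = torch.zeros_like(h).index_add_(0, dst, msg)
        return self.mlp((1 + self.eps) * h + agg)


class SmapDGN(nn.Module):
    """Stack of graph convolutions with the site-aware readout of Eq. 10."""

    def __init__(self, in_dim, dim=64, n_layers=5, readout_dim=16):
        super().__init__()
        self.embed = nn.Linear(in_dim, dim)   # maps the Table II features to h^0_v
        self.convs = nn.ModuleList(GraphConv(dim) for _ in range(n_layers))
        self.W = nn.Linear(n_layers * dim, readout_dim, bias=False)
        self.w_s = nn.Parameter(torch.ones(1))   # weight for selected CG sites
        self.w_n = nn.Parameter(torch.ones(1))   # weight for unselected atoms
        self.w_out = nn.Linear(readout_dim, 1)

    def forward(self, x, edge_index, edge_attr, site_mask):
        h, hs = self.embed(x), []
        for conv in self.convs:
            h = conv(h, edge_index, edge_attr)
            hs.append(h)
        cat = torch.cat(hs, dim=1)               # concatenation [h^1_u, ..., h^L_u]
        w = torch.where(site_mask[:, None], self.w_s, self.w_n)
        return self.w_out(torch.relu((self.W(cat) * w).sum(dim=0)))
```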
To assess the performance of the model on a single protein, we first split each dataset into training, validation and test realizations following an 80%/10%/10% hold-out strategy. We trained and assessed the model on each dataset separately. We applied early stopping [33] to select the training epoch with the best validation score, and the chosen model was evaluated on the unseen test set. The evaluation metric for our regression problem is the coefficient of determination (R² score).

Since training can be costly depending on the protein, in these preliminary experiments we tried a single configuration with 5 graph convolutional layers, where each layer uses 64 hidden units (i.e., the length of h_v^ℓ) inside the MLP. W is the weight matrix of a linear layer with 16 output units, and w_out is the weight vector of a linear layer that maps its input to a single scalar value.
FIG. 5. Distributions of target values for both datasets. This is not to be considered the true S_map probability density function, since only a fraction of these values (the Gaussian-like curves on the right side of the plots) is associated with a random exploration of the mapping space, while the others are sampled during the optimisation procedure (left region, low S_map). All values of S_map are in kJ/mol/K.

The loss objective is the Least Absolute Error. The optimisation algorithm is Adam [34] with a learning rate of 0.001 and no regularisation. We trained for a maximum of 10000 epochs with an early stopping patience of 1000 epochs and a mini-batch size of 8. We accelerated training using a Tesla V100 GPU with 16 GB of memory.

Table IV reports the R² score for the 6d93 and 4ake datasets in both training and test. As we observe, the model can fit the training set, and it performs very well on the test set. In Fig. 6, we show how the predicted values for training (blue) and test (orange) samples differ from the ground truth; ideally, a perfect prediction corresponds to a point lying on the diagonal dotted line. For completeness, Fig. 7 compares the distribution of the predicted values (blue) with that of the ground truth (orange): the distribution of the outputs produced by the model closely follows the target one.

Table V shows a time comparison between the algorithm that computes the S_map value and the inference of the model. Generally, the cost of computing the mapping entropy is drastically reduced by two to five orders of magnitude, depending on the protein and on the hardware employed.

FIG. 7. Output distribution on test samples for both datasets. All values of S_map are in kJ/mol/K.

TABLE V. Comparison between the time required to compute the S_map of a single mapping and the inference time of the model, both on CPU and GPU: the measured speedup on CPU is ≃1276× for 6d93 and ≃435× for 4ake, and it grows further on GPU. We obtain a drastic improvement that allows a wider exploration of the S_map space.
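The training procedure described above reduces to a standard supervised loop. The sketch below assumes the model and graph tuples of the previous snippets and abstracts data loading away; it mirrors the stated setup (L1 loss, Adam with learning rate 0.001, early stopping with patience 1000) without reproducing the authors' exact pipeline.

```python
import copy
import torch


def train(model, train_batches, val_batches, max_epochs=10000, patience=1000):
    """L1-loss training with early stopping on the validation error."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # no regularisation
    loss_fn = torch.nn.L1Loss()                           # Least Absolute Error
    best_val, best_state, stall = float("inf"), None, 0
    for epoch in range(max_epochs):
        model.train()
        for x, edge_index, edge_attr, site_mask, y in train_batches:
            opt.zero_grad()
            loss_fn(model(x, edge_index, edge_attr, site_mask), y).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(*b[:-1]), b[-1]).item() for b in val_batches)
        if val < best_val:                    # keep the epoch with best validation score
            best_val, best_state, stall = val, copy.deepcopy(model.state_dict()), 0
        else:
            stall += 1
            if stall >= patience:             # early stopping
                break
    model.load_state_dict(best_state)
    return model
```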
IV. CONCLUSIONS AND PERSPECTIVES

Molecular dynamics simulations constitute the core of the majority of research studies in the field of computational biophysics. From protein folding to free energy calculations, an all-atom trajectory of a biomolecule gives access to a vast amount of data about the system, from which relevant information about the system's properties, behaviour, and biological function is extracted through a posteriori analysis. This information retrieval can be almost immediate and intuitive to observe (even by naked eye) and quantify in terms of a few simple parameters (e.g. the process of ligand binding can be seen in a graphical rendering of the trajectory, and made quantitative in terms of the distance between ligand and protein); much more frequently, though, it is a lengthy and non-trivial task.

The mapping entropy method, introduced in Eq. 1 and here briefly summarised, aims at performing in an automated, unsupervised manner this identification of the relevant features of a system, namely the subset of a molecule's atoms that retains the largest possible amount of information about its behaviour. This scheme relies on the computation of S_map, that is, a measure of the dissimilarity between the probability density of the original system's configurations and the one marginalised over the discarded atoms. This quantity is employed as a cost function and minimised over the possible reduced representations, or mappings.

The extent of the mapping space, however, makes a random search practically useless and an exhaustive enumeration simply impossible; hence, an optimisation procedure is required to identify the simplified description of the molecule that entails the largest amount of information about the system. Unfortunately, this procedure nonetheless implies the calculation of S_map over a very large number of tentative mappings, making the optimisation, albeit possible, computationally intensive and time consuming.

In this work, we have tackled the problem of speeding up the S_map calculation by means of deep learning algorithms. In particular, we have shown that Deep Graph Networks are capable of inferring the value of the mapping entropy when provided with a schematic, graph-based representation of the protein and a tentative mapping.

TABLE IV. Mean absolute error (MAE) and R² score on the training (TR), validation (VL), and test (TE) sets for the 6d93 and 4ake datasets.
FIG. 6. Plot of target values against predictions for all samples. A perfect prediction is represented by the dotted diagonal line. Training samples are in blue, while test samples are in orange. All values of S_map are in kJ/mol/K.

The method's accuracy is tested on two proteins of very different size, tamapin (31 residues) and adenylate kinase (214 residues); on the latter, the model reaches a test R² score of 0.84 (see Table IV).
These rather promising results have been obtained in a computing time that is up to five orders of magnitude shorter than that of the standard algorithm.

The presented strategy can hold the key to an exhaustive exploration of the CG mappings of a biomolecule. In fact, even if the training process has to be carried out anew for each protein, the trained DGN can be employed to extend the characterisation of the mapping entropy landscape of the system with impressive accuracy in a fraction of the time.

The natural next step would be to apply the knowledge acquired by the model to different protein structures, so that the network can predict values of S_map even in the absence of an MD simulation. As of now, however, it is difficult to assess whether the information extracted from the training over a given protein's trajectory can be fruitfully employed to determine the mapping entropy of another one by just feeding the structure of the latter as input. In other words, obtaining a transfer effect with the learning model may not be straightforward, and additional information will be needed to achieve it.

In conclusion, we point out that the proposed approach is completely general, in that the specific nature and properties of the mapping entropy played no special role in the construction of the deep learning scheme; furthermore, the DGN formalism enables one to input graphs of variable size and shape, relaxing the limitations present in other kinds of DL architectures [35]. This method can thus be transferred to other problems where different selections of a subset of a molecule's atoms give rise to different values of a given observable (see e.g. [36]), paving the way for a drastic speedup in computer-aided studies in the fields of molecular biology, soft matter, and material science.

ACKNOWLEDGMENTS
This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 758588).
[1] B. J. Alder and T. E. Wainwright, "Studies in molecular dynamics. I. General method," The Journal of Chemical Physics, vol. 31, no. 2, pp. 459–466, 1959.
[2] M. Karplus, "Molecular dynamics simulations of biomolecules," Accounts of Chemical Research, vol. 35, no. 6, pp. 321–323, 2002.
[3] A. Singharoy, C. Maffeo, K. H. Delgado-Magnero, D. J. Swainsbury, M. Sener, U. Kleinekathöfer, J. W. Vant, J. Nguyen, A. Hitchcock, B. Isralewitz, et al., "Atoms to phenotypes: Molecular design principles of cellular energy metabolism," Cell, vol. 179, no. 5, pp. 1098–1111, 2019.
[4] D. E. Shaw, R. O. Dror, J. K. Salmon, J. Grossman, K. M. Mackenzie, J. A. Bank, C. Young, M. M. Deneroff, B. Batson, K. J. Bowers, et al., "Millisecond-scale molecular dynamics simulations on Anton," in Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp. 1–11, 2009.
[5] C. Kandt, W. L. Ash, and D. P. Tieleman, "Setting up and running molecular dynamics simulations of membrane proteins," Methods, vol. 41, no. 4, pp. 475–488, 2007.
[6] S.-K. Ma, Modern Theory of Critical Phenomena. Routledge, 2018.
[7] M. E. Fisher, "Renormalization group theory: Its basis and formulation in statistical physics," Reviews of Modern Physics, vol. 70, no. 2, p. 653, 1998.
[8] S. J. Marrink, H. J. Risselada, S. Yefimov, D. P. Tieleman, and A. H. De Vries, "The MARTINI force field: coarse grained model for biomolecular simulations," The Journal of Physical Chemistry B, vol. 111, no. 27, pp. 7812–7824, 2007.
[9] S. Takada, "Coarse-grained molecular simulations of large biomolecules," Curr. Opin. Struct. Biol., vol. 22, no. 2, pp. 130–137, 2012.
[10] R. Potestio, C. Peter, and K. Kremer, "Computer simulations of soft matter: Linking the scales," Entropy, vol. 16, no. 8, pp. 4199–4245, 2014.
[11] M. G. Saunders and G. A. Voth, "Coarse-graining methods for computational biology," Annu. Rev. Biophys., vol. 42, pp. 73–93, 2013.
[12] W. G. Noid, "Systematic methods for structurally consistent coarse-grained models," in Biomolecular Simulations, pp. 487–531, Springer, 2013.
[13] J. F. Rudzinski and W. G. Noid, "Coarse-graining entropy, forces, and structures," The Journal of Chemical Physics, vol. 135, no. 21, p. 214101, 2011.
[14] T. T. Foley, M. S. Shell, and W. G. Noid, "The impact of resolution upon entropy and information in coarse-grained models," The Journal of Chemical Physics, vol. 143, no. 24, p. 243104, 2015.
[15] M. S. Shell, "The relative entropy is fundamental to multiscale and inverse thermodynamic problems," J. Chem. Phys., vol. 129, no. 14, p. 144108, 2008.
[16] M. Giulini, R. Menichetti, M. S. Shell, and R. Potestio, "An information theory-based approach for optimal model reduction of biomolecules," arXiv preprint arXiv:2004.03988, 2020.
[17] M. S. Shell, "Systematic coarse-graining of potential energy landscapes and dynamics in liquids," J. Chem. Phys., vol. 137, no. 8, p. 084503, 2012.
[18] S. Kullback and R. A. Leibler, "On information and sufficiency," The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79–86, 1951.
[19] D. Van Der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark, and H. J. Berendsen, "GROMACS: fast, flexible, and free," Journal of Computational Chemistry, vol. 26, no. 16, pp. 1701–1718, 2005.
[20] D. Gfeller and P. De Los Rios, "Spectral coarse graining of complex networks," Physical Review Letters, vol. 99, no. 3, p. 038701, 2007.
[21] M. A. Webb, J.-Y. Delannoy, and J. J. de Pablo, "Graph-based approach to systematic molecular coarse-graining," Journal of Chemical Theory and Computation, vol. 15, no. 2, pp. 1199–1208, 2019.
[22] K. M. Borgwardt, C. S. Ong, S. Schönauer, S. Vishwanathan, A. J. Smola, and H.-P. Kriegel, "Protein function prediction via graph kernels," Bioinformatics, vol. 21, no. suppl 1, pp. i47–i56, 2005.
[23] A. Fout, J. Byrd, B. Shariat, and A. Ben-Hur, "Protein interface prediction using graph convolutional networks," in Advances in Neural Information Processing Systems, pp. 6530–6539, 2017.
[24] W. Torng and R. B. Altman, "Graph convolutional neural networks for predicting drug-target interactions," Journal of Chemical Information and Modeling, vol. 59, no. 10, pp. 4131–4149, 2019.
[25] D. Bacciu, F. Errica, A. Micheli, and M. Podda, "A gentle introduction to deep learning for graphs," Neural Networks, vol. 129, pp. 203–221, 2020.
[26] A. Micheli, "Neural network for graphs: A contextual constructive approach," IEEE Transactions on Neural Networks, vol. 20, no. 3, pp. 498–511, 2009.
[27] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, "The graph neural network model," IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2009.
[28] Y. LeCun, Y. Bengio, et al., "Convolutional networks for images, speech, and time series," The Handbook of Brain Theory and Neural Networks, vol. 3361, no. 10, p. 1995, 1995.
[29] P. Pedarzani, D. D'hoedt, K. B. Doorty, J. D. F. Wadsworth, J. S. Joseph, K. Jeyaseelan, R. M. Kini, S. V. Gadre, S. M. Sapatnekar, M. Stocker, and P. N. Strong, "Tamapin, a venom peptide from the Indian red scorpion (Mesobuthus tamulus) that targets small conductance Ca2+-activated K+ channels and afterhyperpolarization currents in central neurons," Journal of Biological Chemistry, vol. 277, no. 48, pp. 46101–46109, 2002.
[30] C. Müller, G. Schlauderer, J. Reinstein, and G. E. Schulz, "Adenylate kinase motions during catalysis: an energetic counterweight balancing substrate binding," Structure, vol. 4, no. 2, pp. 147–156, 1996.
[31] F. Errica, D. Bacciu, and A. Micheli, "Theoretically expressive and edge-aware graph learning," 2020.
[32] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323, 2011.
[33] L. Prechelt, "Early stopping - but when?," in Neural Networks: Tricks of the Trade, pp. 55–69, Springer, 1998.
[34] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proceedings of the 3rd International Conference on Learning Representations (ICLR), 2015.
[35] M. Giulini and R. Potestio, "A deep learning approach to the structural analysis of proteins," Interface Focus, vol. 9, no. 3, p. 20190003, 2019.
[36] P. Diggins IV, C. Liu, M. Deserno, and R. Potestio, "Optimal coarse-grained site selection in elastic network models of biomolecules,"