Pre-training of Graph Neural Network for Modeling Effects of Mutations on Protein-Protein Binding Affinity
Xianggen Liu, Yunan Luo, Sen Song and Jian Peng

Abstract
Modeling the effects of mutations on the binding affinity plays a crucial role in protein engineering and drug design. In this study, we develop a novel deep learning based framework, named GraphPPI, to predict the binding affinity changes upon mutations based on the features provided by a graph neural network (GNN). In particular, GraphPPI first employs a well-designed pre-training scheme to enforce the GNN to capture the features that are predictive of the effects of mutations on binding affinity in an unsupervised manner and then integrates these graphical features with gradient-boosting trees to perform the prediction. Experiments showed that, without any annotated signals, GraphPPI can capture meaningful patterns of the protein structures. Also, GraphPPI achieved new state-of-the-art performance in predicting the binding affinity changes upon both single- and multi-point mutations on five benchmark datasets. In-depth analyses also showed GraphPPI can accurately estimate the effects of mutations on the binding affinity between SARS-CoV-2 and its neutralizing antibodies. These results have established GraphPPI as a powerful and useful computational tool in the studies of protein design.
Introduction
Protein-protein interactions (PPIs) play an essential role in various fundamental biological processes. As a representative example, the antibody (Ab) is a central component of the human immune system that interacts with its target antigen to elicit an immune response. This interaction is performed between the complementarity-determining regions (CDRs) of the Ab and a specific epitope on the antigen. The ability of Abs to bind a wide variety of targets in a specific and selective manner has made antibody therapy widely used for a broad range of diseases, including several types of cancer (Ben-Kasus et al., 2007) and viral infection (Barouch et al., 2013). Despite the broad application potential of Ab therapy, it is very challenging to design Abs that have a desired PPI property. The difficulty is twofold. On the one hand, rational Ab design is usually performed by iteratively mutating residues, but the experimental measurement of the PPI change of an Ab mutant is labor-intensive and time-consuming. On the other hand, the number of possible Ab mutants is considerably large, making most of the search ineffective. Thus, quantitative and fast evaluation of PPI changes upon mutations is crucial for protein engineering and antibody design.

(Author affiliations: Department of Computer Science, University of Illinois at Urbana-Champaign, IL, USA; Laboratory for Brain and Intelligence and Department of Biomedical Engineering, Tsinghua University, Beijing, China; Beijing Innovation Center for Future Chip, Tsinghua University, Beijing, China.)
In this paper, we mainly focus on the binding affinity of proteins, ∆G, which is one of the most typical properties of PPI. Traditional methods for modeling the binding affinity changes upon mutation (i.e., ∆∆G) can be grouped into three categories: 1) molecular mechanics methods, which simulate the free-energy difference between two states of a system based on continuum solvent models (Lee and Olson, 2006); 2) empirical energy-based methods, which leverage classical mechanics or statistical potentials to calculate the free-energy changes; and 3) machine learning methods, which fit the experimental data using sophisticated engineered features of the changes in structures. The methods in the first category usually provided reliable simulation results, but they also required numerous computational resources, which limited their applications. The empirical energy-based methods, exemplified by STATIUM (DeBartolo et al., 2014), FoldX (Schymkowitz et al., 2005) and Discovery Studio (Inc., 2013), accelerated the prediction but tended to be disturbed by insufficient conformational sampling, especially for mutations in flexible regions. As for the machine learning methods, the accumulation of experimental mutation data has provided an unprecedented opportunity to directly model the intrinsic relationship between constructed features of a mutation site and the corresponding binding affinity change. In particular, Geng et al. (2019) proposed a limited number of predictive features from the interface structure, evolution and energy-based features, as the input of a random forest to predict the affinity changes upon mutations.
Similarly, MutaBind2 (Zhang et al., 2020) introduced seven features that describe interactions of proteins with the solvent, evolutionary conservation of the site, and thermodynamic stability of the complexes for the prediction of affinity changes upon mutations, achieving state-of-the-art performance on the SKEMPI 2.0 dataset (Jankauskaitė et al., 2019). However, as the features proposed by these machine-learning methods were manually engineered based on known rules of protein structures, their predictive generalization across various protein structures is limited. In addition, these methods relied on the results of energy optimization algorithms, which are not always reliable due to stochastic sampling (Hallen et al., 2018; Suárez and Jaramillo, 2009). Here, we seek a method that generalizes well and depends less on the performance of third-party optimization algorithms.

In this paper, we propose a novel framework, called GraphPPI, to accurately model the effects of mutations on the binding affinity based on features learned from scratch. In particular, GraphPPI first employs a graph neural network (GNN) to produce graphical descriptors and then uses a machine learning algorithm (gradient-boosting trees (GBTs) (Friedman, 2001)) to perform the prediction based on these descriptors. To make the graphical descriptors predictive of the strength of the binding interaction between proteins (i.e., the binding affinity), we introduce a pre-training scheme in which the GNN learns to reconstruct the original structure of a complex given a disturbed one (where the side chains of a residue are randomly rotated). This reconstruction task requires the GNN to learn the intrinsic patterns underlying the binding interactions between atoms and thus facilitates the prediction of the GBT.

GraphPPI has the following advantages compared with traditional methods.
1) GraphPPI is capable of automatically learning meaningful features of the protein structure for the prediction, obviating the need for feature engineering. 2) By training on a large-scale dataset of complexes, the GNN learns general rules of structures that hold across different complexes, leading to better generalizability of GraphPPI.

In our experiments, we first investigated what the GNN in GraphPPI learned during the pre-training scheme. We found that, without any annotated labels for learning, the GNN can detect abnormal interactions between atoms and successfully distinguish whether a residue is located in the interface of the complex or not. Second, we evaluated GraphPPI in the prediction of binding affinity changes upon mutations on five benchmark datasets, three for single-point mutations and two for multi-point mutations. GraphPPI obtained new state-of-the-art performance on these datasets, demonstrating the effectiveness of the graphical features learned by the pre-trained GNN. Besides, we collected several complexes in which newly identified neutralizing monoclonal antibodies (mAbs) bind the spike glycoprotein of SARS-CoV-2. GraphPPI is able to accurately predict the difference in binding affinity between these complexes, even though the number of mutated residues is larger than that in the training set. Based on one of these mAbs, we also showcased the residues where certain mutations were predicted by GraphPPI to significantly increase the stabilizing effect of the binding with SARS-CoV-2. These results demonstrate that GraphPPI can serve as a powerful tool for the prediction of binding affinity changes upon mutations and has the potential to be applied in a wide range of tasks, such as designing antibodies, finding disease driver mutations and understanding the underlying mechanisms of protein biosynthesis.

Results
The GraphPPI framework
In this study, we propose a deep learning based framework, named GraphPPI, which models the effects of mutations on the binding affinity based on the features learned by a pre-trained graph neural network (GNN) (Figure 1). More specifically, to achieve both a powerful expressive capacity for graph structures and robustness of the prediction, the GraphPPI framework accomplishes the prediction by sequentially employing two machine learning components, namely a GNN (excelling at extracting graphical features) and a gradient-boosting tree (GBT, excelling at avoiding overfitting). The GNN integrates the features of neighboring atoms to update the representations of the center atom, providing deep graphical features to represent the complex structure. The GBT takes the graphical features of both the complex and the mutant as inputs and predicts the corresponding binding affinity change. Before the prediction, we need to train these two components to ensure that the GNN offers useful graphical features for the prediction of the GBT and that the GBT captures the relationship between the inputs (i.e., the graphical features) and the outputs (i.e., the binding affinity changes upon mutations). However, the true features that shape the binding affinity of a complex remain largely elusive, and no supervision signal is given regarding what features the GNN should produce for the input complex.
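The GNN update described above, in which the features of neighboring atoms are attended over and aggregated into the center atom, can be sketched in a simplified single-head form. All names, shapes and the aggregation rule here are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_message_passing(h, neighbors, W, a):
    """One hypothetical attention-based GNN update.

    h:         (n_atoms, d) atom feature matrix
    neighbors: dict mapping atom index -> list of neighbor atom indices
    W:         (d, d) shared linear transform
    a:         (2*d,) attention parameter vector
    """
    z = h @ W                      # transform all atom features
    h_new = np.zeros_like(z)
    for i, nbrs in neighbors.items():
        # score each neighbor against the center atom, then normalize
        scores = np.array([a @ np.concatenate([z[i], z[j]]) for j in nbrs])
        alpha = softmax(scores)    # attention weights over neighbors
        # weighted sum of neighbor features becomes the new center feature
        h_new[i] = sum(w * z[j] for w, j in zip(alpha, nbrs))
    return h_new
```

A multi-head variant would run several such updates with independent `W` and `a` and concatenate the results, which is one way the multi-head attention mentioned later in the Results could be realized.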
Figure 1:
The prediction pipeline and the pre-training scheme of GraphPPI. In the prediction stage of GraphPPI, a graph neural network (GNN) produces the graphical descriptors for the given wild-type protein complex and the complex mutant, respectively, and a gradient boosting tree takes these descriptors as input to predict the corresponding affinity changes. Before the prediction, a pre-training scheme enforces the GNN to capture the intrinsic patterns underlying the interactions between atoms by reconstructing the original structure of a complex given a disturbed one (where the side chains of a residue are randomly rotated).
To address this problem, we proposed a novel pre-training scheme to facilitate the GNN in producing useful graphical features for the input complex. Pre-training in deep learning involves training a model on numerous unlabeled data to obtain deep representations of the input samples, which has been demonstrated to be useful for various downstream tasks in the fields of natural language processing (Devlin et al., 2019) and computer vision (Pathak et al., 2016). In our pre-training scheme, the GNN aims to reconstruct the original structure of a complex given a disturbed one in which the side chains of a residue are randomly rotated, so the supervision signals are obtained automatically. In particular, for a protein complex in the pre-training, we disturb the three-dimensional coordinates of the side chains of a residue by random rotation. The GNN takes the graphical representations of the disturbed structure as input and learns to reconstruct the original coordinates.

This well-designed reconstruction task in the pre-training scheme requires the GNN to capture intrinsic patterns underlying the interactions between atoms and thus provide predictive graphical features for the downstream task (i.e., prediction of binding affinity changes upon mutations). Considering that most of the previous methods fall into the three categories mentioned above, GraphPPI is, to the best of our knowledge, the first attempt to apply a pre-training scheme to extract features for the prediction of binding affinity changes upon mutations.
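As a concrete sketch, the disturbance step might look as follows. This is a deliberately simplified version that rigidly rotates all side-chain atoms about the Cα-Cβ bond by a uniformly sampled angle, whereas the paper samples side-chain angles from an observed rotamer distribution (Shapovalov and Dunbrack Jr, 2011); all function names are hypothetical:

```python
import numpy as np

def rotate_about_axis(points, origin, axis, angle):
    """Rotate points about the line through `origin` along `axis`
    (Rodrigues' rotation formula)."""
    axis = axis / np.linalg.norm(axis)
    p = points - origin
    cos, sin = np.cos(angle), np.sin(angle)
    rotated = (p * cos
               + np.cross(axis, p) * sin
               + axis * (p @ axis)[:, None] * (1 - cos))
    return rotated + origin

def disturb_side_chain(coords, ca, cb, side_idx, rng):
    """Randomly rotate the side-chain atoms `side_idx` about the CA-CB bond.

    Returns the disturbed coordinates; the original `coords` serve as the
    reconstruction target for the pre-training loss."""
    angle = rng.uniform(0, 2 * np.pi)  # uniform here; the paper uses an observed distribution
    out = coords.copy()
    out[side_idx] = rotate_about_axis(coords[side_idx], ca, cb - ca, angle)
    return out

def reconstruction_loss(pred, target):
    # mean squared error over the reconstructed atom coordinates
    return float(np.mean((pred - target) ** 2))
```

Training then minimizes `reconstruction_loss` between the coordinates predicted by the GNN (plus the MLP head shown in Figure 1) and the original, undisturbed coordinates.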
GraphPPI captures meaningful patterns of the complex structure
To pre-train the GNN in GraphPPI, we first constructed a large-scale dataset that contains the solved structures of 2591 complexes from the PDB-BIND dataset (Su et al., 2018). The dataset was randomly split into a training set and a development set with a ratio of 9:1. For each complex, we disturbed the structure by randomly selecting a residue and randomly sampling the angles of its side chains based on the observed distribution (Shapovalov and Dunbrack Jr, 2011) (Methods). We repeated the disturbance 10,000 times for each complex, resulting in 25,910,000 data points in total. During the training of the GNN, the parameters that yielded the best performance on the development set were chosen. The best hyperparameters of GraphPPI were calibrated through a grid search procedure, and the resulting optimal settings on the development set are listed in Supplementary Table 1. As the binding affinity of the two proteins in a complex is largely determined by the interactions of the atoms in the interface between them, below we tested 1) whether the pre-trained GNN in GraphPPI can identify the interface region of a complex, and 2) whether the pre-trained GNN is sensitive to abnormal bond lengths between atoms.

To answer the first question, we fed all the complexes in the dataset into the pre-trained GNN separately and obtained the graphical descriptors of each residue in the interface and those of the non-interface residues. We kept only as many graphical descriptors of non-interface residues as of interface ones, since non-interface residues usually outnumber interface ones. As the α-carbon is the central atom in the backbone of an amino acid, its graphical descriptors were used to represent the corresponding residue.
Then, we employed t-distributed stochastic neighbor embedding (t-SNE) to visualize the distribution of these graphical descriptors in a low-dimensional space. t-SNE is widely used in machine learning to reduce the feature dimension while preserving the two most important components (Maaten and Hinton, 2008). We also performed the same analysis on a GNN model that was not pre-trained, for comparison.

Figure 2A-B shows the distributions of the graphical descriptors of the residues produced by the GNN with and without the pre-training scheme. When we used the GNN that was not pre-trained to generate graphical descriptors, the distribution of the residues in the interface was similar to that of the non-interface ones. Expectedly, with the pre-training scheme, the two distributions of the residues in and not in the interface present a dramatic distinction in the two-dimensional space produced by t-SNE, indicating that our pre-training scheme makes the GNN capable of capturing the patterns around these key locations (i.e., the interfaces) in a complex. This is largely because the interface and non-interface residues have different solvent accessibilities (Reichmann et al., 2005); to accurately reconstruct the original coordinates of the disturbed side chains during pre-training, the GNN has to identify the interface region.

Next, we investigated whether the pre-trained GNN can detect abnormal binding interactions between atoms in a complex. Concretely, for a complex, we randomly selected an atom and disturbed its coordinate within a distance of 4 Å. Then we fed the disturbed structure to the GNN and obtained the corresponding graphical descriptors. All the complexes in the pre-training
Figure 2:
Visualization of the graphical descriptors by t-SNE. A) Distributions of the graphical descriptors of the residues in and not in the interface produced by the pre-trained GNN. B) Distributions of the graphical descriptors of the residues in and not in the interface produced by the GNN without the pre-training scheme. C) Distributions of the graphical descriptors of the atoms with different disturbed distances produced by the pre-trained GNN. D) Distributions of the graphical descriptors of the atoms with different disturbed distances produced by the GNN without the pre-training scheme. The disturbed distances ranged from 0 Å to 4 Å.
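The interface/non-interface split underlying this analysis can be approximated by an inter-chain distance criterion. The sketch below assumes a 5 Å heavy-atom cutoff, which is a common convention but may differ from the exact definition used in the paper (Methods):

```python
import numpy as np

INTERFACE_CUTOFF = 5.0  # Angstroms; an assumed cutoff, not necessarily the paper's

def interface_residues(chain_a, chain_b, cutoff=INTERFACE_CUTOFF):
    """chain_a, chain_b: dicts residue_id -> (n_atoms, 3) coordinate arrays.

    A residue is labeled as interface if any of its atoms lies within
    `cutoff` of any atom of the partner chain."""
    def min_dist(xyz, other):
        # minimum pairwise distance between one residue and all partner atoms
        d = np.linalg.norm(xyz[:, None, :] - other[None, :, :], axis=-1)
        return d.min()

    atoms_a = np.concatenate(list(chain_a.values()))
    atoms_b = np.concatenate(list(chain_b.values()))
    iface_a = {r for r, xyz in chain_a.items() if min_dist(xyz, atoms_b) <= cutoff}
    iface_b = {r for r, xyz in chain_b.items() if min_dist(xyz, atoms_a) <= cutoff}
    return iface_a, iface_b
```

With such labels in hand, the α-carbon descriptors of the two groups can be collected and passed to t-SNE for the visualization shown in Figure 2.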
GraphPPI advances the state of the art in estimating effects of mutations on binding affinity
We evaluated GraphPPI on three widely used datasets: the AB-Bind dataset (Sirin et al., 2016), the SKEMPI dataset (Moal and Fernández-Recio, 2012) and the SKEMPI 2.0 dataset (Jankauskaitė et al., 2019). Each data point in these datasets comprises the structure of a wild-type complex, the residues to be mutated, the new residue types and the corresponding binding affinity change. Based on the mutation information, we used the "buildmodel" function in FoldX (Schymkowitz et al., 2005) to build the 3D structure of the mutated complex, and fed the wild-type complex and the mutated one to the trained GNN to obtain meaningful descriptors for the fitting and prediction of the GBT in GraphPPI. The binding affinity is measured by the binding free energy (Methods), which serves as the training label of the GBT in GraphPPI.

As the difficulty varies between the prediction tasks in the single-point and multi-point mutation settings, we assessed the prediction ability of GraphPPI in these two settings separately. The methods that obtained the best results in the literature in these two settings are TopGBT (Wang et al., 2020) and MutaBind2 (Zhang et al., 2020), respectively, which were used as the main baselines in the comparison with GraphPPI. For the evaluation of model performance, we mainly considered two metrics: the Pearson correlation coefficient (R_p) and the root mean square error (RMSE).

Performance on AB-Bind dataset.
The data points in the AB-Bind dataset are derived from studies of 32 antibody-antigen complexes, each comprising 7 to 246 variants (including both single- and multi-point mutations). The AB-Bind dataset includes 1,101 mutational data points with experimentally measured binding affinities, also denoted the S1101 set. The data come from complexes with crystal structures either of the parent complex or of a homologous complex with high sequence identity. We also followed Wang et al. (2020) to build a sub-dataset that considers only single mutations in the AB-Bind dataset, called the S645 set.

The prediction performance of individual methods on AB-Bind S645 is shown in Figure 3A. We observed that the machine learning methods (e.g., TopGBT and GraphPPI) obtained better results than the non-machine-learning one (i.e., FoldX), largely because the non-machine-learning method directly makes a prediction based on the optimized results without learning from labeled data. In addition, our model also behaved significantly better than all the other machine learning models, including the previous state-of-the-art prediction model TopGBT (P<0.01
Figure 3:
Comparison of the prediction performance of GraphPPI with that of different baseline methods in terms of the Pearson correlation coefficient on the five benchmark datasets. A) Prediction performance on the S645 set in the AB-Bind dataset, where a single residue was mutated in each data point. B) Prediction performance on the S1101 set, containing both single-point and multi-point mutations. C) Prediction performance on the S1131 set in the SKEMPI dataset, where a single residue was mutated in each data point. D) Prediction performance on the S4169 set in the SKEMPI 2.0 dataset, where a single residue was mutated in each data point. E) Prediction performance on the S1707 set in the SKEMPI 2.0 dataset, where multiple residues were mutated in each complex. The bar plots in (A-D) represent the performance in the ten-fold cross-validation test. For a fair comparison with MutaBind2 and FoldX, all the results in (E) were obtained in a two-fold cross-validation test. All results except for GraphPPI and TopGBT were quoted from Wang et al. (2020), Sirin et al. (2016) and Zhang et al. (2020). The performance of TopGBT was obtained by running its published source code.
by paired sample t-test). TopGBT used topology-based features as the representation of the complex, which were not originally designed to represent the patterns of interactions between atoms, limiting their predictive power for binding affinity changes upon mutations. By contrast, the pre-training scheme in GraphPPI was built to explicitly learn the binding interactions of atoms, leading to better prediction results.

Further, we used all the data in the S1101 set to evaluate GraphPPI in predicting affinity changes upon both single- and multi-point mutations in the ten-fold cross-validation test. Previously, a number of methods have been tested on this dataset, such as Discovery Studio (Inc., 2013), STATIUM (DeBartolo et al., 2014), mCSM-PPI (Ascher et al., 2015), DFIRE (Zhou and Zhou, 2002), dDFIRE (Yang and Zhou, 2008), and Rosetta (Humphris and Kortemme, 2008). However, a multi-point mutation involves more complicated molecular dynamics, and its impact is not the simple summation of the impacts of the individual mutations, making the corresponding effects more difficult to predict. Also, some of the baseline methods, such as TopGBT and mCSM-AB (Pires and Ascher, 2016), do not directly generalize from single-point mutations to multi-point mutations. Despite these challenges, we observe that GraphPPI still yields significantly better performance than all the other baselines, leading to an improvement of 73% over Discovery Studio (Inc., 2013) in terms of R_p (Figure 3B). Considering that GraphPPI adopts the multi-head attention mechanism, different types of effects induced by individual mutations can be reflected by different dimensions of the graphical descriptors. This distributed representation better captures the structural changes of multi-point mutations, resulting in GraphPPI's superior prediction performance in this setting.

Performance on SKEMPI dataset.
The SKEMPI dataset is a database of 3047 binding free energy changes upon mutation assembled from the scientific literature for protein-protein heterodimeric complexes with experimentally determined structures (Moal and Fernández-Recio, 2012). Subsequently, Xiong et al. (2017) filtered a subset of 1,131 non-redundant interface single-point mutations from the original SKEMPI dataset, denoted the S1131 set. A number of previous methods have been tested on this dataset, such as BindProfX (Xiong et al., 2017), Profile-score (Lensink and Wodak, 2013), BeAtMuSiC (Lensink and Wodak, 2013), SAMMBE (Petukh et al., 2016), Dcomplex (Liu et al., 2004), and TopGBT (Wang et al., 2020). Their prediction performance is shown in Figure 3C. Through this comparison, we observed that the methods that directly model the molecular mechanics, such as Dcomplex and FoldX, usually yield poor results. Benefiting from the larger number of training data points compared with AB-Bind S645, the machine learning approaches obtain high correlations between the predicted affinity changes upon mutations and the experimental ones. Also, GraphPPI achieves new state-of-the-art performance on the S1131 set, significantly better than the others (P<0.05 by paired sample t-test).
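The two evaluation metrics used throughout, R_p and RMSE, have standard definitions and can be computed directly from the predicted and experimental ∆∆G values:

```python
import numpy as np

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (the paper's R_p)."""
    yt = np.asarray(y_true, float)
    yp = np.asarray(y_pred, float)
    yt, yp = yt - yt.mean(), yp - yp.mean()
    return float((yt @ yp) / np.sqrt((yt @ yt) * (yp @ yp)))

def rmse(y_true, y_pred):
    """Root mean square error, in the same units as y (kcal/mol here)."""
    e = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.sqrt(np.mean(e ** 2)))
```

R_p measures how well the ranking and linear trend of the predictions track the experiments, while RMSE reports the typical absolute error in kcal/mol; the paper reports both because a method can score well on one and poorly on the other.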
Performance on SKEMPI 2.0 dataset.
As an updated version of the SKEMPI dataset, SKEMPI 2.0 (Jankauskaitė et al., 2019) is composed of 7085 single- or multi-point mutations from 345 complexes. Rodrigues et al. (2019) filtered the single-point mutations and selected 4,169 variants from 319 different complexes, called the S4169 set. On this dataset, GraphPPI achieved an R_p of 0.78 and an RMSE of 1.13 kcal mol⁻¹, while TopGBT obtained 0.76 and 1.16 kcal mol⁻¹, respectively (Figure 3D). As for the prediction task for multi-point mutations, Zhang et al. (2020) built a dataset called M1707, which contains 1707 data points with more than one mutation. As shown in Figure 3E, GraphPPI also achieves the best performance on the M1707 set, with an R_p of 0.89 and an RMSE of 1.51 kcal mol⁻¹, better than the previously best method MutaBind2 (P<0.05 by paired sample t-test). These results further demonstrate the superiority of GraphPPI on both single- and multi-point mutations.

GraphPPI shows better prediction generalizability across proteins and structures

The generalizability of a machine learning method is a major concern, since it determines how broadly the model can be applied in the prediction of PPIs between proteins. Therefore, in this section, we further evaluate the generalizability of GraphPPI across different proteins and structures.
Performance of leave-one-complex-out cross-validation.
In addition to the cross-validation tests used previously, here we evaluate the models' generalizability by leave-one-complex-out cross-validation (CV) tests on the S645 set (single-point mutation dataset) and the M1707 set (multi-point mutation dataset). The leave-one-complex-out CV test involves leaving all the variants of one complex out as the test set and using the variants of all the other complexes as the training set. By this splitting, the complex in the test set is guaranteed not to be in the training set, which allows us to estimate the prediction performance of the method on previously unseen data points. In this experiment, we mainly compare GraphPPI with the previous state-of-the-art methods on each benchmark dataset, i.e., TopGBT (on the single-point mutation dataset, Figure 4A) and MutaBind2 (on the multi-point mutation dataset, Figure 4B). We found that the prediction performance of all the methods decreased considerably compared with that in the ten-fold CV, revealing the greater difficulty of the leave-one-complex-out CV test. Nevertheless, GraphPPI surpasses TopGBT and MutaBind2 by up to a 13% improvement in terms of R_p on the S645 and M1707 sets, respectively (Figure 4A-C). Considering that the features used in MutaBind2 were manually designed, these features are not guaranteed to generalize across different proteins. By contrast, as the features produced by the GNN were learned from a large-scale dataset with thousands of types of complexes, the better generalizability of GraphPPI across complexes is expected.

Performance of leave-one-structure-out cross-validation.
As different proteins may share similar structures, a machine learning model has the potential to overfit to specific structures instead of learning the intrinsic patterns that shape the binding affinity. To further test the generalizability of our model across different structures, we conducted leave-one-structure-out cross-validation (CV) tests on the S645 set and the M1707 set. Similar to the leave-one-complex-out CV, the leave-one-structure-out CV test evaluates the prediction model using the variants of each complex as the test set, but the variants of the other complexes with similar structures are also removed from the training set. More specifically, we used the TMalign software (Zhang and Skolnick, 2005) to measure the structural similarity between two complexes (Supplementary Figures 1 and 2). As two complexes with a TMalign similarity score over 0.5 are regarded as sharing substantial parts of their structures, we adopted a cutoff of 0.5 when constructing the training set (i.e., removing the similar complexes from the training set) for each round of the leave-one-structure-out CV test.

The prediction results in the leave-one-structure-out CV test are also shown in Figure 4. We found that the performance of both GraphPPI and TopGBT decreased noticeably compared with the performance in the leave-one-complex-out CV test. This comparison shows that, despite the larger gap between the training and test data in protein structures, GraphPPI still shows better prediction generalizability than TopGBT and MutaBind2, again indicating the effectiveness of the pre-training scheme in GraphPPI.
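The leave-one-structure-out splitting described above can be sketched as follows; `tm_scores` stands for a hypothetical precomputed symmetric matrix of TMalign scores between all complexes:

```python
def leave_one_structure_out_splits(complex_ids, tm_scores, cutoff=0.5):
    """Yield (train_indices, test_indices) pairs.

    For each held-out complex, every complex whose TMalign similarity to it
    exceeds `cutoff` is also dropped from the training set, so structurally
    similar complexes never leak into training.
    """
    n = len(complex_ids)
    for test in range(n):
        train = [i for i in range(n)
                 if i != test and tm_scores[test][i] <= cutoff]
        yield train, [test]
```

Setting `cutoff` above the maximum TM-score recovers plain leave-one-complex-out CV, which makes the two protocols directly comparable, as in Figure 4.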
Ablation study.
To investigate the effects of the design choices of GraphPPI, we conducted a series of ablation studies. We first compared the performance of GraphPPI with and without the pre-training scheme in both the ten-fold CV and leave-one-complex-out CV tests. In GraphPPI without the pre-training scheme, the only difference from the original framework is that the parameters of the GNN were randomly initialized. As shown in Supplementary Figure 3, without the pre-training scheme, GraphPPI shows a significant decrease in prediction performance
Figure 4:
Performance of the prediction models in the leave-one-complex-out cross-validation test and the leave-one-structure-out cross-validation test. A) Comparison of the performance of GraphPPI with that of TopGBT on the S645 set. B) Comparison of the performance of GraphPPI with that of MutaBind2 on the M1707 set. C) The experimental affinity changes and those predicted by GraphPPI in the leave-one-structure-out CV test on the S645 set. D) The experimental affinity changes and those predicted by GraphPPI in the leave-one-structure-out CV test on the M1707 set.
in the ten-fold CV test, and behaves even more poorly in the leave-one-complex-out CV test. These results show that the well-designed pre-training scheme largely improves the predictive power of GraphPPI, especially its predictive generalizability across different proteins. Next, an ablation study on another key component of GraphPPI, the GNN model, further confirmed that our GNN plays a more critical role than other neural networks (e.g., an MLP) in learning the interactions between atoms and representing the complex structures (Supplementary Figure 3).
GraphPPI accurately predicts effects of mutations on binding affinities between SARS-CoV-2 and its antibodies
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused an outbreak of pneumonia that became a new world pandemic, leading to more than 23 million infection cases and 800 thousand deaths as of August 25, 2020. The spike glycoprotein (S) of SARS-CoV-2 recognizes and attaches to angiotensin-converting enzyme 2 (ACE2) when the virus infects human cells. Antibodies that can effectively block SARS-CoV-2 entry into host cells provide a promising therapy for the related diseases in the near future. As our framework GraphPPI has shown powerful predictive capacity on various benchmark datasets, here we asked whether GraphPPI can capture the effects of mutations of the antibodies on the binding affinity with SARS-CoV-2. To this end, we constructed a small dataset that contains several complexes of the SARS-CoV-2 S protein with different monoclonal antibodies (mAbs). We collected five potent neutralizing mAbs against SARS-CoV-2 from Cao et al. (2020) (Figure 5A). These mAbs neutralize SARS-CoV-2 by binding to the receptor binding domain (RBD) of the S protein with different binding strengths. Cao et al. (2020) measured their binding affinities with SARS-CoV-2 using surface plasmon resonance (SPR) and also showed that these mAbs share high homology in the CDR3H sequences with the SARS-CoV neutralizing mAb m396 (PDB ID: 2dd8), which neutralizes SARS-CoV by disrupting ACE2/RBD binding, providing a way to approximate the structures of these five mAbs. Based on the solved three-dimensional (3D) structure of m396, we leveraged the "buildmodel" function in FoldX to construct the 3D structures of these mAbs and then used the ZDOCK software (Pierce et al., 2014) to predict the orientations of the RBD relative to these mAbs (Figure 5C). Finally, we obtained the approximated structures of the five complexes comprising the SARS-CoV-2 RBD with neutralizing mAbs.
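The mutant-construction step can be scripted; the sketch below only assembles the inputs that FoldX's BuildModel expects. Both the individual_list mutation code (wild-type residue, chain, position, mutant residue, with ';' terminating each mutant) and the command-line form are assumptions based on common FoldX usage, so check them against the documentation of your FoldX version:

```python
def foldx_mutation_line(mutations):
    """One individual_list.txt line describing a (possibly multi-point) mutant.

    `mutations`: list of (wt_aa, chain, pos, mut_aa) tuples applied jointly,
    e.g. [("K", "A", 83, "G")] -> "KA83G;" (format assumed, not guaranteed).
    """
    return ",".join(f"{wt}{chain}{pos}{mut}"
                    for wt, chain, pos, mut in mutations) + ";"

def foldx_command(pdb, mutant_file="individual_list.txt"):
    # Assumed FoldX 5 command-line form for BuildModel with a mutant file.
    return (f"foldx --command=BuildModel --pdb={pdb} "
            f"--mutant-file={mutant_file}")
```

The resulting command string would then be run with `subprocess.run(cmd.split())`, and the mutant PDB it produces fed to the pre-trained GNN, as in the benchmark experiments above.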
These constructed complexes are relatively reliable since they share similar structures and binding sites with the complex of SARS-CoV bound by m396 (Figure 5B-C). As these collected complexes can be regarded as multi-point missense mutants of each other (Figure 5A), we evaluated the performance of GraphPPI by measuring the difference between the predicted and observed affinity changes for each pair of these complexes. Before testing, GraphPPI was trained on all the data points of the large-scale M1707 dataset. Figure 5D shows the measured binding affinity changes between individual pairs of the mAb complexes and Figure 5E shows the corresponding affinity changes predicted by GraphPPI. GraphPPI achieved a strong correlation of 0.71 and an RMSE of 0.67 kcal mol−1, suggesting that, despite the larger number of mutated points than in the training data, GraphPPI still successfully captures the changes of binding affinity upon mutations to a large extent.
Figure 5:
The structures of the neutralizing mAbs against SARS-CoV-2 and the comparison of the measured and predicted affinity changes between individual complexes of these mAbs. (A) The CDR3 sequence comparison between SARS-CoV-2 neutralizing mAbs. (B) The solved structure (PDB ID: 2dd8) of the complex composed of m396 and the SARS-CoV RBD. (C) The approximated structure of the complex composed of a mAb (exemplified by BD-508) and the SARS-CoV-2 RBD. (D) The observed binding affinity changes between two arbitrary mAbs. (E) The binding affinity changes between two arbitrary mAbs predicted by GraphPPI.
GraphPPI provides biological insights into designing antibodies against SARS-CoV-2
As GraphPPI can accurately predict the affinity changes upon mutations of the mAb complexes against SARS-CoV-2, here we try to use GraphPPI to design mAb mutants that can bind to SARS-CoV-2 with better stability (measured by the negative change in binding free energy). As a case study, we performed a one-step design on the basis of BD-508, a mAb verified in the plaque reduction neutralization test (PRNT) using authentic SARS-CoV-2 isolated from COVID-19 patients and shown to have high potency (Cao et al., 2020). We performed a full computational mutation scanning on the interface of the complex consisting of BD-508 and the SARS-CoV-2 RBD to investigate which mutations tend to yield higher binding affinities. The 34 interface residues of BD-508 were mutated to all other 19 amino acid types. To reduce the computational complexity of the scanning, we focused on single-point mutations; in total, we thus conducted 646 computational mutations. We collected the data points with single-point mutations from the S4169 set and also included their reversed mutations by setting the corresponding affinity changes to the negative values of the original ones, obtaining a dataset with 8338 variants. We adopted this dataset to train GraphPPI and used the trained model to predict the binding affinity changes of the above 646 mutations on the interface of BD-508 separately. The affinity changes of each residue with different mutations are shown in Figure 6A, where the amino acid types of the mutants were categorized into charged, polar, hydrophobic and special-case groups. We observe that the affinity changes were highly correlated with the amino acid types, and mutating to tryptophan often yields the highest binding affinity changes in this case. In addition, there are two sensitive residues in BD-508 whose mutations could significantly improve the binding affinity, i.e., H47W and H59Y (Figure 6B).
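The scan described above is a simple enumeration: each interface residue is mutated to every other amino acid type (34 × 19 = 646 candidates in the paper). A minimal sketch, where the interface positions and wild-type identities are illustrative stand-ins rather than the actual BD-508 interface:

```python
# Hypothetical sketch of the interface mutation scan: every interface
# residue is mutated to the 19 other amino acid types. The two positions
# below (H47, H59, from the text) are used as a toy interface.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def enumerate_single_mutants(interface_residues):
    """Yield (position, wild_type, mutant_type) for every candidate."""
    for pos, wt in interface_residues:
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield pos, wt, aa

interface = [(47, "H"), (59, "H")]
mutants = list(enumerate_single_mutants(interface))
# 2 positions x 19 substitutions = 38 candidate single-point mutants
```

In the full scan, each candidate would be scored by the trained GraphPPI model and ranked by predicted ∆∆G.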
Through this case study and the tests on the mAbs against SARS-CoV-2, we demonstrate that GraphPPI can accurately predict the binding affinity changes upon mutations of mAbs and showcase how GraphPPI can be used to improve the binding affinity of mAbs with SARS-CoV-2. We believe GraphPPI can serve as a useful tool in designing antibodies and in other related biological tasks.
Figure 6:
(A) The affinity changes of each residue with different mutations in the interface of the complex of BD-508 with the SARS-CoV-2 RBD. (B) The visualization of the average affinity changes of the mutations at each residue in the interface of BD-508.

Methods

Definition of the task of predicting protein-protein binding affinity changes upon mutation
Given the 3D structure of a protein-protein complex, the residue(s) to be mutated and the new residue type(s), the goal is to estimate the binding free energy change (i.e., ∆∆G) between the original complex and the mutant:

∆∆G = ∆G_mutant − ∆G_wild-type, (1)

where ∆G is the binding free energy of a complex.

Detailed implementation of individual modules in GraphPPI
Encoding the graph representations of a given complex
To build a graph for a given protein complex, we regard atoms as nodes and their interactions as edges. We only consider four types of atoms, namely C, O, N and S. For each node k, we use its attributes as its raw features, such as the element type, the residue type, the chain index, the location information (i.e., whether it is on the interface or not), and the three-dimensional coordinates (i.e., (x_k, y_k, z_k) ∈ R^3). All attributes we used and their encoding techniques are specified in Supplementary Table 2. We concatenate their encodings into a vector, which is D = 30 dimensional and used as the initial features of the corresponding node. The features of all the nodes are denoted as A ∈ R^{N×D}, where N stands for the number of nodes. As for the edges, if the distance between two nodes is shorter than a threshold (i.e., 3 Å), we assume there exists an edge between them. The edges of the entire complex are denoted as E ∈ R^{N×N}, in which the entries are either one or zero. Therefore, for a given complex, the initial graph representation is (A, E).

Generation of deep graphical features by graph neural network (GNN)
To capture the structure of a protein complex at the atom level, we propose a graph neural network named coordinate-based graph attention network (CGAT, Figure 1). CGAT shares the basic idea of the graph attention network (GAT) (Veličković et al., 2018): for each node, CGAT uses the representations of the neighboring nodes to update its representation. Different from GAT, however, CGAT specifically considers the coordinates in the input vectors when controlling the update process. Specifically, given the atom (also generally called node) features A and the edges E, GAT learns to capture the interaction information between atoms. Formally, GAT performs a self-attention mechanism on the nodes to indicate the importance of node j's features to node i, computed by

s_{i,j} = LeakyReLU(u^T [W A_i || W A_j]), (2)

where A_i, the i-th vector in A, stands for the features of node i, || represents concatenation, and LeakyReLU stands for the LeakyReLU nonlinear function (He et al., 2015). W ∈ R^{D×D_g} and u are a learnable weight matrix and vector, respectively, where D_g is the hidden size in GAT. In the initial graph representation of the complex, the absolute values of the three-dimensional coordinates in the node features vary a lot across different complexes; the difference between coordinates is more useful. Thus, CGAT also integrates the difference between the two atoms into the self-attention mechanism at the first transformation layer:

s_{i,j} = LeakyReLU(u^T [W A_i || W A_j || W′(A_i − A_j)]), (3)

where W′ ∈ R^{D×D_g} is a learnable weight matrix. To make the coefficients easily comparable across the different nodes adjacent to node i, we then normalize them across all choices of j using the softmax function:

e_{i,j} = softmax(s_{i,j}) = exp(s_{i,j}) / Σ_{j′∈N_i} exp(s_{i,j′}), (4)

where N_i stands for the neighbors of node i.
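The graph encoding and the attention coefficients in Equations (2-4) can be sketched as follows. This is a toy numpy stand-in, not the trained model: the feature vectors replace the 30-dimensional attribute encodings, and W, W′ and u are random placeholders for the learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_graph(coords, features, cutoff=3.0):
    """Nodes are atoms; an edge links atoms closer than `cutoff` (in angstroms)."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    E = (dist < cutoff).astype(float)   # binary adjacency matrix
    np.fill_diagonal(E, 0.0)            # no self-edges
    return features, E

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def attention_coefficients(A, E, W, W_prime, u):
    """Normalized attention e[i, j] over the neighbors N_i of each node."""
    N = A.shape[0]
    e = np.zeros((N, N))
    for i in range(N):
        nbrs = np.nonzero(E[i])[0]
        if nbrs.size == 0:
            continue
        # s_ij = LeakyReLU(u^T [W A_i || W A_j || W'(A_i - A_j)])  (Eq. 3)
        s = leaky_relu(np.array([
            u @ np.concatenate([A[i] @ W, A[j] @ W, (A[i] - A[j]) @ W_prime])
            for j in nbrs
        ]))
        w = np.exp(s - s.max())         # numerically stable softmax (Eq. 4)
        e[i, nbrs] = w / w.sum()
    return e

coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [10.0, 0.0, 0.0]])
A, E = build_graph(coords, rng.normal(size=(3, 4)))  # toy 4-dim features
D, Dg = 4, 5
W = rng.normal(size=(D, Dg))
Wp = rng.normal(size=(D, Dg))
u = rng.normal(size=3 * Dg)
e = attention_coefficients(A, E, W, Wp, u)
# atoms 0 and 1 are 1.5 A apart and connected; atom 2 is isolated
```

Each row of `e` with at least one neighbor sums to one, as required by the softmax normalization.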
Once obtained, the normalized attention coefficients together with the corresponding atom features are used to apply a weighted summation operation, resulting in the updated representation of node i, given by

h_i = δ(Σ_{j∈N_i} e_{i,j} W x_j), (5)

where δ(·) represents a nonlinear function, e.g., the ReLU function (He et al., 2015), and x_j denotes the input features of node j (A_j at the first layer). The computations of Equations (3-5) form a transformation module, called a self-attention layer.

CGAT also employs multi-head attention to stabilize the learning process of self-attention; that is, K independent attention mechanisms execute the transformation of Equation (5), and their features are then concatenated, resulting in the following feature representation:

h_i = M-Attention(X, x_i) = ||_{k=1}^{K} δ(Σ_{j∈N_i} e^k_{i,j} W^k x_j), (6)

where e^k_{i,j} are the normalized attention coefficients computed by the k-th attention mechanism, and W^k is the weight matrix of the corresponding input linear transformation. Note that, in this setting, the final output h_i consists of K·D_g features (rather than D_g) for each node.

To extract a deep representation and increase the expressive power of the model, we stack L multi-attention layers:

h^{(l+1)}_i = M-Attention(H^{(l)}, h^{(l)}_i), l = 1, 2, . . . , L, (7)

where H^{(l)} stands for the features of all the nodes processed by the l-th layer and h^{(l)}_i denotes the processed features of node i. Based on the node features processed by the last layer, to further enlarge the receptive field of the transformation for each node and encourage larger values in the node features, we also employ a max-pooling function to gather information from the neighboring nodes as part of the final graphical descriptors:

g_{i,j} = max_{k∈N_i} h^{(L)}_{k,j}, (8)

g_i = [g_{i,1} || · · · || g_{i,D_g}] || h^{(L)}_i. (9)

That is to say, given the initial node features A and the edges E of a complex, the GNN outputs a graphical descriptor g_i for each atom i in the complex:

G = {g_i | i = 1, 2, . . . , N} = CGAT(A, E), (10)

where G is the set of the graphical descriptors of all the atoms.

Prediction of binding affinity changes upon mutations by gradient-boosting tree (GBT)

For the prediction of the binding affinity change y ∈ R given the original protein complex x and its mutant m, GraphPPI integrates the graphical descriptors of the pre-trained GNN with a gradient-boosting tree (GBT) (Friedman, 2001). In particular, GraphPPI first leverages the pre-trained CGAT to generate features that are expected to represent the affinity change from the original complex to its mutant. For both the original protein x and its mutant m, the learned graphical descriptors of each atom at the mutated residues (denoted as G_xm and G_mm, respectively) and at the interface residues (denoted as G_xi and G_mi, respectively) are selected:

G_xm = [g_k], k ∈ S_xm, (11)
G_mm = [g_k], k ∈ S_mm, (12)
G_xi = [g_k], k ∈ S_xi, (13)
G_mi = [g_k], k ∈ S_mi, (14)

where S_xm stands for the set of atoms belonging to the residue(s) to be mutated in the original complex, S_mm for the set of atoms belonging to the mutated residue(s) in the mutant complex, S_xi for the set of atoms belonging to the interface residues in the original complex, and S_mi for the set of atoms belonging to the interface residues in the mutant complex. Due to the specific design in extracting graphical descriptors in CGAT (such as the ReLU function and Equation (8)), larger feature values represent higher importance.
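The descriptor selection just described, together with the max/mean pooling defined next in Equations (15-17), can be sketched as follows; the descriptor values and dimensions are toy stand-ins:

```python
import numpy as np

def select(G, atom_ids):
    """Gather the descriptors of one atom collection, e.g. S_xm."""
    return G[atom_ids]

def pool(G_sel):
    """F_n = [maxpooling(G_n), meanpooling(G_n)] over the selected atoms."""
    return np.concatenate([G_sel.max(axis=0), G_sel.mean(axis=0)])

def gbt_input(G_xm, G_mm, G_xi, G_mi):
    """Concatenate the four pooled feature blocks fed to the GBT."""
    return np.concatenate([pool(G_xm), pool(G_mm), pool(G_xi), pool(G_mi)])

Dg = 3
G = np.arange(12, dtype=float).reshape(4, Dg)  # toy descriptors, 4 atoms
G_xm = select(G, [0, 1])                       # atoms of the mutated residue
collections = [G_xm] * 4                       # reuse one collection for brevity
x = gbt_input(*collections)
# each pooled block has 2 * Dg entries, so the GBT sees 4 * 2 * Dg = 24 dims
```

In the full model, this 4 × 2D_g vector would be passed to a trained gradient-boosting regressor (e.g., scikit-learn's GradientBoostingRegressor) to output ∆∆G.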
Therefore, for each collection of graphical descriptors (namely G_xm, G_mm, G_xi, G_mi), we use max-pooling and mean-pooling operations to obtain the max and mean values at each dimension over the selected atoms, that is,

F_n = [maxpooling(G_n), meanpooling(G_n)], (15)
maxpooling(G_n)_i = max{g_{k,i}, k ∈ S_n}, (16)
meanpooling(G_n)_i = mean{g_{k,i}, k ∈ S_n}, n ∈ {xm, mm, xi, mi}, (17)

where i denotes the dimension index of the learned representations. Finally, the gradient-boosting tree (GBT) takes these features as input to accomplish the prediction of the binding affinity change (i.e., ∆∆G) upon mutations:

∆∆G = GBT([F_xm, F_mm, F_xi, F_mi]). (18)

Detailed implementation of the pre-training scheme
Pre-training strategies have been demonstrated to be powerful in various applications, such as computer vision (Pathak et al., 2016) and natural language processing (Devlin et al., 2019). The pre-training of graph networks also shows significant performance gains in the prediction of small-molecule properties (Hu et al., 2020). However, due to the complexity of the dynamics of protein structures, no pre-training scheme has been studied in this field. In this paper, based on the characteristics of protein conformations, we carefully design a novel pre-training scheme that is specific to the prediction of affinity changes upon mutation. Generally speaking, in the proposed pre-training scheme of GraphPPI, the GNN (specifically CGAT) aims to reconstruct the original structure of a complex given a disturbed one in which the side chains of a residue are randomly rotated. Below, we elaborate on the disturbance procedure and the reconstruction process.

Disturbance.
To produce meaningful disturbances in the given complex, we propose to rotate the side chains of a randomly selected amino acid. This idea stems from the observation that, for a particular complex, only a few conformations can lead to the lowest free energy; most disturbances will increase the free energy and make the complex less stable. By reconstructing the original conformations, a model is expected to capture the patterns of the biomolecular interactions between atoms and between residues in three-dimensional space. Formally, let r be a certain residue and φ_r, ψ_r be its two dihedral angles near the alpha carbon (Figure 1). The disturbed side chains of residue r are sampled from the distribution of the corresponding side-chain conformations. That is,

χ_r ∼ p(·|φ_r, ψ_r, r) p(r), (19)

where p(·|φ_r, ψ_r, r) stands for the distribution of the side-chain conformations of residue r, which is approximated by a protein-dependent side-chain rotamer library proposed by Shapovalov and Dunbrack Jr (2011), and p(r) describes the probability of residue r being selected during the disturbance. As our downstream task is to model the binding affinity, which is usually characterized by the interface residues of the complex, we set p(r) to be the uniform distribution over the interface residues; the probabilities of non-interface residues are zero. Note that individual amino acids may have different numbers of side-chain dihedral angles. For notational simplicity, we use χ_r to denote the set of side-chain angles.
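A toy sketch of one pre-training step, covering the sampling in Equation (19) together with the reconstruction objective described below (Equation 24). The assumptions here are loud: the backbone-dependent rotamer library is replaced by a uniform angle draw, and the torsion counts per residue are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Side-chain torsion counts per residue type (illustrative subset).
N_CHI = {"GLU": 3, "SER": 1, "ARG": 4}

def disturb(interface_residues):
    """Sample r ~ p(r) (uniform over the interface) and new chi angles.
    A uniform draw stands in for the rotamer library p(.|phi_r, psi_r, r)."""
    r = int(rng.integers(len(interface_residues)))
    chi = rng.uniform(-180.0, 180.0, size=N_CHI[interface_residues[r]])
    return r, chi

def reconstruction_loss(disturbed_xyz, predicted_delta, original_xyz):
    """MSE between predicted and original coordinates (Eqs. 23-24),
    averaged over the |S(r)| atoms of the disturbed side chain."""
    predicted_xyz = disturbed_xyz + predicted_delta
    return float(np.mean(np.sum((original_xyz - predicted_xyz) ** 2, axis=1)))

residues = ["GLU", "SER", "ARG"]
idx, chi = disturb(residues)        # e.g. glutamic acid gets 3 chi angles

orig = np.zeros((2, 3))             # original side-chain coordinates
dist = np.ones((2, 3))              # coordinates after the disturbance
loss_perfect = reconstruction_loss(dist, -np.ones((2, 3)), orig)
loss_none = reconstruction_loss(dist, np.zeros((2, 3)), orig)
# a perfect correction gives zero loss; no correction leaves an error
```

In the real scheme, `predicted_delta` would come from the MLP head on top of the CGAT descriptors rather than being supplied by hand.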
Taking glutamic acid as an example, there are three side-chain angles, that is, χ_r = (χ_{r,1}, χ_{r,2}, χ_{r,3}). Based on the sampled side-chain angles and the coordinates of the backbone of the original residue r, we can derive the new coordinates of each atom in residue r, given by

(x̂_k, ŷ_k, ẑ_k) = Coordinates(k, χ_r, r), k ∈ S(r), (20)

where Coordinates(k, χ_r, r) stands for the function that yields the coordinates of atom k based on the side chains of residue r, and S(r) stands for the set of atoms of the side chains of residue r. Based on these new coordinates, we can update the matrix of node features, denoted by Â, in which the features of the other atoms in the graph are kept unchanged. The edges E of the complex are also unchanged during the disturbance.

Reconstruction.
The pre-training scheme requires the GNN in GraphPPI to estimate the original coordinates of the given disturbed complex. However, as the ranges of the coordinates differ a lot across individual complexes, directly predicting the absolute values of the coordinates increases the difficulty of the reconstruction. Instead, GraphPPI accomplishes the reconstruction by predicting the differences in the coordinates of the atoms in the disturbed residue. More specifically, we first feed the initial atom features Â of the disturbed complex into the GNN and obtain the corresponding graphical descriptors Ĝ for all the atoms:

Ĝ = {ĝ_k | k = 1, 2, . . . , N} = CGAT(Â, E). (21)

Based on the graphical descriptor ĝ_k of node k generated by the GNN, GraphPPI employs a multi-layer perceptron network (MLP) to predict the changes of the coordinates, that is,

(Δx_k, Δy_k, Δz_k) = MLP(ĝ_k). (22)

Thus, the predicted coordinates of node k can be derived by

(x̃_k, ỹ_k, z̃_k) = (x̂_k, ŷ_k, ẑ_k) + (Δx_k, Δy_k, Δz_k). (23)

The reconstruction loss of GraphPPI is the mean squared error between the predicted coordinates and the original coordinates of the disturbed atoms, given by

J = (1/|S(r)|) Σ_{k∈S(r)} [(x_k − x̃_k)² + (y_k − ỹ_k)² + (z_k − z̃_k)²], (24)

where |S(r)| is the cardinality of the set S(r).

References

Ascher, D. B., Jubb, H. C., Pires, D. E., Ochi, T., Higueruelo, A. and Blundell, T. L. (2015). Protein-protein interactions: structures and druggability,
Multifaceted Roles of Crystallography in Modern Drug Discovery, Springer, pp. 141–163.

Barouch, D. H., Whitney, J. B., Moldt, B., Klein, F., Oliveira, T. Y., Liu, J., Stephenson, K. E., Chang, H.-W., Shekhar, K., Gupta, S. et al. (2013). Therapeutic efficacy of potent neutralizing HIV-1-specific monoclonal antibodies in SHIV-infected rhesus monkeys, Nature (7475): 224–228.

Ben-Kasus, T., Schechter, B., Sela, M. and Yarden, Y. (2007). Cancer therapeutic antibodies come of age: targeting minimal residual disease, Molecular Oncology (1): 42–54.

Cao, Y., Su, B., Guo, X., Sun, W., Deng, Y., Bao, L., Zhu, Q., Zhang, X., Zheng, Y., Geng, C. et al. (2020). Potent neutralizing antibodies against SARS-CoV-2 identified by high-throughput single-cell sequencing of convalescent patients' B cells, Cell.

DeBartolo, J., Taipale, M. and Keating, A. E. (2014). Genome-wide prediction and validation of peptides that bind human prosurvival Bcl-2 proteins, PLoS Computational Biology (6): e1003693.

Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding, Proceedings of the 2019 Conference of the Association for Computational Linguistics, pp. 4171–4186.

Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine, Annals of Statistics pp. 1189–1232.

Geng, C., Vangone, A., Folkers, G. E., Xue, L. C. and Bonvin, A. M. (2019). iSEE: Interface structure, evolution, and energy-based machine learning predictor of binding affinity changes upon mutations, Proteins: Structure, Function, and Bioinformatics (2): 110–119.

Hallen, M. A., Martin, J. W., Ojewole, A., Jou, J. D., Lowegard, A. U., Frenkel, M. S., Gainza, P., Nisonoff, H. M., Mukund, A., Wang, S. et al. (2018). OSPREY 3.0: Open-source protein redesign for you, with powerful new features, Journal of Computational Chemistry (30): 2494–2507.

He, K., Zhang, X., Ren, S. and Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034.

Hu, W., Liu, B., Gomes, J., Zitnik, M., Liang, P., Pande, V. and Leskovec, J. (2020). Strategies for pre-training graph neural networks, International Conference on Learning Representations. URL: https://openreview.net/forum?id=HJlWWJSFDH

Humphris, E. L. and Kortemme, T. (2008). Prediction of protein-protein interface sequence diversity using flexible backbone computational protein design, Structure (12): 1777–1788.

Inc., A. S. (2013). Discovery Studio modeling environment, release 4.0.

Jankauskaitė, J., Jiménez-García, B., Dapkūnas, J., Fernández-Recio, J. and Moal, I. H. (2019). SKEMPI 2.0: an updated benchmark of changes in protein–protein binding energy, kinetics and thermodynamics upon mutation, Bioinformatics (3): 462–469.

Lee, M. S. and Olson, M. A. (2006). Calculation of absolute protein-ligand binding affinity using path and endpoint approaches, Biophysical Journal (3): 864–877.

Lensink, M. F. and Wodak, S. J. (2013). Docking, scoring, and affinity prediction in CAPRI, Proteins: Structure, Function, and Bioinformatics (12): 2082–2095.

Liu, S., Zhang, C., Zhou, H. and Zhou, Y. (2004). A physical reference state unifies the structure-derived potential of mean force for protein folding and binding, Proteins: Structure, Function, and Bioinformatics (1): 93–101.

Maaten, L. v. d. and Hinton, G. (2008). Visualizing data using t-SNE, Journal of Machine Learning Research (Nov): 2579–2605.

Moal, I. H. and Fernández-Recio, J. (2012). SKEMPI: a structural kinetic and energetic database of mutant protein interactions and its use in empirical models, Bioinformatics (20): 2600–2607.

Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T. and Efros, A. A. (2016). Context encoders: Feature learning by inpainting, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2536–2544.

Petukh, M., Dai, L. and Alexov, E. (2016). SAAMBE: webserver to predict the charge of binding free energy caused by amino acids mutations, International Journal of Molecular Sciences (4): 547.

Pierce, B. G., Wiehe, K., Hwang, H., Kim, B.-H., Vreven, T. and Weng, Z. (2014). ZDOCK server: interactive docking prediction of protein–protein complexes and symmetric multimers, Bioinformatics (12): 1771–1773.

Pires, D. E. and Ascher, D. B. (2016). mCSM-AB: a web server for predicting antibody–antigen affinity changes upon mutation with graph-based signatures, Nucleic Acids Research (W1): W469–W473.

Reichmann, D., Rahat, O., Albeck, S., Meged, R., Dym, O. and Schreiber, G. (2005). The modular architecture of protein–protein binding interfaces, Proceedings of the National Academy of Sciences (1): 57–62.

Rodrigues, C. H., Myung, Y., Pires, D. E. and Ascher, D. B. (2019). mCSM-PPI2: predicting the effects of mutations on protein–protein interactions, Nucleic Acids Research (W1): W338–W344.

Schymkowitz, J., Borg, J., Stricher, F., Nys, R., Rousseau, F. and Serrano, L. (2005). The FoldX web server: an online force field, Nucleic Acids Research (suppl_2): W382–W388.

Shapovalov, M. V. and Dunbrack Jr, R. L. (2011). A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions, Structure (6): 844–858.

Sirin, S., Apgar, J. R., Bennett, E. M. and Keating, A. E. (2016). AB-Bind: antibody binding mutational database for computational affinity predictions, Protein Science (2): 393–409.

Su, M., Yang, Q., Du, Y., Feng, G., Liu, Z., Li, Y. and Wang, R. (2018). Comparative assessment of scoring functions: the CASF-2016 update, Journal of Chemical Information and Modeling (2): 895–913.

Suárez, M. and Jaramillo, A. (2009). Challenges in the computational design of proteins, Journal of the Royal Society Interface (suppl_4): S477–S491.

Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P. and Bengio, Y. (2018). Graph attention networks, ICLR.

Wang, M., Cang, Z. and Wei, G.-W. (2020). A topology-based network tree for the prediction of protein–protein binding affinity changes following mutation, Nature Machine Intelligence (2): 116–123.

Xiong, P., Zhang, C., Zheng, W. and Zhang, Y. (2017). BindProfX: assessing mutation-induced binding affinity change by protein interface profiles with pseudo-counts, Journal of Molecular Biology (3): 426–434.

Yang, Y. and Zhou, Y. (2008). Specific interactions for ab initio folding of protein terminal regions with secondary structures, Proteins: Structure, Function, and Bioinformatics (2): 793–803.

Zhang, N., Chen, Y., Lu, H., Zhao, F., Alvarez, R. V., Goncearenco, A., Panchenko, A. R. and Li, M. (2020). MutaBind2: predicting the impacts of single and multiple mutations on protein-protein interactions, iScience p. 100939.

Zhang, Y. and Skolnick, J. (2005). TM-align: a protein structure alignment algorithm based on the TM-score, Nucleic Acids Research (7): 2302–2309.

Zhou, H. and Zhou, Y. (2002). Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction, Protein Science (11): 2714–2726.

Supplementary Information for
Pre-training of Graph Neural Network for Modeling Protein-Protein Binding Affinity Changes Following Mutation

Xianggen Liu, Yunan Luo, Sen Song and Jian Peng

Supplementary Figures
Supplementary Figure 1:
The similarities between arbitrary two complexes in the S645 set.
Supplementary Figure 2:
The similarities between arbitrary two complexes in the M1707 set.
Supplementary Figure 3:
The ablation study on the AB-Bind S645 set. (A) The prediction performance of GraphPPI with different pre-training strategies and transformation layers (i.e., CGAT, MLP) on the ten-fold CV test. (B) The prediction performance of GraphPPI with different pre-training strategies and transformation layers on the leave-one-complex-out CV test. To test the effectiveness of the GNN model, we built a control framework that uses a multilayer perceptron (MLP) to replace the GNN. More specifically, each multi-attention transformation layer (Equation (7)) was replaced by an MLP layer. The main difference between the GNN and the MLP lies in the way of processing the information of neighboring nodes: for a node in the graph, the MLP updates the representations based only on the node's own representations, while the GNN also integrates the information from the neighboring nodes.

Supplementary Table 1: The optimal hyperparameters of GraphPPI calibrated using a grid search procedure. The hyperparameter tuning process involved the hidden size D_g, the number of attention heads K, and the number of hidden layers L. We applied a coarse grid search over D_g ∈ { }, K ∈ {2, 4, 6, 8, 16}, L ∈ {1, 2, 3, 4, 5} on the development set to select the best settings of these hyperparameters.

Hyperparameter | Selected value
D_g |
K |
L |

Supplementary Table 2:
Node features and corresponding encoding methods.
Features