SkipGNN: Predicting Molecular Interactions with Skip-Graph Networks

Kexin Huang, Cao Xiao, Lucas M. Glass, Marinka Zitnik, and Jimeng Sun

Health Data Science, Harvard T.H. Chan School of Public Health, Boston, MA; Analytic Center of Excellence, IQVIA, Cambridge, MA; Department of Biomedical Informatics, Harvard University, Boston, MA; Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL

May 1, 2020
Abstract

Motivation:
Molecular interaction networks are powerful resources for discovery. They are increasingly used with machine learning methods to predict biologically meaningful interactions. While deep learning on graphs has dramatically advanced prediction prowess, current graph neural network (GNN) methods are optimized for prediction on the basis of direct similarity between interacting nodes. In biological networks, however, similarity between nodes that do not directly interact has proved incredibly useful over the last decade across a variety of interaction networks.
Results:
Here, we present SkipGNN, a graph neural network approach for the prediction of molecular interactions. SkipGNN predicts molecular interactions by aggregating information not only from direct interactions but also from second-order interactions, which we call skip similarity. In contrast to existing GNNs, SkipGNN receives neural messages from two-hop neighbors as well as immediate neighbors in the interaction network and non-linearly transforms the messages to obtain useful information for prediction. To inject skip similarity into a GNN, we construct a modified version of the original network, called the skip graph. We then develop an iterative fusion scheme that optimizes a GNN using both the skip graph and the original graph. Experiments on four interaction networks, including drug-drug, drug-target, protein-protein, and gene-disease interactions, show that SkipGNN achieves superior and robust performance, outperforming existing methods by up to 28.8% in area under the precision-recall curve (PR-AUC). Furthermore, we show that unlike popular GNNs, SkipGNN learns biologically meaningful embeddings and performs especially well on noisy, incomplete interaction networks.
Availability:
SkipGNN method, experiments, and all datasets will be open-sourced through https://github.com/kexinhuang12345/SkipGNN. Contact: [email protected]
Molecular interaction networks are ubiquitous in biological systems. Over the last decade, interaction networks have advanced our systems-level understanding of biology [Cowen et al., 2017]. Further, they have enabled discovery of biologically significant, yet previously unmapped relationships [Zitnik et al., 2019a], including drug-target interactions (DTIs) [Luo et al., 2017a], drug-drug interactions (DDIs) [Zitnik et al., 2018a], protein-protein interactions (PPIs) [Luck et al., 2020], and disease-gene associations (DGIs) [Agrawal et al., 2018]. To assist in these discoveries, a plethora of computational methods, primarily optimized for link prediction from networks (e.g., Lei and Ruan [2013]), have been developed to predict new interactions in molecular networks. Recently, deep learning on graphs has emerged as a dominant class of methods that has revolutionized the state of the art in learning and reasoning over network datasets. These methods, often referred to as graph neural networks (GNNs) [Wu et al., 2019] and graph convolutional networks (GCNs) [Kipf and Welling, 2017, Veličković et al., 2018], operate by performing a series of non-linear transformations on the input molecular network, where each transformation aggregates information only from immediate neighbors, i.e., direct interactors in the network. While these methods yield powerful predictors, they explicitly take into account only direct similarity between nodes in the network. Therefore, GNNs are limited in fully capturing important information for prediction that resides further away from the particular interaction in the network that we want to predict [Abu-El-Haija et al., 2019].

Figure 1: Direct versus skip similarity. (Left) Traditionally, an interaction between nodes A and B implies that A and B are similar, and vice versa [McPherson et al., 2001]. (Right) In contrast, in molecular interaction networks, directly interacting entities are not necessarily similar, which has been observed in numerous networks, including genetic interaction networks [Costanzo et al., 2010, 2016] and protein-protein interaction networks [Kovács et al., 2019, Zitnik et al., 2019b].

Indirect similarity between nodes that do not directly interact, e.g.
, the similarity in second-order interactions, has proved incredibly useful across a variety of molecular networks, including genetic interaction and protein-protein interaction networks [Costanzo et al., 2010, 2016, Zitnik et al., 2019b, Kovács et al., 2019]. This is because interactions can exist between nodes that are not necessarily similar, as illustrated in Figure 1. For example, in a drug-target interaction (DTI) network, an edge indicates that a drug binds to a target protein. Thus, two drugs are similar if they bind to the same target protein. In contrast, a drug and a target protein are not biologically similar, although they are connected by an edge in the DTI network. This example illustrates the importance of second-order interactions, which we refer to as skip similarity (Figure 1). For this reason, we need GNNs that predict molecular interactions not only via direct interactions but also via similarity in second-order interactions.
Present work.
Here, we present SkipGNN, a graph neural network (GNN) method for the prediction of molecular interactions. In contrast to existing GNNs, such as GCN [Kipf and Welling, 2017], SkipGNN specifies a neural architecture in which neural messages are passed not only via direct interactions, referred to as direct similarity, but also via similarity in second-order interactions, referred to as skip similarity (Figure 1). Importantly, while the principle of skip similarity governs many types of molecular interaction networks, popular GNN methods fail to capture this principle and, as we show here, cannot fully utilize molecular interaction networks. SkipGNN takes as input a molecular interaction network and uses it to construct a skip graph, a second-order network representation that captures the skip similarity. SkipGNN then uses both the original graph (i.e., the input interaction network) and the skip graph to learn the best way to propagate and transform neural messages along edges in each graph, optimizing for the discovery of new interactions.

We evaluate SkipGNN on four types of interaction networks, including two homogeneous networks, i.e., drug-drug interaction and protein-protein interaction networks, and two heterogeneous networks, i.e., drug-target interaction and gene-disease interaction networks. SkipGNN outperforms baselines that use random walks, shallow network embeddings, spectral clustering, and network metrics by up to 28.8% in PR-AUC [Perozzi et al., 2014, Grover and Leskovec, 2016, Ribeiro et al., 2017, Tang and Liu, 2011, Kovács et al., 2019]. Further, the method shows a 7.9% improvement in PR-AUC over state-of-the-art graph neural networks [Kipf and Welling, 2017, 2016].

By examining SkipGNN's performance in increasingly harder prediction settings, when large fractions of interactions are removed from the network, we find that SkipGNN achieves robust performance.
In particular, across all interaction networks, SkipGNN consistently outperforms all baseline methods, even when interaction networks are highly incomplete (Sections 5.1-5.2). We find that the robust performance of SkipGNN can be explained by a spectral property of the skip graph, which preserves network structure in the face of incomplete interaction information (Section 5.3); this is also confirmed experimentally (Section 5.5). Further, we examine embeddings learned by SkipGNN and find that SkipGNN learns biologically meaningful embeddings, whereas a regular GCN does not (Section 5.4). For example, when analyzing a drug-target interaction network, SkipGNN generates an embedding space in which drugs are generally separated from most proteins while still being positioned close to the proteins to which they directly bind. Lastly, in the case of the drug-drug interaction network, we use a literature search to find evidence for SkipGNN's novel drug-drug interaction predictions (Section 5.6).

Related work on link prediction.
Existing link prediction methods belong to one of the following categories. (1) Heuristic or mechanistic methods (e.g., Lü and Zhou [2011], Menche et al. [2015], Durán et al. [2018], Kovács et al. [2019]) calculate an index similarity score to measure the probability of a link given the network structure around the two target nodes, such as Preferential Attachment (PA) [Barabási and Albert, 1999] and the Local Path Index (LP) [Lü et al., 2009]. However, these methods usually make strong assumptions about the network structure and hence suffer from unstable performance [Lü and Zhou, 2011, Kovács et al., 2019]. (2) Direct embedding methods generate embeddings for every node in the network that capture the node's local network topology (e.g., Zitnik and Zupan [2015], Wang et al. [2018], Xu et al. [2019]). A popular approach is to use random walks with a skip-gram model, such as DeepWalk [Perozzi et al., 2014], node2vec [Grover and Leskovec, 2016], and LINE [Tang et al., 2015]. Another popular approach leverages spectral graph theory to generate a spectral embedding, such as spectral clustering [Tang and Liu, 2011]. The generated node embeddings are then fed into a decoder classifier to predict the probability that a link exists. (3) Neural embedding methods, such as Graph Neural Networks (GNNs) [Kipf and Welling, 2017, Hamilton et al., 2017], Variational Graph Autoencoders (VGAE) [Kipf and Welling, 2016, Ma et al., 2018], and Graph Attention Networks (GAT) [Veličković et al., 2018], use a neighborhood message passing scheme to generate node embeddings, which are directly optimized in an end-to-end manner by a link prediction loss (e.g., cross-entropy). GNN-based methods have been shown to achieve state-of-the-art performance across a variety of network datasets [Zhang and Chen, 2018, Kipf and Welling, 2016]. SkipGNN further improves GNN-based methods by capturing the skip similarity that other GNN-based methods are incapable of capturing.
Related work on molecular interaction prediction.
In molecular interaction networks, the goal is to predict whether a given pair of biomedical entities, such as proteins, drugs, or diseases, will interact. We can divide methods for interaction prediction into three main groups. (1) Structural representation learning generates embeddings for each entity using the entity's structural representation, such as a compound's molecular graph or a protein's amino acid sequence. The embeddings of two entities are then combined and fed into a decoder for prediction. For example, Tsubaki et al. [2019], Öztürk et al. [2018], and Gao et al. [2018] use graph-convolutional (GCN) and convolutional (CNN) networks on molecular graphs and gene sequence data to predict binding of drugs to target proteins. Similarly, Huang et al. [2020], Ryu et al. [2018], and Cheng and Zhao [2014] learn embeddings for drugs and concatenate embeddings of drug pairs to predict drug-drug interactions. (2) Similarity-based learning is based on the assumption that entities with similar interaction patterns are likely to interact. These methods devise a similarity measure (e.g., a graphlet-based signature of proteins in the PPI network [Milenković and Pržulj, 2008]) and then use the measure to predict interactions based on how similar a candidate interaction is to known interactions. A variety of techniques are used to aggregate similarity values and score interactions, including matrix factorization [Zhang et al., 2018], clustering [Ferdousi et al., 2017], and label propagation [Zhang et al., 2015]. (3) Finally, network relational learning views the task as a network completion problem. It uses network structure together with side information about nodes to complete the network and predict interactions [Zitnik et al., 2018a, Ma et al., 2018, Zitnik and Leskovec, 2017]. SkipGNN belongs to the structural representation learning category.

Background on Graph Neural Networks (GNNs).
Next, we describe graph neural networks, as they are state-of-the-art models for link prediction and the focus of our study. The input to a GNN is the network, represented by its adjacency matrix A. Most often, the goal (output) of the GNN is to learn an embedding for each node in the network by capturing the network structure as well as node attributes. A GNN can be represented as a series of neighborhood aggregation layers (e.g., Kipf and Welling [2017]): H^(l+1) = σ(D̃^(−1/2) Ã D̃^(−1/2) H^(l) W), where H^(l) is a matrix of node embeddings at the l-th layer, H^(0) holds the input node attributes, W is a trainable parameter matrix, σ is a non-linear activation function, and D̃ and Ã are the renormalized degree and adjacency matrices, defined as Ã = A + I and D̃_ii = Σ_j Ã_ij (I is the identity matrix). The GNN propagates information across network neighborhoods and transforms the information in a way that is most useful for downstream prediction tasks, such as link prediction. However, GNNs are limited in capturing skip similarity, whereas SkipGNN utilizes an additional skip graph to fully exploit this important property of biomedical interaction networks.

SkipGNN is a graph neural network uniquely suited to molecular interactions. SkipGNN takes as input a molecular interaction network and uses it to construct a skip graph, a second-order network representation capturing the skip similarity. SkipGNN then specifies a novel graph neural network architecture that fuses the original graph and the skip graph to accurately and robustly predict new molecular interactions.
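The single-layer propagation rule quoted in the background above can be written out directly. The following is a minimal NumPy sketch, not the authors' implementation; ReLU is chosen as a concrete σ, and the function name is our own:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One neighborhood-aggregation layer: H' = ReLU(D~^(-1/2) A~ D~^(-1/2) H W),
    with A~ = A + I (renormalization trick of Kipf and Welling)."""
    A_t = A + np.eye(A.shape[0])                  # A~ = A + I
    d_inv_sqrt = 1.0 / np.sqrt(A_t.sum(axis=1))   # D~^(-1/2) on the diagonal
    F = A_t * np.outer(d_inv_sqrt, d_inv_sqrt)    # D~^(-1/2) A~ D~^(-1/2)
    return np.maximum(F @ H @ W, 0.0)             # ReLU as the activation
```

Each row of the output is a node embedding aggregated only from that node's immediate neighbors (and itself), which is exactly the direct-similarity limitation the text describes.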
Problem formulation.
Consider an interaction network G on N nodes representing biomedical entities V (e.g., drugs, proteins, or diseases) and M edges E representing interactions between the entities. For example, G can be a drug-target interaction network recording information on how drugs bind to their protein targets [Luo et al., 2017b]. For every pair of entities i and j, we denote their interaction with a binary indicator e_ij ∈ {0, 1}, indicating the experimental evidence that i and j interact (i.e., e_ij = 1) or the absence of evidence for interaction (i.e., e_ij = 0). We represent G by its adjacency matrix A, where A_ij is 1 if nodes i and j are connected (e_ij = 1) and 0 otherwise (e_ij = 0). D is the degree matrix, a diagonal matrix where D_ii is the degree of node i.

Problem (Molecular Interaction Prediction). Given a molecular interaction network G = (V, E), we aim to learn a mapping function f : E → [0, 1] from edges to probabilities such that f(i, j) estimates the probability that nodes i and j interact.

Next, we describe skip graphs, the key novel representation of interaction networks that allows for effective use of GNNs for predicting interactions. We realize skip similarity by encouraging the GNN model to embed
Figure: Neural architecture of SkipGNN. (Left) SkipGNN constructs the skip graph G_s (denoted by adjacency matrix A_s) from the input graph G (denoted by adjacency matrix A) using Eq. (1). (Middle) Initial node embeddings, H^(0) and S^(0), are specified using side information (e.g., gene expression vectors if nodes represent genes) or generated using node2vec [Grover and Leskovec, 2016]. In SkipGNN, node embeddings are then propagated along edges of G_s and G and transformed through a series of computations (layers), which output powerful embeddings that can be used for downstream prediction of interactions. Illustrated is a two-layer iterative fusion scheme. In the first layer, two GNNs with parameter weight matrices W_o and W_s (operating on A and A_s, respectively) are fused via weight matrices W'_o and W'_s based on Eqs. (2, 3). This completes the computations in the first layer of SkipGNN, producing embeddings H^(1) and S^(1). In the second layer, those embeddings are transformed via W_o and W_s using Eq. (4), resulting in final embeddings E. (Right) Embeddings E_i and E_j of target nodes i and j are retrieved, concatenated, and fed into a decoder (parameterized by W_d). The decoder returns p_ij, the probability that nodes i and j interact.

skipped nodes close together in the embedding space. To do that, we construct the skip graph G_s, in which two-hop neighbors are connected by edges. This construction creates paths in G_s along which neural messages can be exchanged between the skipped nodes. Formally, we use the following operator to obtain the skip graph's adjacency matrix A_s:

(A_s)_ij = 1 if ∃k such that (i, k) ∈ E and (k, j) ∈ E, and 0 otherwise.

The corresponding degree matrix is (D_s)_ii = Σ_j (A_s)_ij. An efficient way to implement the skip graph is through matrix multiplication:

A_s = sign(AA^T),   (1)

where sign(x) is the sign function: sign(x) = 1 if x > 0 and 0 otherwise. Entry (i, j) of AA^T counts the number of two-hop paths from node i to node j.
Hence, if the entry for nodes i, j in AA^T is larger than 0, there exists a skipped node between i and j. We then convert every positive entry to 1 to construct the skip graph's adjacency matrix.

Table: Notation used in SkipGNN.

Notation — Definition
G = {V, E} — graph with nodes V and edges E
D, A ∈ Z^(N×N) — degree and adjacency matrices for graph G
D̃, Ã ∈ Z^(N×N) — renormalized degree and adjacency matrices for G
X ∈ R^(N×D) — D-dimensional node embeddings
e_ij ∈ {0, 1} — ground-truth interaction between nodes i and j
G_s — skip graph
D_s, A_s ∈ Z^(N×N) — degree and adjacency matrices for G_s
D̃_s, Ã_s ∈ R^(N×N) — renormalized degree and adjacency matrices for G_s
H^(l), S^(l) — node embeddings for G and G_s at layer l
E — final node embeddings
p_ij ∈ [0, 1] — probability of interaction between nodes i and j
y_ij ∈ {0, 1} — binary indicator of interaction between nodes i and j
L ∈ R — binary classification loss
W_o^(l), W_s^(l) — weight matrices for the original (o) and skip (s) graphs at layer l
W'_o^(l) — weight matrix for skip-to-original-graph fusion
W'_s^(l) — weight matrix for original-to-skip-graph fusion
W_d, b — decoder weight matrix and bias parameter

Given this skip graph, we proceed to describe the full SkipGNN model.
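Eq. (1) can be implemented in a few lines; the following is a minimal NumPy sketch (not the authors' released code):

```python
import numpy as np

def skip_graph(A):
    """Construct the skip graph's adjacency matrix A_s = sign(A A^T), Eq. (1).

    Entry (i, j) of A @ A.T counts the two-hop paths between nodes i and j;
    every positive count becomes an edge of the skip graph. Note that, taken
    literally, Eq. (1) gives every non-isolated node a self-loop (the diagonal
    of A A^T holds node degrees); whether self-loops are kept is an
    implementation detail the text does not spell out.
    """
    two_hop = A @ A.T
    return (two_hop > 0).astype(int)
```

For example, on a path graph 1-2-3, the skip graph connects nodes 1 and 3, which share the skipped node 2, while the original edges (1, 2) and (2, 3) are absent from it.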
In this section, we describe how we leverage the skip graph for link prediction. After generating the skip graph as in Section 3.1, we propose an iterative fusion scheme that allows the skip graph and the original graph to learn from each other for better integration. Lastly, a decoder outputs a probability measuring whether the given pair of molecular entities interacts.
We want a model that automatically learns how to balance direct similarity and skip similarity in the final embedding. We design an iterative fusion scheme with aggregation gates to combine both types of similarity information. The motivation is that, to represent a biomedical entity to its fullest extent, a node embedding must capture the entity's complicated bioactive functions through both skip and direct similarities. Hence, instead of simply concatenating the output node embeddings of a GNN on the original graph G, which captures direct similarity, and a GNN on the skip graph G_s, which captures skip similarity, we allow the two GNNs on G and G_s to interact with each other iteratively via the following propagation rules (see Figure 3):

H^(l+1) = σ( AGG( F H^(l) W_o^(l), F_s S^(l) W'_o^(l) ) ),   (2)
S^(l+1) = σ( AGG( F_s S^(l) W_s^(l), F H^(l+1) W'_s^(l) ) ),   (3)

where F = D̃^(−1/2) Ã D̃^(−1/2) and F_s = D̃_s^(−1/2) Ã_s D̃_s^(−1/2). Here, H^(l) and S^(l) are the node embeddings at the l-th layer from the direct-similarity graph G and the skip-similarity graph G_s, respectively; F and F_s are the renormalized adjacency matrices for direct similarity and skip similarity; and W_o^(l), W'_o^(l), W_s^(l), W'_s^(l) are the trainable weights for layer l. H^(0) and S^(0) are set to X, the input node attributes generated by node2vec. The aggregation gate AGG in Eqs. (2-3) can be a summation, a Hadamard product, max-pooling, or some other aggregation operator [Cao et al., 2020]. Empirically, we find that the summation gate performs best. σ(·) is the activation function; we use ReLU(·) = max(·, 0) to add non-linearity to the propagation.

In each iteration, the node embedding for the original graph, H^(l+1), is first updated from its previous layer's node embedding H^(l), combined with the skip graph embedding S^(l).
After obtaining the updated original-graph embedding H^(l+1), we update the skip-graph embedding S^(l+1) in a similar fashion. This update rule is very different from simple concatenation: it is an iterative process in which each update of the node embedding for each graph is affected by the most recent node embedding from both graphs. This way, the two embeddings learn the best dependency structure between each other and fuse into one final embedding, rather than a simple concatenation. In the last layer, the final node embedding E is obtained through:

E = AGG( F H^(L_max) W_o^(L_max), F_s S^(L_max) W_s^(L_max) ),   (4)

where L_max is the last layer index. Since, as motivated above, we are interested only in neighbors up to second order, we set L_max = 1 (see Figure 3). We do not use an activation function here, as no extra non-linear transformation is required before feeding the embeddings into the decoder network. Empirically, we show that this fusion scheme boosts predictive performance in Section 5.5.
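The propagation rules above can be sketched compactly in NumPy. This is a hedged illustration rather than the released PyTorch implementation: the summation AGG gate and ReLU follow the text, while the function names and weight shapes are our own inference:

```python
import numpy as np

def renormalize(A):
    """F = D~^(-1/2) A~ D~^(-1/2) with A~ = A + I (same form gives F_s from A_s)."""
    A_t = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_t.sum(axis=1))
    return A_t * np.outer(d_inv_sqrt, d_inv_sqrt)

def fusion_layer(F, F_s, H, S, W_o, Wp_o, W_s, Wp_s):
    """One iterative-fusion step, Eqs. (2)-(3), with AGG = summation.

    W_o, Wp_o, W_s map in_dim -> out_dim; Wp_s maps out_dim -> out_dim,
    because Eq. (3) consumes the freshly updated H^(l+1).
    """
    relu = lambda x: np.maximum(x, 0.0)
    H_next = relu(F @ H @ W_o + F_s @ S @ Wp_o)       # Eq. (2)
    S_next = relu(F_s @ S @ W_s + F @ H_next @ Wp_s)  # Eq. (3)
    return H_next, S_next

def final_embedding(F, F_s, H, S, W_o, W_s):
    """Last layer, Eq. (4): summation fusion without an activation."""
    return F @ H @ W_o + F_s @ S @ W_s
```

Note how Eq. (3) reuses H_next rather than H: this is the iterative dependency that distinguishes the scheme from simple concatenation of two independent GNN outputs.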
Given target nodes (i, j) and their corresponding node embeddings E_i, E_j, we implement a neural network as a decoder: we first combine E_i and E_j into an input embedding through a COMB function (e.g., concatenation, sum, Hadamard product). The combined embedding is then fed into a neural network, parametrized by weight W_d and bias b, acting as a binary classifier to obtain the probability p_ij:

p_ij = σ( W_d COMB(E_i, E_j) + b ),   (5)

where p_ij represents the probability that nodes i and j interact (i.e., f(i, j)). We use concatenation as the COMB function, as it consistently yields the best performance across different types of networks.

The overall algorithm is shown in Algorithm 1. Here, we leverage only accessible network information (the adjacency matrix A of the network G) to predict links. In all experiments, we initialize embeddings using node2vec [Grover and Leskovec, 2016] as X = node2vec(A). Second, we construct the skip graph with adjacency matrix A_s via Eq. (1) to capture the skip-similarity principle. Next, at every step, a mini-batch of interaction pairs M with labels y is sampled. Then, two graph convolutional networks are used for the original graph and the skip graph, respectively. In the propagation step, we use iterative fusion (Eqs. (2), (3)) to naturally combine embeddings convolved on the original graph and on the skip graph, corresponding to direct and skip similarity, respectively. In the last layer, embeddings are stored in E. We then retrieve the embeddings for each node in the mini-batched pairs M and concatenate them to feed into the decoder (Eq. (5)).

During training, we optimize SkipGNN's parameters W_o^(l), W'_o^(l), W_s^(l), W'_s^(l), W_d, b in an end-to-end manner through a binary classification loss:

L = − Σ_{(i,j)∈M} [ y_ij log p_ij + (1 − y_ij) log(1 − p_ij) ],

where y_ij is the true label for nodes i and j sampled during training via mini-batching, (i, j) ∈ M, and M is a mini-batch of interaction pairs. After the model is trained, it can be used to make predictions: given two entities i and j, the model predicts the probability f(i, j) that i and j interact.

Algorithm 1: The SkipGNN Algorithm
Input: interaction network G with adjacency matrix A
Node embedding generation, e.g.: X ← node2vec(A)
Skip graph construction (Section 3.1): A_s ← SkipGraph(A) via Eq. (1)
for t = 1 ... T_max do
    sample a mini-batch of training node pairs M ⊆ E with corresponding labels y
    for l = 0 ... (L_max − 1) do
        Iterative fusion (Section 3.2.1):
        H^(l+1) ← FusionGate(H^(l)) via Eq. (2)
        S^(l+1) ← FusionGate(S^(l)) via Eq. (3)
    E ← FuseGate(H^(L_max), S^(L_max)) via Eq. (4)
    Decoder (Section 3.2.2): p_ij ← decode(E) via Eq. (5)
    Compute the loss value L using p_ij and y (Section 3.3) and update model parameters via gradient descent

Next, we provide details on molecular interaction datasets, baseline methods, and the experimental setup.
We consider four publicly-available network datasets. (1)
BIOSNAP-DTI [Zitnik et al., 2018b] contains 5,018 drugs that target 2,325 proteins through 15,139 drug-target interactions (DTIs). (2)
BIOSNAP-DDI [Zitnik et al., 2018b] consists of 48,514 drug-drug interactions (DDIs) between 1,514 drugs, extracted from drug labels and biomedical literature. (3)
HuRI-PPI [Luck et al., 2019] is the human reference protein-protein interaction network generated by multiple orthogonal high-throughput yeast two-hybrid screens. We use the HI-III network, which has 5,604 proteins and 23,322 interactions. (4) Finally, we consider
DisGeNET-GDI [Piñero et al., 2019], which collects curated disease-gene associations (GDIs) from GWAS studies, animal models, and the scientific literature. The dataset has 81,746 interactions between 9,413 genes and 10,370 diseases.
We implemented SkipGNN using the PyTorch deep learning framework, on a server with 2 Intel Xeon E5-2670v2 2.5GHz CPUs, 128GB RAM, and 1 NVIDIA Tesla P40 GPU. We set optimization parameters as follows: the learning rate is 5e-4 with the Adam optimizer [Kingma and Ba, 2014]; mini-batches M of interaction pairs of size |M| are sampled at each step; L_max = 1; and the hidden size is d^(1) = 64 in the first layer and d^(2) in the second layer.

We compare SkipGNN to seven powerful predictors of molecular interactions from the network science and graph machine-learning fields. The source code implementation of SkipGNN is available at https://github.com/kexinhuang12345/SkipGNN.

Table: Data statistics ('A' indicates average node degree; columns: Dataset, Prediction task, ...).
From machine learning, we use three direct network embedding methods:
DeepWalk [Perozzi et al., 2014] and node2vec [Grover and Leskovec, 2016], and we also include struc2vec [Ribeiro et al., 2017]. The latter method is conceptually distinct in that it leverages local network structural information, while the former methods use random walks to learn embeddings for nodes in the network. Further, we examine two graph neural networks:
VGAE [Kipf and Welling, 2016] and
GCN [Kipf and Welling, 2017]. From network science, we consider
Spectral Clustering [Tang and Liu, 2011]. We also use the L3 heuristic [Kovács et al., 2019], which was recently shown to outperform over 20 network-science methods for the problem of PPI prediction. Further details on baseline methods, their implementation, and parameter selection are in the supplementary materials.

In all our experiments, we follow an established evaluation strategy for link prediction (e.g., Zhang and Chen [2018], Zitnik et al. [2018a]). We divide each dataset into train, validation, and test sets in a 7:1:2 ratio, which yields positive examples (molecular interactions). We generate their negative counterparts by sampling from the complement set of positive examples. For every experiment, we conduct five independent runs with different random splits of the dataset. We select the best-performing model based on the loss value on the validation set; the performance of the selected model is calculated on the test set. To calculate prediction performance, we use: (1) area under the precision-recall curve (PR-AUC): PR-AUC = Σ_{k=1}^{n} Prec(k) ΔRec(k), where (Prec(k), Rec(k)) is the k-th precision/recall operating point; and (2) area under the receiver operating characteristic curve (ROC-AUC): ROC-AUC = Σ_{k=1}^{n} TP(k) ΔFP(k), where (TP(k), FP(k)) is the k-th true-positive/false-positive operating point. Higher values of PR-AUC and ROC-AUC indicate better predictive performance.

Next, we conduct a variety of experiments to investigate the predictive power of SkipGNN (Section 5.1). We then study the method's robustness to noise and missing data (Sections 5.2 and 5.3) and demonstrate the skip similarity principle (Section 5.4). Next, we conduct ablation studies to examine the contribution of each of SkipGNN's components to final performance (Section 5.5). Finally, we investigate novel predictions made by SkipGNN (Section 5.6).
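The step-wise PR-AUC sum defined above can be computed directly from ranked predictions. A from-scratch sketch follows (scikit-learn's average_precision_score computes the same quantity; the function name here is our own):

```python
import numpy as np

def pr_auc(y_true, scores):
    """PR-AUC = sum_k Prec(k) * delta Rec(k) over score-sorted operating points."""
    order = np.argsort(-np.asarray(scores, dtype=float))  # rank by descending score
    y = np.asarray(y_true)[order]
    tp = np.cumsum(y)                                     # true positives at cutoff k
    prec = tp / np.arange(1, len(y) + 1)                  # Prec(k)
    rec = tp / max(y.sum(), 1)                            # Rec(k)
    delta_rec = np.diff(np.concatenate([[0.0], rec]))     # delta Rec(k)
    return float(np.sum(prec * delta_rec))
```

For a perfect ranking, e.g. pr_auc([1, 1, 0], [0.9, 0.8, 0.1]), the sum evaluates to 1.0; any misranked positive lowers it, which is why PR-AUC is a sensitive summary on the imbalanced link-prediction task.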
We start by evaluating SkipGNN on four distinct types of molecular interactions, including drug-target interactions, drug-drug interactions, protein-protein interactions, and gene-disease associations; we then compare SkipGNN's performance to the baseline methods. In each interaction network, we randomly mask 30% of interactions as the holdout validation (20%) and test (10%) sets. The remaining 70% of interactions are used to train SkipGNN and each of the baselines. After training, each method is asked to predict whether pairs of entities in the test set will likely interact.
Predictive performance.
Table: SkipGNN achieves the best performance across all metrics and tasks compared to baselines. Results (mean ± standard deviation of PR-AUC and ROC-AUC) of five independent runs on the DTI, DDI, PPI, and GDI tasks for state-of-the-art link prediction algorithms (rows per task: DeepWalk and the other baselines, followed by SkipGNN).
Figure 3:
Predictive performance as a function of network incompleteness.
SkipGNN provides robust results across varying fractions of missing edges. Shown is the five-fold average PR-AUC with 95% confidence intervals against various fractions of missing edges on four prediction tasks: drug-target interaction prediction (DTI), drug-drug interaction prediction (DDI), protein-protein interaction prediction (PPI), and gene-disease interaction prediction (GDI), for node2vec, Spectral Clustering (SC), Variational Graph Auto-Encoder (VGAE), Graph Convolutional Network (GCN), and SkipGNN. We omit DeepWalk as its performance is similar to node2vec. SkipGNN consistently shows the best performance even when networks are highly incomplete.
We report results in Table 4.3. We see that SkipGNN outperforms all baseline methods across all molecular interaction networks. Specifically, we see up to a 2.7% improvement of SkipGNN over GCN and up to an 8.8% improvement over VGAE in PR-AUC. While GCN and VGAE can only use direct similarity, this finding provides evidence that considering skip similarity and direct similarity together, as SkipGNN makes possible, is important for accurately predicting a variety of molecular interactions. Compared to direct embedding methods, SkipGNN achieves up to a 28.8% increase over DeepWalk, a 20.4% increase over node2vec, and a 15.6% increase over spectral clustering in PR-AUC. These results support previous observations [Zitnik et al., 2018a] that graph neural networks can learn more powerful network representations than direct embedding methods. Finally, all baselines vary in performance across datasets and tasks, while SkipGNN consistently yields the most powerful predictor.
Figure 4:
The ability of skip graph and original graph to capture the network structure in the face of incomplete data.
The skip graph preserves the network structure better than the original graph, as evidenced by the skip graph's smaller relative error (Section 5.2). This holds for every fraction of missing edges, indicating that the skip graph retains useful information about interaction structure even when networks are highly incomplete and many interactions are missing.

Next, we test SkipGNN's performance on incomplete interaction networks. Due to knowledge gaps in biology, many of today's interaction networks are incomplete, and thus it is crucial that methods are robust and perform well even when many interactions are missing. In this experiment, we train each method on 10%, 30%, 50%, and 70% of the edges in the DTI, DDI, and PPI datasets and predict on the rest of the data (we use 10% of test edges as a validation set for early stopping). Results in Figure 3 show that SkipGNN gives the most robust results among all the methods. In all tasks, SkipGNN achieves strong performance even when having access to only 10% of the interactions. Further, at almost every percentage point, SkipGNN is better than the baselines. In addition, we see that VGAE is not robust, as its performance drops to around 0.5 PR-AUC in highly incomplete settings on the DTI and DDI tasks. Performance of node2vec and GCN steadily improves as the percentage of seen edges increases. Further, while spectral clustering is robust to incomplete data, its performance varies substantially across tasks. We conclude that SkipGNN is robust and is especially appropriate for data-scarce networks.
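The edge-holdout protocol described above can be sketched as follows. This is a hypothetical helper of our own, not the authors' code:

```python
import numpy as np

def split_edges(edges, frac_train, frac_val=0.1, seed=0):
    """Train on a fraction of the observed interactions, hold out
    10% of the remaining edges for early stopping, and test on the rest
    (mirroring the protocol described in the text)."""
    rng = np.random.default_rng(seed)
    edges = np.asarray(edges)
    idx = rng.permutation(len(edges))
    n_train = int(frac_train * len(edges))
    train, held_out = edges[idx[:n_train]], edges[idx[n_train:]]
    n_val = int(frac_val * len(held_out))
    return train, held_out[:n_val], held_out[n_val:]

edges = [(i, i + 1) for i in range(20)]            # toy edge list
train, val, test = split_edges(edges, frac_train=0.3)
```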
So far, we have found that SkipGNN performs robustly on incomplete interaction networks; next, we investigate what makes SkipGNN so robust. We hypothesize that SkipGNN is robust because its skip graphs preserve the graph topology much better than the original graphs, and this advantage becomes prominent when interaction data are scarce. Note that SkipGNN uses the skip graph whereas other methods only use the original graph. To test the hypothesis, we measure the relative error between the original graph G and the incomplete graph G_p in which edges are missing at rate p. We use a metric that calculates the relative error of the spectral norm of the graph Laplacian matrix: Err(A, p) = (‖L‖ − ‖L_p‖)/‖L‖, where L = A − D, L_p = A_p − D_p, A (A_p) is the adjacency matrix of G (G_p), ‖·‖ = σ_max(·), and σ_max is the largest singular value [Chung and Graham, 1997]. Figure 4 shows the relative error Err of the original and skip graphs against varying fractions p of missing edges on the DDI task. We see that the skip graph's relative error is much lower than that of the original graph in almost all settings. This observation provides evidence for our hypothesis, confirming that skip graphs capture the graph topology better than original graphs. Because of that, SkipGNN can learn high-quality embeddings even when interaction data are scarce.

Next, we visualize embeddings learned by GCN and SkipGNN to investigate whether SkipGNN can better capture the structure of interaction networks than GCN. For that, we use the DTI and GDI networks, in which drugs/diseases are linked to associated proteins/genes. We use t-SNE [Maaten and Hinton, 2008]

Figure 5:
Visualizations of the drug-target interaction network.
GCN does not distinguish drugs from target genes, as it only captures direct similarity, whereas SkipGNN is able to distinguish drug and target-gene embeddings, confirming its ability to capture skip similarity. We use GCN and SkipGNN on the drug-target interaction dataset to learn drug/target embeddings, which are visualized using t-SNE.
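The relative-error metric Err(A, p) used in the robustness analysis can be computed directly. Below is a minimal NumPy sketch with a toy 4-node graph of our own (not data from the paper):

```python
import numpy as np

def laplacian_spectral_norm(A):
    # L = A - D, following the definition in the text; the spectral
    # norm ||.|| is the largest singular value (np.linalg.norm with ord=2).
    D = np.diag(A.sum(axis=1))
    return np.linalg.norm(A - D, ord=2)

def relative_error(A, A_p):
    """Err(A, p) = (||L|| - ||L_p||) / ||L|| for an incomplete graph A_p."""
    full = laplacian_spectral_norm(A)
    partial = laplacian_spectral_norm(A_p)
    return (full - partial) / full

# Toy example: a 4-cycle with one edge removed.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
A_p = A.copy()
A_p[0, 1] = A_p[1, 0] = 0.0   # drop edge (0, 1)
err = relative_error(A, A_p)
```

A smaller `err` for the skip graph than for the original graph, across missing-edge rates, is exactly the pattern the text reports in Figure 4.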
Figure 6:
Visualizations of the gene-disease interaction network.
GCN does not distinguish diseases from genes, as it only captures direct similarity, whereas SkipGNN is able to distinguish disease and gene embeddings, confirming its ability to capture skip similarity. We use GCN and SkipGNN on the gene-disease interaction dataset to learn gene/disease embeddings, which are visualized using t-SNE.

Table 4:
Results of ablation experiments.
SkipGNN's model component setup achieves the best result. Ablation study results of five independent runs on the DTI, DDI, and PPI tasks, reporting PR-AUC and ROC-AUC (mean ± standard deviation) for SkipGNN and its -skipGraph, -fusion, -Weighted-L1, and -Hadamard variants on each task. [Numeric table entries are not recoverable from the source.]
and visualize the learned embeddings in Figure 5 (DTI network) and Figure 6 (GDI network).

First, we observe that GCN cannot distinguish between different types of biomedical entities (i.e., drugs vs. proteins and diseases vs. genes). In contrast, SkipGNN successfully separates the entities, as evidenced by distinguishable groups of points of the same color in the t-SNE visualizations. This observation confirms that SkipGNN has a unique ability to capture the skip similarity, whereas GCN does not. This is because GCN forces embeddings of connected drug-protein/gene-disease pairs to be similar and thus embeds those pairs close together in the embedding space. However, by doing so, GCN conflates drugs with proteins and genes with diseases. In contrast, SkipGNN generates a biologically meaningful embedding space in which drugs are distinguished from proteins (or genes from diseases) while drugs are still positioned close to the proteins to which they bind (or, in the case of the GDI network, diseases are positioned close to relevant disease-associated genes).

Further, we find that GCN and its graph convolutional variants cannot capture skip similarity because they aggregate neural messages only from direct (i.e., immediate) neighbors in the interaction network. SkipGNN solves this problem by passing and aggregating neural messages from direct as well as indirect neighbors, thereby explicitly capturing skip similarity.

To show that each component of SkipGNN has an important role in its final performance, we conduct a series of ablation studies. SkipGNN has four key components, and we study how the method's performance changes when we remove each of them:
Novel predictions of drug-drug interactions.
Shown are the top-10 predicted drug-drug interactions together with the relevant literature providing evidence for the predictions.

Rank | Drug 1 | Drug 2 | Evidence for DDI
1 | Warfarin | Clozapine | Mukku et al. [2018]
2 | Warfarin | Ivacaftor | Robertson et al. [2015]
3 | Phenelzine | Deferasirox |
4 | Warfarin | Paraldehyde | DuPont [2000]
5 | Warfarin | Cyclosporine | Snyder [1988]
6 | Phenytoin | Sipuleucel-T |
7 | Warfarin | Netupitant |
8 | Phenelzine | Suvorexant | Merck [2014]
9 | Leuprolide | Picosulfuric acid |
10 | Deferasirox | Bexarotene | Ligand [1999]

• -fusion removes SkipGNN's fusion scheme and replaces it with a simple concatenation of the node embeddings generated by GCN.
• -skipGraph removes the skip graph and degenerates to GCN.
• -Weighted-L1 uses a weighted-L1 gate in Eq. (2), i.e., AGG(A, B) = |A − B|, where |·| is the absolute value operator.
• -Hadamard replaces the summation gate with the Hadamard operator '∗' in Eq. (2), such that AGG(A, B) = A ∗ B.

Table 5.5 shows the results of deactivating each of these components, one at a time. We find that -fusion outperforms -skipGraph (i.e., GCN) by a large margin. This finding identifies the skip graph as a key driver of the performance improvement. Further, we find that our iterative fusion scheme is important, indicating that successful methods need to integrate both direct and skip similarity in interaction networks. Next, we see that the weighted-L1 gate has comparable or worse performance than the summation gate, and the Hadamard operator performs the worst, suggesting that SkipGNN's summation gate is the best-performing aggregation function. Altogether, we conclude that all of SkipGNN's components are necessary for its strong performance.

The main goal of link prediction on graphs is to find novel hits that do not exist in the dataset. We conduct a literature search and find that SkipGNN is able to discover novel hits. We select pairs that do not interact in the original dataset but are flagged as interacting by our model.
We then pick the top-10 most confident interactions, query a literature database, and check whether there is evidence supporting our findings. We find promising results for the DDI task (Table 5.6). Out of the 10 top-ranked interaction pairs, we are able to find 6 pairs with supporting literature evidence. For example, for the interaction between Warfarin and Clozapine, Mukku et al. [2018] report that "Clozapine increase the concentrations of commonly used drugs in elderly like digoxin, heparin, phenytoin and Warfarin by displacing them from plasma protein. This can lead to increase in respective adverse effects with these medications." Also, the manufacturer Novartis [1989] reports that "Clozapine may displace Warfarin from plasma protein-binding sites. Increased levels of unbound Warfarin could result and could increase the risk of hemorrhage." As another example, for the interaction between Warfarin and Ivacaftor, Robertson et al. [2015] conduct a DDI study and report that "caution and appropriate monitoring are recommended when concomitant substrates of CYP2C9, CYP3A and/or P-gp are used during treatment with Ivacaftor, particularly drugs with a narrow therapeutic index, such as Warfarin." Finally, we provide the top-10 outputs for the DTI, PPI, and GDI tasks in Appendix C.
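The novel-hit selection step described above (rank all non-edges by predicted probability, keep the top k) can be sketched as follows. Here `score_fn` is a hypothetical stand-in for a trained model such as SkipGNN:

```python
import itertools
import numpy as np

def top_novel_pairs(score_fn, n_nodes, known_edges, k=10):
    """Rank all node pairs absent from `known_edges` by the model's
    predicted interaction score and return the top k as novel candidates."""
    known = {frozenset(e) for e in known_edges}
    candidates = [p for p in itertools.combinations(range(n_nodes), 2)
                  if frozenset(p) not in known]
    scores = [score_fn(u, v) for u, v in candidates]
    order = np.argsort(scores)[::-1][:k]        # highest scores first
    return [(candidates[i], scores[i]) for i in order]

# Toy scoring function standing in for a trained model: closer node
# indices get higher scores (purely illustrative).
hits = top_novel_pairs(lambda u, v: 1.0 / (1 + abs(u - v)),
                       n_nodes=5, known_edges=[(0, 1), (1, 2)], k=3)
```

The returned pairs are then the candidates one would look up in the literature, as done for Table 5.6.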
We introduced SkipGNN, a novel graph neural network for predicting molecular interactions. The architecture of SkipGNN is motivated by a principle of connectivity, which we call skip similarity. Remarkably, we found that skip similarity allows SkipGNN to capture the structural and evolutionary forces that govern molecular interaction networks much better than what is possible with current graph neural networks. SkipGNN achieves superior and robust performance on a variety of key prediction tasks in interaction networks and performs well even when networks are highly incomplete.

There are several future directions. We focused here on networks in which all edges are of the same type. As SkipGNN is a general graph neural network, it would be interesting to adapt SkipGNN to heterogeneous networks, such as drug-gene-disease networks. Another fruitful direction would be to implement skip similarity in other types of biological networks.
References
Lenore Cowen, Trey Ideker, Benjamin J Raphael, and Roded Sharan. Network propagation: a universal amplifier of genetic associations. Nature Reviews Genetics, 18(9):551, 2017.
Marinka Zitnik, Francis Nguyen, Bo Wang, Jure Leskovec, Anna Goldenberg, and Michael M Hoffman. Machine learning for integrating data in biology and medicine: Principles, practice, and opportunities. Information Fusion, 50:71–91, 2019a.
Yunan Luo et al. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nature Communications, 2017a.
Marinka Zitnik, Monica Agrawal, and Jure Leskovec. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics, 34(13):i457–i466, 2018a.
Katja Luck et al. A reference map of the human binary protein interactome. Nature, pages 1–7, 2020.
Monica Agrawal, Marinka Zitnik, and Jure Leskovec. Large-scale analysis of disease pathways in the human interactome. In PSB, pages 111–122, 2018.
Chengwei Lei and Jianhua Ruan. A novel link prediction algorithm for reconstructing protein–protein interaction networks by topological similarity. Bioinformatics, 29(3):355–364, 2013.
Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S Yu. A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596, 2019.
Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. ICLR, 2018.
Sami Abu-El-Haija et al. MixHop: Higher-order graph convolution architectures via sparsified neighborhood mixing. ICML, 2019.
Michael Costanzo, Anastasia Baryshnikova, Jeremy Bellay, Yungil Kim, Eric D Spear, Carolyn S Sevier, Huiming Ding, Judice LY Koh, Kiana Toufighi, Sara Mostafavi, et al. The genetic landscape of a cell. Science, 327(5964):425–431, 2010.
Michael Costanzo, Benjamin VanderSluis, Elizabeth N Koch, Anastasia Baryshnikova, Carles Pons, Guihong Tan, Wen Wang, Matej Usaj, Julia Hanchard, Susan D Lee, et al. A global genetic interaction network maps a wiring diagram of cellular function. Science, 353(6306):aaf1420, 2016.
Marinka Zitnik, Marcus W Feldman, Jure Leskovec, et al. Evolution of resilience in protein interactomes across the tree of life. PNAS, 116(10):4426–4433, 2019b.
István A Kovács et al. Network-based prediction of protein interactions. Nature Communications, 10(1):1240, 2019.
Miller McPherson, Lynn Smith-Lovin, and James M Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1):415–444, 2001.
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In KDD, pages 701–710. ACM, 2014.
Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In KDD, pages 855–864, 2016.
Leonardo FR Ribeiro, Pedro HP Saverese, and Daniel R Figueiredo. struc2vec: Learning node representations from structural identity. In KDD, pages 385–394. ACM, 2017.
Lei Tang and Huan Liu. Leveraging social media networks for classification. Data Mining and Knowledge Discovery, 23(3):447–478, 2011.
Thomas N Kipf and Max Welling. Variational graph auto-encoders. NeurIPS Workshop on Bayesian Deep Learning, 2016.
Linyuan Lü and Tao Zhou. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, 390(6):1150–1170, 2011.
Jörg Menche et al. Uncovering disease-disease relationships through the incomplete interactome. Science, 347(6224):1257601, 2015.
Claudio Durán et al. Pioneering topological methods for network-based drug–target prediction by exploiting a brain-network self-organization theory. Briefings in Bioinformatics, 19(6):1183–1202, 2018.
Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
Linyuan Lü, Ci-Hang Jin, and Tao Zhou. Similarity index based on local paths for link prediction of complex networks. Physical Review E, 80(4):046122, 2009.
Marinka Zitnik and Blaz Zupan. Data fusion by matrix factorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(1):41–53, 2015.
Bo Wang et al. Network enhancement as a general method to denoise weighted biological networks. Nature Communications, 9(1):1–8, 2018.
Linchuan Xu, Jiannong Cao, Xiaokai Wei, and Philip Yu. Network embedding via coupled kernelized multi-dimensional array factorization. IEEE TKDE, 2019.
Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale information network embedding. In WWW, pages 1067–1077, 2015.
Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NeurIPS, pages 1024–1034, 2017.
Tengfei Ma, Cao Xiao, Jiayu Zhou, and Fei Wang. Drug similarity integration through attentive multi-view graph auto-encoders. IJCAI, 2018.
Muhan Zhang and Yixin Chen. Link prediction based on graph neural networks. In NeurIPS, pages 5165–5175, 2018.
Masashi Tsubaki, Kentaro Tomii, and Jun Sese. Compound–protein interaction prediction with end-to-end learning of neural networks for graphs and sequences. Bioinformatics, 35(2):309–318, 2019.
Hakime Öztürk, Arzucan Özgür, and Elif Ozkirimli. DeepDTA: deep drug–target binding affinity prediction. Bioinformatics, 34(17):i821–i829, 2018.
Yingkai Gao, Achille Fokoue, et al. Interpretable drug target prediction using deep neural representation. In IJCAI, pages 3371–3377, 2018.
Kexin Huang, Cao Xiao, Trong Nghia Hoang, Lucas M Glass, and Jimeng Sun. CASTER: Predicting drug interactions with chemical substructure representation. AAAI, 2020.
Jae Yong Ryu, Hyun Uk Kim, and Sang Yup Lee. Deep learning improves prediction of drug–drug and drug–food interactions. PNAS, 115(18):E4304–E4311, 2018.
Feixiong Cheng and Zhongming Zhao. Machine learning-based prediction of drug–drug interactions by integrating drug phenotypic, therapeutic, chemical, and genomic properties. JAMIA, 21(e2):e278–e286, 2014.
Tijana Milenković and Nataša Pržulj. Uncovering biological network function via graphlet degree signatures. Cancer Informatics, 6:CIN–S680, 2008.
Wen Zhang, Xiang Yue, Weiran Lin, Wenjian Wu, Ruoqi Liu, Feng Huang, and Feng Liu. Predicting drug-disease associations by using similarity constrained matrix factorization. BMC Bioinformatics, 19(1):1–12, 2018.
Reza Ferdousi, Reza Safdari, and Yadollah Omidi. Computational prediction of drug-drug interactions based on drugs functional similarities. JBI, 70:54–64, 2017.
Ping Zhang, Fei Wang, Jianying Hu, and Robert Sorrentino. Label propagation prediction of drug-drug interactions based on clinical side effects. Scientific Reports, 5(1):1–10, 2015.
Marinka Zitnik and Jure Leskovec. Predicting multicellular function through multi-layer tissue networks. Bioinformatics, 33(14):i190–i198, 2017.
Yunan Luo, Xinbin Zhao, Jingtian Zhou, Jinglin Yang, Yanqing Zhang, Wenhua Kuang, Jian Peng, Ligong Chen, and Jianyang Zeng. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nature Communications, 8(1):1–13, 2017b.
Wenming Cao, Zhiyue Yan, Zhiquan He, and Zhihai He. A comprehensive survey on geometric deep learning. IEEE Access, 8:35929–35949, 2020.
Marinka Zitnik, Rok Sosič, Sagar Maheshwari, and Jure Leskovec. BioSNAP Datasets: Stanford biomedical network dataset collection, August 2018b.
Katja Luck et al. A reference map of the human protein interactome. bioRxiv, 2019.
Janet Piñero et al. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Research, 11 2019. ISSN 0305-1048. doi: 10.1093/nar/gkz1021. URL https://doi.org/10.1093/nar/gkz1021.
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. ICLR, 2014.
Fan RK Chung and Fan Chung Graham. Spectral graph theory. Number 92. American Mathematical Soc., 1997.
Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. JMLR, 9(Nov), 2008.
Shiva Shanker Reddy Mukku, PT Sivakumar, and Mathew Varghese. Clozapine use in geriatric patients—challenges. Asian Journal of Psychiatry, 33:63–67, 2018.
Sarah M Robertson et al. Clinical drug-drug interaction assessment of ivacaftor as a potential inhibitor of cytochrome p450 and p-glycoprotein. The Journal of Clinical Pharmacology, 55(1):56–62, 2015.
DuPont Pharmaceuticals. Product information: Coumadin (warfarin). DuPont Pharmaceuticals, Wilmington, DE, 2000.
David S. Snyder. Interaction between Cyclosporine and Warfarin. Annals of Internal Medicine, 108(2):311–311, 02 1988. ISSN 0003-4819. doi: 10.7326/0003-4819-108-2-311_1. URL https://doi.org/10.7326/0003-4819-108-2-311_1.
Merck & Company Inc. Product information: Belsomra (suvorexant). Merck & Company Inc, Whitehouse Station, NJ, 2014.
Ligand Pharmaceuticals. Product information: Targretin (bexarotene). Ligand Pharmaceuticals, San Diego, CA, 1999.
Novartis Pharmaceuticals. Product information: Clozaril (clozapine). Novartis Pharmaceuticals, East Hanover, NJ, 1989.

A Experiments on the importance of each layer of GNN for biomedical link prediction
To further support our claim about the importance of integrating skip similarity into GNN-based methods for link prediction on biomedical interaction networks, we vary the architecture of a vanilla GNN and compare predictive performance on the DDI, PPI, and DTI tasks. The variations are:

• TwoLayers-OriGraph is a two-layer GCN on the original graph. It uses an indirect two-hop neighborhood aggregation because information from two-hop nodes is conveyed to the center node through the one-hop nodes.
• OneLayer-OriGraph is a one-layer vanilla GCN. It only utilizes the immediate one-hop neighbor information. Hence, it is a direct measure of direct similarity.
• TwoLayers-SkipGraph is a vanilla two-layer GCN that operates on the skip graph. It uses a direct connection between the center node and its two-hop neighborhood, as opposed to the indirect connection in a vanilla GCN. As it has two layers, it also considers indirect four-hop neighbor nodes.
• OneLayer-SkipGraph is the one-layer version of TwoLayers-SkipGraph. As it only uses two-hop neighbor information, it directly measures the skip similarity.

Table A compares the results. The large improvement of TwoLayers-OriGraph over OneLayer-OriGraph is initial evidence that the two-hop neighborhood, which encodes the skip-similarity assumption about node relations, is essential. Then, comparing OneLayer-OriGraph and OneLayer-SkipGraph, the large-margin improvement of OneLayer-SkipGraph implies that the two-hop neighborhood alone carries more predictive information than the one-hop neighborhood alone, supporting our motivating analysis of the importance of skip similarity for biomedical interaction networks. Note also that the improvement from OneLayer-OriGraph to TwoLayers-OriGraph is much larger than the improvement from OneLayer-SkipGraph to TwoLayers-SkipGraph, meaning that the second hop is essential while higher-order neighborhoods are of limited importance for interaction link prediction. Lastly, TwoLayers-OriGraph performs better than TwoLayers-SkipGraph, meaning that biomedical interaction link prediction requires a balance between immediate and two-hop neighbors, confirming our intuition that an ideal model should pursue a balance between them and adding support for the iterative fusion scheme.
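As an illustration of the skip graph that the SkipGraph variants operate on, here is one plausible construction (our own sketch; the paper's exact definition may differ, e.g., in how it treats edge weights): connect two nodes if and only if they share at least one neighbor in the original graph.

```python
import numpy as np

def skip_graph(A):
    """Construct a skip graph: link u and v iff there is a length-2 path
    between them in the original graph. A is a binary, symmetric
    adjacency matrix with zero diagonal."""
    two_hop = (A @ A > 0).astype(float)   # (A^2)_{uv} > 0 iff a 2-hop path exists
    np.fill_diagonal(two_hop, 0.0)        # drop self-loops from closed 2-walks
    return two_hop

# Path graph 0–1–2: nodes 0 and 2 share neighbor 1, so they become
# directly connected in the skip graph.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
S = skip_graph(A)
```

A one-layer GCN run on `S` then aggregates exactly the two-hop neighborhood, which is what OneLayer-SkipGraph measures.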
B Details about baseline methods

• L3 [Kovács et al., 2019] counts the length-3 paths between all pairs of network nodes. The number of length-3 paths is then normalized by the degrees of the node pair.
• DeepWalk [Perozzi et al., 2014] performs uniformly distributed random walks and applies a skip-gram model to learn node embeddings. We use a walk length of 20, concatenate the target nodes' embeddings, and feed them into a logistic regression classifier.
• node2vec [Grover and Leskovec, 2016] builds on DeepWalk and uses biased random walks based on depth/breadth-first search to consider both local and global network structure. We use a walk length of 20, as the paper suggests longer walk lengths improve the embedding quality. The paper also reported that the Hadamard product performs better than the average and weighted L1/L2 operators for link prediction; however, in our experiments, simple concatenation is better than the Hadamard product. After the concatenation, we feed the result into a logistic regression classifier as described in the paper.
• struc2vec [Ribeiro et al., 2017] leverages the local network structure in addition to node2vec. We use a walk length of 80 and 20 walks per node, following the authors' recommendation. We then concatenate the latent embeddings and feed them into a logistic regression classifier.
Table A:
Skip similarity is important for biomedical interaction prediction when using GCN.
Results of five independent runs on the DDI, PPI, and DTI tasks with varying architectures of GCN, reporting PR-AUC and ROC-AUC (mean ± standard deviation) for TwoLayers-OriGraph, OneLayer-OriGraph, TwoLayers-SkipGraph, and OneLayer-SkipGraph on each task. [Numeric table entries are not recoverable from the source.]

• Spectral Clustering [Tang and Liu, 2011] projects nodes onto the top-16 eigenvectors of the normalized Laplacian matrix and uses the transposed eigenvectors as node embeddings. The embeddings are then multiplied and passed through a sigmoid function to obtain link probabilities.
• VGAE [Kipf and Welling, 2016] applies a variational graph auto-encoder and learns node embeddings that best reconstruct the adjacency matrix. We use a two-layer GCN with hidden size 64 for layer one and 16 for layer two. The learning rate is set to 5e-4 with the Adam optimizer for 300 epochs. The dropout rate is set to 0.1.
• GCN [Kipf and Welling, 2017] uses two GCN layers on the original adjacency matrix to obtain node embeddings; all other settings are the same as for SkipGNN. We use a two-layer GCN with hidden size 64 for layer one and 16 for layer two. The learning rate is set to 5e-4 with the Adam optimizer for 10 epochs with batch size 256. SkipGNN usually converges within 5 epochs. The dropout rate is set to 0.1.

We determine all parameters for the baseline methods using grid search on a validation set.
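A minimal sketch of the Spectral Clustering baseline's embedding and scoring steps, under the assumption (ours, following the usual spectral-clustering convention) that the "top" eigenvectors are those with the smallest eigenvalues of the symmetric normalized Laplacian:

```python
import numpy as np

def spectral_embeddings(A, dim=16):
    """Embed nodes with the eigenvectors of the symmetric normalized
    Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    L_norm = np.eye(len(A)) - d_inv_sqrt @ A @ d_inv_sqrt
    # np.linalg.eigh returns eigenvalues in ascending order, so the first
    # `dim` columns are the smallest-eigenvalue eigenvectors.
    _, vecs = np.linalg.eigh(L_norm)
    return vecs[:, :dim]                  # one dim-dimensional row per node

def link_probability(emb, u, v):
    # Dot product of the two node embeddings passed through a sigmoid,
    # mirroring the scoring rule described above.
    return 1.0 / (1.0 + np.exp(-emb[u] @ emb[v]))

# Toy example: a 4-cycle.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
emb = spectral_embeddings(A, dim=2)
p = link_probability(emb, 0, 1)
```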
C Potential novel hits for PPI, DTI, and GDI
We conducted a literature search for the DDI novel hits in the main text. Here, we also provide the novelhits discovered through SkipGNN for the PPI, DTI, and GDI tasks in Table C.