A Literature Review of Recent Graph Embedding Techniques for Biomedical Data
Yankai Chen, Yaozu Wu, Shicheng Ma, and Irwin King
Department of Computer Science and Engineering, The Chinese University of Hong Kong
{ykchen, king}@cse.cuhk.edu.hk
KEEP, The Chinese University of Hong Kong
[email protected], [email protected]
Abstract.
With the rapid development of biomedical software and hardware, a large amount of relational data interlinking genes, proteins, chemical components, drugs, diseases, and symptoms has been collected for modern biomedical research. Many graph-based learning methods have been proposed to analyze such data, giving a deeper insight into the topology and knowledge behind the biomedical data, which greatly benefits both academic research and industrial applications for human healthcare. However, the main difficulty is how to handle the high dimensionality and sparsity of biomedical graphs. Recently, graph embedding methods have provided an effective and efficient way to address these issues: they convert graph-based data into a low-dimensional vector space in which the graph structural properties and knowledge information are well preserved. In this survey, we conduct a literature review of recent developments and trends in applying graph embedding methods to biomedical data. We also introduce important applications and tasks in the biomedical domain as well as associated public biomedical datasets.
Keywords:
Graph embedding · Biomedical data · Biomedical graph · Biomedical informatics · Graph embedding survey.
With the recent advances in biomedical technology, a large amount of relational data interlinking biomedical components, including proteins, drugs, diseases, and symptoms, has gained much attention in biomedical academic research. Relational data, also known as a graph, captures the interactions (i.e., edges) between entities (i.e., nodes) and now plays a key role in the modern machine learning domain. Analyzing these graphs provides users a deeper understanding of the topology and knowledge behind these graphs, and thus greatly benefits many biomedical applications such as biological graph analysis [2], network medicine [4], and clinical phenotyping and diagnosis [40]. As summarized in Figure 1, although graph analytics is of great importance, most existing graph analytics methods suffer from the computational cost brought by the high dimensionality and sparsity of the graphs [12,7,36]. Furthermore, owing to the heterogeneity of biomedical graphs, i.e., containing multiple types of nodes
[Figure 1 contrasts traditional graph analysis of biomedical graphs (high dimensionality and sparsity, high heterogeneity, high computational cost) with graph embedding (low-dimensional representations that preserve graph information and serve as inputs for downstream tasks such as node classification, link prediction, and clustering), where biomedical graphs include biomedical relational data, electronic medical data, and biomedical knowledge graphs.]
Fig. 1.
Comparison between traditional graph analysis methods and graph embedding techniques for biomedical graphs.

and edges, traditional analyses over biomedical graphs remain challenging. Recently, graph embedding methods, which aim to learn a mapping that embeds nodes into a low-dimensional vector space R^d, provide an effective and efficient way to address these problems. Specifically, the goal is to optimize this mapping so that the node representations in the embedding space preserve the information and properties of the original graphs well. After such representation learning, the learned embeddings can then be used as feature inputs for many downstream machine learning tasks, which introduces enormous opportunities for biomedical data science. Efforts to apply graph embedding to biomedical data have recently been made but are still not thoroughly explored; the capabilities of graph embedding for biomedical data have also not been extensively evaluated. In addition, biomedical graphs are usually sparse, incomplete, and heterogeneous, making graph embedding more complicated than in other application domains. This strongly motivates us to understand and compare the state-of-the-art graph embedding techniques, and to further study how these techniques can be adapted and applied to biomedical data science. Thus, in this survey, we investigate recent developments and trends of graph embedding techniques for biomedical data, which gives us better insight into future directions. In this article, we introduce the general models related to biomedical data and omit complete technical details. For a more comprehensive overview of graph embedding techniques and applications, we refer readers to previous well-summarized papers [7,19,43,9]. We first give the preliminaries used in this paper, and then briefly introduce the widely used graph embedding models. After that, we introduce some related public biomedical datasets.
Finally, we carefully discuss the recent developments and trends of biomedical graph embedding applications.

Definition 1 (Homogeneous graphs).
A homogeneous graph G = (V, E) is associated with two mapping functions Φ : V (node set) → A (node type set) and Ψ : E (edge set) → R (edge type set), where |A| = |R| = 1.

Definition 2 (Heterogeneous graphs).
A heterogeneous graph G = (V, E) is associated with a node type mapping function Φ : V → A and an edge type mapping function Ψ : E → R, where |A| > 1 and/or |R| > 1.

Definition 3 (Dynamic graphs).
A graph G = (V, E) is a dynamic graph where each vertex is a tuple (v, t_s, t_e), with t_s and t_e respectively the start and end timestamps of the vertex's existence (t_s ≤ t_e); each edge is a tuple (u, v, t_s, t_e) with u, v ∈ V, where t_s and t_e are respectively the start and end timestamps of the edge's existence (t_s ≤ t_e).

Problem 1 (Graph embedding). Given a graph G = (V, E) and a predefined embedding dimensionality d with d ≪ |V|, graph embedding aims to convert G into a d-dimensional space R^d in which the information and properties of G are preserved as well as possible.

In the following sections, we provide a taxonomy of graph embedding methods based on the graph settings and the embedding techniques, respectively. As shown in Figure 2, according to the graph settings, we introduce homogeneous graph embedding models, heterogeneous graph embedding models, and dynamic graph embedding models as follows.
[Figure 2 organizes the models by graph setting: homogeneous graphs (matrix factorization-based, random walk-based, and deep learning-based methods); heterogeneous graphs (translational distance, semantic matching, meta-path-based, and other methods); dynamic graphs (probabilistic methods and dynamic graph embedding methods).]
Fig. 2.
Taxonomy of graph embedding models.
In the literature, there are three main types of homogeneous graph embedding methods, i.e., matrix factorization-based methods, random walk-based methods, and deep learning-based methods.

Matrix factorization-based methods.
Matrix factorization-based methods, inspired by classic techniques for dimensionality reduction, use the form of a matrix to represent graph properties, e.g., node pairwise similarity. Generally, there are two types of matrix factorization to compute node embeddings, i.e., node proximity matrix factorization and graph Laplacian eigenmaps. Node proximity matrix factorization methods usually approximate node proximity in a low-dimensional space; the objective of preserving node proximity is to minimize the approximation loss ||W − U X^T||, where W is the node proximity matrix, X is the embedding matrix for context nodes, and the embedding U can be computed using this loss function. There are many other solutions to approximate this objective, such as low-rank matrix factorization, regularized Gaussian matrix factorization, etc. Graph Laplacian eigenmaps factorization methods assume that the graph property to preserve can be interpreted as the similarity of pairwise nodes. Thus, to obtain a good representation, the normal operation is that a larger penalty is imposed if two nodes with higher similarity are embedded far apart. The optimal embedding U* can be computed using the objective function (1):

U* = arg min_{U^T D U = 1} U^T L U = arg min (U^T L U)/(U^T D U) = arg max (U^T W U)/(U^T D U),   (1)

where L = D − W is the graph Laplacian and D is the diagonal degree matrix with D_ii = Σ_j W_ji. Many works use graph Laplacian-based methods, and they mainly differ in how they calculate the pairwise node similarity W_ij. For example, BANE [55] defines a new Weisfeiler-Lehman proximity matrix to capture data dependence between edges and attributes; then, based on this matrix, BANE learns node embeddings by formulating a new Weisfeiler-Lehman matrix factorization. Recently, NetMF [37] unified state-of-the-art approaches into a matrix factorization framework with closed forms.

Random walk-based methods.
Random walk-based methods have been widely used to approximate many properties of a graph, including node centrality and similarity. They are especially useful when the graph can only be partially observed, or when the graph is too large to measure. Two widely recognized random walk-based methods are DeepWalk [36] and node2vec [20]. Concretely, DeepWalk treats walk paths as sentences and applies an NLP model to learn node embeddings. Compared to DeepWalk, node2vec introduces a trade-off strategy between breadth-first and depth-first search to perform biased random walks. In recent years, many random walk-based papers have continued to improve performance. For example, AWE [24] uses a recently developed technique called anonymous walks, i.e., an anonymized version of the random walk that provides characteristic graph traits and can exactly reconstruct the network proximity of a node. AttentionWalk [1] uses the softmax to learn a free-form context distribution in a random walk; the learned attention parameters then guide the walk, allowing it to focus more on short- or long-term dependencies when optimizing an upstream objective. BiNE [18] proposes a bipartite graph embedding method that performs biased random walks, generating vertex sequences that preserve the long-tail distribution of vertices in the original bipartite graphs.
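The walk-generation stage shared by DeepWalk-style methods can be sketched as follows (a minimal illustration, not the authors' implementations; in practice the resulting walks are fed as "sentences" to a skip-gram model such as word2vec):

```python
import random

def random_walks(adj, num_walks, walk_length, seed=0):
    """Generate DeepWalk-style truncated random walks. Each walk is a
    'sentence' of node ids that a skip-gram model can consume to learn
    node embeddings."""
    rng = random.Random(seed)
    walks = []
    nodes = list(adj)
    for _ in range(num_walks):
        rng.shuffle(nodes)              # one pass over all nodes per epoch
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:            # dead end: truncate the walk
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# Toy graph as an adjacency list.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(adj, num_walks=2, walk_length=5)
```

node2vec's biased walk differs only in how the next neighbor is sampled: the transition probability is reweighted by its return and in-out parameters instead of being uniform.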
Deep learning-based methods.
Deep learning has shown outstanding performance in a wide variety of research fields. SDNE [47] applies a deep autoencoder to model non-linearity in the graph structure. DNGR [8] learns deep low-dimensional vertex representations by applying stacked denoising autoencoders to high-dimensional matrix representations. Furthermore, the Graph Convolutional Network (GCN) [27] introduces a well-behaved layer-wise propagation rule for a neural network model that operates directly on graphs, shown in Equation (2):

H^(l+1) = σ(D̂^(−1/2) Â D̂^(−1/2) H^(l) W^(l)),   (2)

with Â = A + I, where A and I are the adjacency and identity matrices and D̂ is the diagonal degree matrix of Â. W^(l) is the weight matrix of the l-th neural network layer and σ(·) is a non-linear activation function such as ReLU. H^(l) and H^(l+1) are the input and output of layer l and layer l+1, respectively. Another important work is the Graph Attention Network (GAT) [46], which leverages masked self-attentional layers to address the shortcomings of prior graph convolution-based methods. Specifically, as shown in Equation (3):

α_ij = exp(e_ij) / Σ_{k∈N_i} exp(e_ik), where e_ij = a(h_i, h_j).   (3)

N_i denotes the neighbors of node i. GAT computes normalized coefficients α_ij using the softmax function across different neighborhoods, as a byproduct of an attentional mechanism over node pairs. To stabilize the learning process of self-attention, GAT uses multi-head attention to replicate the learning phase K times, and the outputs are feature-wise aggregated (typically by concatenating or adding), as shown in Equation (4):

h'_i = ‖_{k=1}^{K} σ(Σ_{j∈N_i} α^k_ij W^k h_j),   (4)

where α^k_ij and W^k are the attention coefficients and the weight matrix specifying the linear transformation of the k-th replica. Recently, HGCN [11] and ATTH [10] use hyperbolic models to embed hierarchical graph structures with less distortion.
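A single GCN propagation step of Equation (2) can be sketched with plain NumPy (a toy illustration, not the reference implementation of [27]):

```python
import numpy as np

def gcn_layer(A, H, W, activation=np.tanh):
    """One GCN propagation step (Equation (2)):
    H^(l+1) = sigma(D_hat^(-1/2) A_hat D_hat^(-1/2) H^(l) W^(l)),
    where A_hat = A + I adds self-loops."""
    A_hat = A + np.eye(A.shape[0])      # add self-loops
    d = A_hat.sum(axis=1)               # degrees of A_hat
    D_inv_sqrt = np.diag(d ** -0.5)
    return activation(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Toy 4-node path graph, 3-dim input features, 2-dim output features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H0 = rng.normal(size=(4, 3))            # initial node features H^(0)
W0 = rng.normal(size=(3, 2))            # layer weights W^(0)
H1 = gcn_layer(A, H0, W0)               # shape (4, 2)
```

Stacking such layers lets each node aggregate features from increasingly distant neighborhoods, one hop per layer.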
The heterogeneity in both graph structures and node attributes makes it challenging for graph embedding to encode their diverse and rich information. In this section, we introduce translational distance methods and semantic matching methods, which try to address this issue by constructing different energy functions. Furthermore, we introduce meta-path-based methods, which use different strategies to capture graph heterogeneity.
Translational distance methods.
The first translational distance model is TransE [6]. The basic idea of translational distance models is, for each observed fact (h, r, t) representing head entity h having relation r with tail entity t, to learn a graph representation such that h and t are closely connected by relation r in the low-dimensional embedding space, i.e., h + r ≈ t in geometric notation. Here h, r, and t are the embedding vectors of entities h, t and relation r, respectively. The energy function of TransE is defined as f_r(h, t) = ||h + r − t||. The margin-based objective function of TransE is shown in Equation (5):

L = Σ_{(h,r,t)∈S} Σ_{(h′,r,t′)∈S′} max(0, f_r(h, t) − f_r(h′, t′) + margin),   (5)

where S denotes the set of true facts, e.g., (h, r, t), and S′ is the set of false triplets, e.g., (h′, r, t′), that are not observed in the knowledge graph. Note that the energy function f_r here can be viewed as the distance score between the embeddings of entities h and t in terms of relation r. To further improve the TransE model and address its inadequacies, many recent works have been developed. For example, RotatE [44] defines each relation as a rotation from the source entity to the target entity in the complex vector space. QuatE [56] computes node embedding vectors in the hypercomplex space with three imaginary components, as opposed to the standard complex space with a single real component and a single imaginary component. MuRP [3] is a hyperbolic embedding method that embeds multi-relational data in the Poincaré ball model of hyperbolic space, and performs well on hierarchical and scale-free graphs.
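The TransE energy and the margin-based loss of Equation (5) can be sketched as follows (a toy illustration over a single positive/negative pair; real training samples corrupted triplets and updates the embeddings by gradient descent):

```python
import numpy as np

def transe_score(h, r, t):
    """Energy f_r(h, t) = ||h + r - t|| (lower = more plausible)."""
    return np.linalg.norm(h + r - t)

def margin_loss(pos, neg, margin=1.0):
    """Margin-based ranking loss of Equation (5) for one positive
    triplet and one corrupted (negative) triplet."""
    return max(0.0, transe_score(*pos) - transe_score(*neg) + margin)

# Toy embeddings where h + r = t holds exactly for the true fact.
h = np.array([1.0, 0.0])
r = np.array([0.0, 1.0])
t = np.array([1.0, 1.0])
t_bad = np.array([-2.0, 0.0])                  # corrupted tail entity
loss = margin_loss((h, r, t), (h, r, t_bad))   # 0.0: negative is far enough
```

When the corrupted triplet already scores worse than the true one by at least the margin, the loss vanishes and that pair contributes no gradient.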
Semantic matching methods.
Semantic matching models exploit similarity-based scoring functions. They measure the plausibility of facts by matching the latent semantics of entities and relations embodied in their representations. Targeting an observed fact (h, r, t), RESCAL [34] embeds each entity with a vector to capture its latent semantics, and each relation with a matrix to model pairwise interactions between latent factors. Equation (6) defines the energy function:

f_r(h, t) = h^T M_r t = Σ_{i=0}^{d−1} Σ_{j=0}^{d−1} [M_r]_ij · h_i · t_j,   (6)

where M_r is the matrix associated with the relation. HolE [33] deals with directed graphs and composes head and tail entities by their circular correlation, achieving better performance than RESCAL. There are other works that extend or simplify RESCAL, e.g., DistMult [54], ComplEx [45], and ANALOGY [30]. Another direction of semantic matching methods is to fuse neural network architectures by treating the embedding as the input layer and the energy function as the output layer. For instance, the SME model [5] first feeds the embeddings of entities and relations into the input layer. In the hidden layer, the relation r is combined with the head entity h to get g_left(h, r) = M_1 h + M_2 r + b_l, and with the tail entity t to get g_right(t, r) = M_3 t + M_4 r + b_r. The score function is defined as f_r(h, t) = g_left(h, r)^T · g_right(t, r). Other semantic matching methods using neural network architectures include NTN [42] and MLP [15].

Meta-path-based methods.
Generally, a meta-path is an ordered path that consists of node types connected via edge types defined on the graph schema, e.g., A_1 →^{R_1} A_2 →^{R_2} · · · →^{R_{l−1}} A_l, which describes a composite relation between node types A_1, A_2, · · ·, A_l with edge types R_1, · · ·, R_{l−1}. Thus, meta-paths can be viewed as high-order proximity between two nodes with specific semantics. A set of recent works has been proposed. Metapath2vec [16] computes node embeddings by feeding meta-path-guided random walks to a skip-gram [32] model. HAN [51] learns meta-path-oriented node embeddings from different meta-path-based graphs converted from the original heterogeneous graph, and leverages the attention mechanism to combine them into one vector representation for each node. HERec [39] learns node embeddings by applying DeepWalk [36] to meta-path-based homogeneous graphs for recommendation. MAGNN [17] comprehensively considers three main components to achieve state-of-the-art performance. Concretely, MAGNN fuses the node content transformation to encapsulate node attributes, the intra-metapath aggregation to incorporate intermediate semantic nodes, and the inter-metapath aggregation to combine messages from multiple meta-paths.

Other methods.
LANE [23] constructs proximity matrices by incorporating label information and graph topology, and learns embeddings while preserving their correlations based on the Laplacian matrix. EOE [53] aims to embed a graph coupled from two non-attribute graphs. In EOE, latent features encode not only intra-network edges but also inter-network ones. To tackle the challenge of the heterogeneity of the two graphs, EOE incorporates a harmonious embedding matrix to further embed the embeddings. Inspired by generative adversarial network models, HeGAN [21] is designed to be relation-aware in order to capture the rich semantics of heterogeneous graphs, and further trains a discriminator and a generator in a minimax game to generate robust graph embeddings.
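The meta-path-guided random walks used by methods such as metapath2vec [16] can be sketched as follows (a minimal illustration with a hypothetical author-paper schema; not the original implementation):

```python
import random

def metapath_walk(adj, node_type, metapath, start, walk_length, seed=0):
    """Metapath2vec-style walk: at each step, move only to neighbors
    whose type matches the next type in the (cyclic) meta-path,
    e.g. ('A', 'P', 'A') for an author-paper-author schema."""
    rng = random.Random(seed)
    assert node_type[start] == metapath[0]
    walk = [start]
    i = 0
    while len(walk) < walk_length:
        i += 1
        # Cycle through the meta-path, skipping the repeated first type.
        want = metapath[i % (len(metapath) - 1)]
        nbrs = [v for v in adj[walk[-1]] if node_type[v] == want]
        if not nbrs:                    # no neighbor of the required type
            break
        walk.append(rng.choice(nbrs))
    return walk

# Toy heterogeneous graph: authors a1, a2 and papers p1, p2.
node_type = {'a1': 'A', 'a2': 'A', 'p1': 'P', 'p2': 'P'}
adj = {'a1': ['p1', 'p2'], 'a2': ['p1'], 'p1': ['a1', 'a2'], 'p2': ['a1']}
walk = metapath_walk(adj, node_type, ('A', 'P', 'A'), 'a1', 6)
```

Constraining each step by type is what injects the meta-path's semantics into the walk corpus before the skip-gram stage.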
In practice, graphs are always evolving over time, and much attention has recently been paid to graph embedding for dynamic graphs. In this section, we briefly introduce some typical general models as follows.
Probabilistic models.
Among generative probabilistic models, dynamic latent space models and dynamic stochastic block models are the two main types. Latent space models represent every node with an unobserved feature vector; an edge between two nodes is then formed conditionally independently of all other pairs of nodes, and the latent features change over time. Such models are flexible but require fitting parameters with Markov chain Monte Carlo methods that scale up to only a few hundred nodes [25]. Stochastic block models divide nodes into blocks (classes), where nodes within a block are assumed to have identical statistical properties. An edge between two nodes is formed independently of all other pairs of nodes, with a probability that depends only on the blocks of the two nodes, giving a block adjacency matrix whose entries correspond to pairs of blocks.
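The edge-generation process of the stochastic block model described above can be sketched as follows (a toy static sampler for illustration, not any particular dynamic model):

```python
import numpy as np

def sample_sbm(block_of, P, seed=0):
    """Sample an undirected graph from a stochastic block model:
    nodes i and j are linked with probability P[block_of[i], block_of[j]],
    independently of all other pairs of nodes."""
    rng = np.random.default_rng(seed)
    n = len(block_of)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < P[block_of[i], block_of[j]]:
                A[i, j] = A[j, i] = 1   # undirected edge
    return A

# Two blocks: dense within each block, sparse across blocks.
block_of = np.array([0, 0, 0, 1, 1, 1])
P = np.array([[0.9, 0.05],
              [0.05, 0.9]])
A = sample_sbm(block_of, P)
```

A dynamic variant would re-sample (or evolve) the block memberships and the matrix P at each timestamp.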
Dynamic graph embedding methods.
In dynamic graph embedding, there are mainly three types of methods, i.e., tensor decomposition-based methods, random walk-based methods, and deep learning-based methods, which are inspired by their counterparts for homogeneous graphs. Tensor decomposition is analogous to matrix factorization, where the additional dimension is time. Random walk-based methods for dynamic graphs are generally extensions of random walk-based embedding methods for static graphs, or they apply temporal random walks. Furthermore, deep learning models for dynamic graphs mainly contain two types of models: temporal restricted Boltzmann machines and dynamic graph neural networks. For detailed analysis, please refer to the surveys on dynamic graph embedding [41,26].

We first summarize some commonly used biomedical datasets in Table 1, where the columns report the average numbers of nodes and edges, the dimensionality of node features, and the numbers of node classes and graphs, respectively. PubMed-diabetes (https://linqs.soe.ucsc.edu/data) is a citation graph consisting of scientific publications and citations pertaining to diabetes. PPI (http://snap.stanford.edu/graphsage/ppi.zip) contains 24 graphs including protein-

Table 1.
Datasets Statistics
Dataset          avg. |V|   avg. |E|   Features  Classes  Graphs  Graph Type
PubMed-diabetes  19,717.00  44,338.00  500       3        1       Citation Graph
PPI              2,372.67   34,113.17  50        121      24      Bio-chemical Graph
MUTAG            17.93      19.79      7         2        188     Bio-chemical Graph
NCI-1            29.87      32.30      37        2        4,110   Bio-chemical Graph
NCI-33           30.20      -          29        -        2,843   Bio-chemical Graph
NCI-83           29.50      -          28        -        3,867   Bio-chemical Graph
NCI-109          29.60      -          38        -        4,127   Bio-chemical Graph
DD               284.31     715.65     82        2        1,178   Bio-chemical Graph
PROTEINS         39.06      72.81      4         2        1,113   Bio-chemical Graph
ENZYMES          32.46      63.14      6         6        600     Biological Graph

protein interactions of different organisms such as Homo sapiens, Mus musculus, etc. The MUTAG dataset contains nitro compounds that are divided into two classes according to their mutagenic effect on a bacterium. NCI-{1, 33, 83, 109} [35] contain chemical compounds screened for activity against non-small cell lung cancer, melanoma, and breast and ovarian cancer, respectively. DD and PROTEINS are two datasets that represent proteins as graphs, whose labels are enzymes and non-enzymes, and ENZYMES [52] is a biological dataset.

In recent years, graph embedding methods have been applied in biomedical data science. In this section, we introduce some main biomedical applications of graph embedding techniques, including pharmaceutical data analysis, multi-omics data analysis, and clinical data analysis.

Pharmaceutical data analysis.
Generally, there are two main types of applications in pharmaceutical data analysis, i.e., (i) drug repositioning and (ii) adverse drug reaction analysis. (i) Drug repositioning usually aims to predict unknown drug-target or drug-disease interactions. Recently, DTINet [31] generates drug and target-protein embeddings by separately performing random walk with restart on heterogeneous biomedical graphs. DTINet then projects drugs into the embedding space of target proteins and makes predictions based on geometric proximity. Other studies on drug repositioning focus on predicting drug-disease associations. For instance, Dai et al. [14] first embed genes by applying eigenvalue decomposition to a gene-gene interaction graph, and then calculate genomic representations for drugs and diseases from the gene embedding vectors. Wang et al. [49] propose to detect unknown drug-disease interactions from the medical literature by fusing NLP and graph embedding techniques. (ii) An adverse drug reaction (ADR) is defined as any undesirable drug effect, outside of its desired therapeutic effects, that occurs at a usual dosage; ADR analysis is now central to drug development before a drug is launched in clinical trials.
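Prediction by geometric proximity in a shared embedding space, as in the drug-repositioning setting above, can be sketched as follows (a toy illustration with hypothetical embeddings and a cosine-similarity score; DTINet's actual projection procedure differs):

```python
import numpy as np

def predict_interactions(drug_emb, target_emb, threshold=0.8):
    """Score every drug-target pair by cosine similarity of their
    embeddings and predict an interaction when the score exceeds a
    threshold. A toy stand-in for 'geometric proximity' prediction."""
    D = drug_emb / np.linalg.norm(drug_emb, axis=1, keepdims=True)
    T = target_emb / np.linalg.norm(target_emb, axis=1, keepdims=True)
    scores = D @ T.T                    # pairwise cosine similarities
    return scores, scores > threshold

# Toy embeddings: drug 0 is geometrically close to target 0.
drug_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
target_emb = np.array([[0.9, 0.1], [-1.0, 0.2]])
scores, pred = predict_interactions(drug_emb, target_emb)
```

The same pattern (embed both entity types, then rank candidate pairs by a distance or similarity score) underlies most link-prediction applications discussed in this section.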
Multi-omics data analysis.
The main aim of multi-omics is to study the structures, functions, and dynamics of organism molecules. Fortunately, graph embedding has become a valuable tool for analyzing relational data in omics. (Several of the bio-chemical datasets above are available at https://chrsmrrs.github.io/datasets/docs/datasets/.) Concretely, the computational tasks in multi-omics data analysis mainly concern (i) genomics, (ii) proteomics, and (iii) transcriptomics. (i) Works applying graph embedding to genomics data analysis usually try to decipher biology from genome sequences and related data. For example, based on gene-gene interaction data, a recent work [29] extends the graph embedding method LINE to two bipartite graphs, the Cell-ContexGene and Gene-ContexGene networks, and proposes SCRL to address representation learning for single-cell RNA-seq data, which outperforms traditional dimensionality reduction methods in their experiments. (ii) As introduced before, PPIs play key roles in most cell functions. Graph embedding has also been applied to PPI graphs for proteomics data analysis, such as assessing and predicting PPIs or predicting protein functions. Recently, ProSNet [50] was proposed for protein function prediction. In this model, the authors introduce DCA to a heterogeneous molecular graph and further use meta-path-based methods to modify DCA so as to preserve heterogeneous structural information. Thanks to the proposed embedding methods for such heterogeneous graphs, their experimental prediction performance is greatly improved. (iii) Transcriptomics studies analyze an organism's transcriptome. For instance, identifying miRNA-disease associations has become an important topic in pathogenicity, and graph embedding now provides a useful tool in transcriptomics for predicting miRNA-disease associations.
To predict new associations, CMFMDA [38] introduces matrix factorization methods to the bipartite miRNA-disease graph for graph embedding. Besides, Li et al. [28] propose a method that uses DeepWalk to embed the bipartite miRNA-disease network. Their experimental results demonstrate that, by preserving both local and global graph topology, DeepWalk can bring significant improvements in association prediction for miRNA-disease graphs.

Clinical data analysis.
Graph embedding techniques have been applied to clinical data, such as electronic medical records (EMRs), electronic health records (EHRs), and medical knowledge graphs, providing useful assistance and support for clinicians in recent clinical development. EMRs and EHRs are heterogeneous graphs that comprehensively include medical and clinical information about patients, which provides opportunities for graph embedding techniques to support medical research and clinical decisions. To address the heterogeneity of EMR and EHR data, GRAM [13] learns EHR representations with the help of hierarchical information inherent to medical ontologies. ProSNet [22] constructs a biomedical knowledge graph to learn embeddings of medical entities; the proposed method is used to visualize a Parkinson's disease dataset. Constructing medical knowledge graphs has attracted great importance and attention recently. For instance, analogously to TransE, Zhao et al. [58] define an energy function that treats the relation between patients' symptoms and diseases as a translation vector, to further learn representations of medical forum data. A subsequent method learns embeddings of medical entities in a medical knowledge graph based on the energy functions of RESCAL and TransE [57]. In addition, Wang et al. [48] construct an objective function that combines the energy function of TransR with LINE's second-order proximity measurement to learn embeddings from a heterogeneous medical knowledge graph, in order to recommend proper medicines to patients.
Graph embedding methods aim to learn compact and informative representations for graph analysis, and thus provide a powerful opportunity to solve traditional graph-based machine learning problems both effectively and efficiently. With the rapid growth of relational data in the biomedical domain, applying graph embedding techniques now draws much attention in numerous biomedical applications. However, as we have reviewed in this survey, the capability of graph embedding for biomedical graph analysis has not been fully explored. Many issues associated with biomedical data may bring challenges to biomedical graph embedding tasks. For example, biomedical data may not be well structured, and knowledge and information from the biomedical domain or health care records can be complicated compared to the general domain. In this survey, we have introduced recent developments and trends of different graph embedding methods. By carefully summarizing biomedical applications of graph embedding methods, we provide more perspectives on this emerging research domain for better improvement of human health care.
References
1. Abu-El-Haija, S., Perozzi, B., Al-Rfou, R., Alemi, A.A.: Watch your step: Learning node embeddings via graph attention. In: NeurIPS. pp. 9180–9190 (2018)
2. Albert, R.: Scale-free networks in cell biology. Journal of cell science (2005)
3. Balazevic, I., Allen, C., Hospedales, T.: Multi-relational Poincaré graph embeddings. In: NeurIPS. pp. 4465–4475 (2019)
4. Barabási, A.L., Gulbahce, N., Loscalzo, J.: Network medicine: a network-based approach to human disease. Nature Reviews Genetics (1), 56–68 (2011)
5. Bordes, A., Glorot, X., Weston, J., Bengio, Y.: A semantic matching energy function for learning with multi-relational data. ML (2), 233–259 (2014)
6. Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: NeurIPS. pp. 2787–2795 (2013)
7. Cai, H., Zheng, V.W., Chang, K.C.C.: A comprehensive survey of graph embedding: Problems, techniques, and applications. TKDE (9), 1616–1637 (2018)
8. Cao, S., Lu, W., Xu, Q.: Deep neural networks for learning graph representations. In: AAAI (2016)
9. Chami, I., Abu-El-Haija, S., Perozzi, B., Ré, C., Murphy, K.: Machine learning on graphs: A model and comprehensive taxonomy. arXiv preprint arXiv:2005.03675 (2020)
10. Chami, I., Wolf, A., Juan, D.C., Sala, F., Ravi, S., Ré, C.: Low-dimensional hyperbolic knowledge graph embeddings. ACL (2020)
11. Chami, I., Ying, Z., Ré, C., Leskovec, J.: Hyperbolic graph convolutional neural networks. In: NeurIPS. pp. 4868–4879 (2019)
12. Chen, Y., Zhang, J., Fang, Y., Cao, X., King, I.: Efficient community search over large directed graph: An augmented index-based approach. In: IJCAI. pp. 3544–3550 (2020)
13. Choi, E., Bahadori, M.T., Song, L., Stewart, W.F., Sun, J.: GRAM: graph-based attention model for healthcare representation learning. In: SIGKDD (2017)
14. Dai, W., Liu, X., Gao, Y., Chen, L., Song, J., Chen, D., Gao, K., Jiang, Y., Yang, Y., Chen, J., et al.: Matrix factorization-based prediction of novel drug indications by integrating genomic space. CMMM (2015)
15. Dong, X., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., Murphy, K., Strohmann, T., Sun, S., Zhang, W.: Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In: SIGKDD. pp. 601–610 (2014)
16. Dong, Y., Chawla, N.V., Swami, A.: metapath2vec: Scalable representation learning for heterogeneous networks. In: SIGKDD. pp. 135–144 (2017)
17. Fu, X., Zhang, J., Meng, Z., King, I.: MAGNN: Metapath aggregated graph neural network for heterogeneous graph embedding. In: WWW. pp. 2331–2341 (2020)
18. Gao, M., Chen, L., He, X., Zhou, A.: BiNE: Bipartite network embedding. In: SIGIR. pp. 715–724 (2018)
19. Goyal, P., Ferrara, E.: Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Systems, 78–94 (2018)
20. Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. In: SIGKDD. pp. 855–864 (2016)
21. Hu, B., Fang, Y., Shi, C.: Adversarial learning on heterogeneous information networks. In: SIGKDD. pp. 120–129 (2019)
22. Huang, E.W., Wang, S., Zhai, C.: VisAGE: Integrating external knowledge into electronic medical record visualization. In: PSB. pp. 578–589. World Scientific (2018)
23. Huang, X., Li, J., Hu, X.: Label informed attributed network embedding. In: WSDM. pp. 731–739 (2017)
24. Ivanov, S., Burnaev, E.: Anonymous walk embeddings. arXiv:1805.11921 (2018)
25. Junuthula, R.R., Xu, K.S., Devabhaktuni, V.K.: Evaluating link prediction accuracy in dynamic networks with added and removed edges. In: BDCloud-SocialCom-SustainCom. pp. 377–384. IEEE (2016)
26. Kazemi, S.M., Goel, R., Jain, K., Kobyzev, I., Sethi, A., Forsyth, P., Poupart, P.: Representation learning for dynamic graphs: A survey. Journal of Machine Learning Research (70), 1–73 (2020)
27. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv:1609.02907 (2016)
28. Li, G., Luo, J., Xiao, Q., Liang, C., Ding, P., Cao, B.: Predicting microRNA-disease associations using network topological similarity based on DeepWalk. IEEE Access, 24032–24039 (2017)
29. Li, X., Chen, W., Chen, Y., Zhang, X., Gu, J., Zhang, M.Q.: Network embedding-based representation learning for single cell RNA-seq data. Nucleic Acids Research (19), e166–e166 (2017)
30. Liu, H., Wu, Y., Yang, Y.: Analogical inference for multi-relational embeddings. In: ICML. pp. 2168–2178 (2017)
31. Luo, Y., Zhao, X., Zhou, J., Yang, J., Zhang, Y., Kuang, W., Peng, J., Chen, L., Zeng, J.: A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nature Communications (1), 1–13 (2017)
32. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: ICLR (Workshop Poster) (2013)
33. Nickel, M., Rosasco, L., Poggio, T.: Holographic embeddings of knowledge graphs. In: AAAI (2016)
34. Nickel, M., Tresp, V., Kriegel, H.P.: A three-way model for collective learning on multi-relational data. In: ICML. vol. 11, pp. 809–816 (2011)
35. Pan, S., Zhu, X., Zhang, C., Philip, S.Y.: Graph stream classification using labeled and unlabeled graphs. In: ICDE. pp. 398–409. IEEE (2013)
36. Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: Online learning of social representations. In: SIGKDD. pp. 701–710 (2014)
37. Qiu, J., Dong, Y., Ma, H., Li, J., Wang, K., Tang, J.: Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In: WSDM (2018)
38. Shen, Z., Zhang, Y.H., Han, K., Nandi, A.K., Honig, B., Huang, D.S.: miRNA-disease association prediction with collaborative matrix factorization. Complexity (2017)
39. Shi, C., Hu, B., Zhao, W.X., Philip, S.Y.: Heterogeneous information network embedding for recommendation. TKDE (2), 357–370 (2018)
40. Shickel, B., Tighe, P.J., Bihorac, A., Rashidi, P.: Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE Journal of Biomedical and Health Informatics (5), 1589–1604 (2017)
41. Skarding, J., Gabrys, B., Musial, K.: Foundations and modelling of dynamic networks using dynamic graph neural networks: A survey. arXiv:2005.07496 (2020)
42. Socher, R., Chen, D., Manning, C.D., Ng, A.: Reasoning with neural tensor networks for knowledge base completion. In: NeurIPS. pp. 926–934 (2013)
43. Su, C., Tong, J., Zhu, Y., Cui, P., Wang, F.: Network embedding in biomedical data science. Briefings in Bioinformatics (1), 182–197 (2020)
44. Sun, Z., Deng, Z., Nie, J., Tang, J.: RotatE: Knowledge graph embedding by relational rotation in complex space. In: ICLR (Poster). OpenReview.net (2019)
45. Trouillon, T., Welbl, J., Riedel, S., Gaussier, É., Bouchard, G.: Complex embeddings for simple link prediction. ICML (2016)
46. Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., Bengio, Y.: Graph attention networks. arXiv:1710.10903 (2017)
47. Wang, D., Cui, P., Zhu, W.: Structural deep network embedding. In: SIGKDD. pp. 1225–1234 (2016)
48. Wang, M., Liu, M., Liu, J., Wang, S., Long, G., Qian, B.: Safe medicine recommendation via medical knowledge graph embedding. arXiv:1710.05980 (2017)
49. Wang, P., Hao, T., Yan, J., Jin, L.: Large-scale extraction of drug–disease pairs from the medical literature. Journal of the AIST (11), 2649–2661 (2017)
50. Wang, S., Qu, M., Peng, J.: ProSNet: Integrating homology with molecular networks for protein function prediction. In: PSB. pp. 27–38. World Scientific (2017)
51. Wang, X., Ji, H., Shi, C., Wang, B., Ye, Y., Cui, P., Yu, P.S.: Heterogeneous graph attention network. In: WWW. pp. 2022–2032 (2019)
52. Xinyi, Z., Chen, L.: Capsule graph neural network.
In: ICLR (Poster). OpenRe-view.net (2019)53. Xu, L., Wei, X., Cao, J., Yu, P.S.: Embedding of embedding (eoe) joint embeddingfor coupled heterogeneous networks. In: WSDM. pp. 741–749 (2017)54. Yang, B., Yih, W.t., He, X., Gao, J., Deng, L.: Embedding entities and relationsfor learning and inference in knowledge bases. arXiv:1412.6575 (2014)55. Yang, H., Pan, S., Zhang, P., Chen, L., Lian, D., Zhang, C.: Binarized attributednetwork embedding. In: ICDM. pp. 1476–1481. IEEE (2018)56. Zhang, S., Tay, Y., Yao, L., Liu, Q.: Quaternion knowledge graph embeddings. In:NeurIPS. pp. 2731–2741 (2019)57. Zhao, C., Jiang, J., Guan, Y., Guo, X., He, B.: EMR-based medical knowledge rep-resentation and inference via markov random fields and distributed representationlearning. Artificial intelligence in medicine87