LightCAKE: A Lightweight Framework for Context-Aware Knowledge Graph Embedding
Zhiyuan Ning, Ziyue Qiao, Hao Dong, Yi Du (✉), and Yuanchun Zhou

Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
University of Chinese Academy of Sciences, Beijing, China
{ningzhiyuan,qiaoziyue,donghao,duyi,zyc}@cnic.cn

Abstract.
Knowledge graph embedding (KGE) models learn to project symbolic entities and relations into a continuous vector space based on the observed triplets. However, existing KGE models cannot make a proper trade-off between the graph context and the model complexity, which makes them still far from satisfactory. In this paper, we propose a lightweight framework named LightCAKE for context-aware KGE. LightCAKE explicitly models the graph context without introducing redundant trainable parameters, and uses an iterative aggregation strategy to integrate the context information into the entity/relation embeddings. As a generic framework, it can be used with many simple KGE models to achieve excellent results. Finally, extensive experiments on public benchmarks demonstrate the efficiency and effectiveness of our framework.
Keywords: Knowledge graph embedding · Lightweight · Graph context
1 Introduction

Recently, large-scale knowledge graphs (KGs) have been widely applied in numerous AI-related applications. KGs are usually expressed as multi-relational directed graphs composed of entities as nodes and relations as edges. The real-world facts stored in KGs are modeled as triplets (head entity, relation, tail entity), denoted as (h, r, t).

Nevertheless, KGs are usually incomplete due to the constant emergence of new knowledge. To address this issue, a series of knowledge graph embedding (KGE) models have been proposed [14]. KGE models project symbolic entities and relations into a continuous vector space, and use scoring functions to measure the plausibility of triplets. By optimizing the scoring functions to assign higher scores to true triplets than to invalid ones, KGE models learn low-dimensional representations (called embeddings) for all entities and relations, and these embeddings are then used to predict new facts. Most previous KGE models use translation-distance-based [2,15] and semantic-matching-based [11,17] scoring functions, which perform additive and multiplicative operations, respectively. These models have been shown to be scalable and effective.
However, the aforementioned KGE models only focus on modeling individual triplets and ignore the graph context, which contains plenty of valuable structural information. We argue that there are two types of important graph context required for successfully predicting the relation between two entities. (1) The entity context, i.e., for an entity, its neighboring nodes and the corresponding edges connecting the entity to its neighboring nodes. The entity context depicts the subtle differences between two entities. As an example shown in Fig. 1(a), we aim to predict whether Joe Biden or Hillary Clinton is the president of the USA. Both of them have the same relation "birthplace_of" with the USA, but they have distinct entity contexts: Joe Biden's neighboring node, Donald Trump, is the president of the USA, and Biden is his successor, whereas there is no such relationship between Hillary Clinton and her neighboring nodes. Capturing such entity context will help predict the correct triplet (Joe Biden, president_of, USA). (2) The relation context, i.e., the two endpoints of a given relation. The relation context implicitly indicates the category of the related entities. Taking Fig. 1(b) as an example, both the USA and New York were Donald Trump's birthplace, but according to the context of "president_of", the related tail entities {China, Russia, ...} tend to be a set of countries. Since New York is a city and it is part of the USA, which is a country, (Donald Trump, president_of, USA) is the right triplet. Moreover, entities and relations rarely appear in isolation, so considering entity context and relation context together will provide more beneficial information.
Fig. 1. Examples of graph context which can help relation prediction in a knowledge graph. Nodes represent entities, solid lines represent actual relations, and dashed lines represent the relations to be predicted. Red dashed boxes frame the critical entity context (a) and relation context (b) that can provide important information for correctly predicting the relation between two entities.
In order to model the graph context, some recent work has attempted to apply graph neural networks (GNNs) to KGE [1,8]. These GNN-based KGE models effectively aggregate information from multi-hop neighbors to enrich the entity/relation representations. However, GNNs introduce more model parameters and tensor computations, making it difficult to apply such models to large-scale real-world KGs. In addition, most GNN-based KGE models only exploit entity context or relation context individually, which may lead to information loss.

In this paper, we propose a Lightweight Framework for Context-Aware Knowledge Graph Embedding (LightCAKE) to address the shortcomings of existing models. LightCAKE first builds a context star graph to model the entity/relation context. It then uses non-parameterized operations such as subtraction (inspired by TransE [2]) or multiplication (inspired by DistMult [17]) to encode the context nodes in the context star graph. Lastly, every entity/relation node in the context star graph aggregates information from its surrounding context nodes based on weights calculated by a scoring function. LightCAKE considers both entity context and relation context, and introduces no new parameters, making it very lightweight and capable of being used on large-scale KGs. The contributions of our work can be summarized as follows: (1) we propose a lightweight framework (LightCAKE) for KGE that explicitly models the entity context and relation context without increasing model complexity; (2) as a generic framework, LightCAKE can be combined with many simple methods such as TransE [2] and DistMult [17]; (3) through extensive experiments on the relation prediction task, we demonstrate the effectiveness and efficiency of LightCAKE.
2 Related Work

Most early KGE models only exploit the triplets themselves and can be roughly categorized into two classes [14]: translation-distance-based and semantic-matching-based. Translation-distance-based models are also known as additive models, since they project head and tail entities into the same embedding space and treat relations as translations from head entities to tail entities; the objective is that the translated head entity should be close to the tail entity. TransE [2] is the first and most representative of such models, and a series of work follows this line, such as TransR [7] and TransH [15]. On the other hand, semantic-matching-based models such as DistMult [17] and ComplEx [11] use multiplicative score functions to compute the plausibility of given triplets, so they are also called multiplicative models. Both classes of models are conceptually simple and easy to apply to large-scale KGs, but they ignore the structural information stored in the graph context of KGs.

In contrast, GNN-based models attempt to use GNNs for graph context modeling. These models first aggregate the graph context into entity/relation embeddings through a GNN, then pass the context-aware embeddings to context-independent scoring functions for scoring. R-GCN [8] is an extension of the graph convolutional network [6] to relational data; it applies a convolution operation to the neighboring nodes of each entity and assigns them equal weights. A2N [1] uses a method similar to graph attention networks [12] to further distinguish the weights of neighboring nodes. However, this type of KGE model suffers from over-parameterization, since GNNs contain many parameters, which hinders the application of such models to large-scale KGs. In addition, they do not integrate entity context and relation context, which may cause information loss.
Fig. 2. Overview of LightCAKE. (1) For a KG (middle), we build an entity context star graph (left) for all entities and a relation context star graph (right) for all relations. In the entity/relation context star graph, each entity/relation is surrounded by its entity/relation context, and they are connected to each other by solid black lines. (2) The yellow rhombuses $\phi_{ent}$ and $\phi_{rel}$ denote context encoders (details in Sect. 4.1), and the gray dashed lines indicate the inputs and outputs of the encoders. (3) The blue dashed lines denote the weights $\alpha$ (Eq. (3)), and the green dashed lines denote the weights $\beta$ (Eq. (4)). The thicker the line, the greater the weight.

3 Preliminaries

A KG can be considered as a collection of triplets $\mathcal{G} = \{(h, r, t) \mid (h, r, t) \in \mathcal{E} \times \mathcal{R} \times \mathcal{E}\}$, where $\mathcal{E}$ is the entity set and $\mathcal{R}$ is the relation set. $h, t \in \mathcal{E}$ represent the head entity and tail entity, and $r \in \mathcal{R}$ denotes the relation linking the head entity $h$ to the tail entity $t$. Given a triplet $(h, r, t)$, the corresponding embeddings are $e_h, e_r, e_t \in \mathbb{R}^d$, where $d$ is the embedding dimension. KGE models usually define a scoring function $\psi: \mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$. It takes the embeddings $(e_h, e_r, e_t)$ of a triplet $(h, r, t)$ as input and produces a score reflecting the plausibility of the triplet.

In this paper, the objective is to predict the missing links in $\mathcal{G}$, i.e., given an entity pair $(h, t)$, we aim to predict the missing relation $r$ between them. We refer to this task as relation prediction. Some related work formulates this problem as link prediction, i.e., predicting the missing tail/head entity given a head/tail entity and a relation. The two problems have proven to be reducible to each other [13].

Definition 1. Entity Context: For an entity $h$ in $\mathcal{G}$, the entity context of $h$ is defined as $\mathcal{C}_{ent}(h) = \{(r, t) \mid (h, r, t) \in \mathcal{G}\}$, i.e., all the (relation, tail) pairs in $\mathcal{G}$ whose head is $h$.
Definition 2. Relation Context: For a relation $r$ in $\mathcal{G}$, the relation context of $r$ is defined as $\mathcal{C}_{rel}(r) = \{(h, t) \mid (h, r, t) \in \mathcal{G}\}$, i.e., all the (head, tail) pairs in $\mathcal{G}$ whose relation is $r$.

Note that the entity context $\mathcal{C}_{ent}(h)$ only considers the neighbors of $h$ along its outgoing edges and ignores the neighbors along its incoming edges. This is because for each triplet $(h, r, t) \in \mathcal{G}$, we create a corresponding inverse triplet $(t, r^{-1}, h)$ and add it to $\mathcal{G}$. In this way, for entity $t$, $\{(r, h) \mid (h, r, t) \in \mathcal{G}\}$ can be converted to the form $\{(r^{-1}, h) \mid (t, r^{-1}, h) \in \mathcal{G}\}$, which is equivalent to $\mathcal{C}_{ent}(t)$. Thus, $\mathcal{C}_{ent}(\cdot)$ covers both the outgoing and incoming neighbors of each entity.

To explicitly model entity context and relation context for a KG $\mathcal{G}$ (as shown in Fig. 2, middle), we construct an entity context star graph (Fig. 2, left) and a relation context star graph (Fig. 2, right), respectively. In the entity context star graph, all the central nodes are the entities in $\mathcal{G}$, and each entity $h$ is surrounded by its entity context $\mathcal{C}_{ent}(h)$. Similarly, in the relation context star graph, all the central nodes are the relations in $\mathcal{G}$, and each relation $r$ is surrounded by its relation context $\mathcal{C}_{rel}(r)$.
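For concreteness, the context sets of Definitions 1 and 2, including the inverse triplets described above, can be materialized with a short sketch like the following (Python; the function and variable names are illustrative, not from the original implementation):

    from collections import defaultdict

    def build_contexts(triplets):
        # Add an inverse triplet (t, r_inv, h) for every (h, r, t), so that
        # C_ent also covers incoming neighbors (see the note above).
        full = list(triplets) + [(t, ("inv", r), h) for (h, r, t) in triplets]
        c_ent = defaultdict(list)  # entity h   -> list of (relation, tail) pairs
        c_rel = defaultdict(list)  # relation r -> list of (head, tail) pairs
        for h, r, t in full:
            c_ent[h].append((r, t))
            c_rel[r].append((h, t))
        return c_ent, c_rel

Each key of c_ent (resp. c_rel) then corresponds to a central node of the entity (resp. relation) context star graph, and its value lists the surrounding context nodes.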
4 Methodology

Given the context star graph, LightCAKE can (1) encode each entity/relation context node into an embedding, and (2) learn the context-aware embedding for each entity/relation by iteratively aggregating information from its context nodes.

4.1 Context-Aware Aggregation

Denote $e_h^{(0)}$ and $e_r^{(0)}$ as the randomly initialized embeddings of an entity $h$ and a relation $r$, respectively. The aggregation functions are formulated as:

$$ e_h^{(l+1)} = e_h^{(l)} + \sum_{(r',t') \in \mathcal{C}_{ent}(h)} \alpha^{(l)}_{h,(r',t')} \, \phi_{ent}(e_{r'}, e_{t'}) \quad (1) $$

$$ e_r^{(l+1)} = e_r^{(l)} + \sum_{(h',t') \in \mathcal{C}_{rel}(r)} \beta^{(l)}_{r,(h',t')} \, \phi_{rel}(e_{h'}, e_{t'}) \quad (2) $$

Here, $e_h^{(l+1)}$ and $e_r^{(l+1)}$ are the embeddings of $h$ and $r$ after $l+1$ aggregation iterations, where $0 \le l \le L$ and $L$ is the total number of iterations. $\phi_{ent}(\cdot): \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$ is the entity context encoder, and $\phi_{rel}(\cdot): \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$ is the relation context encoder. $\alpha^{(l)}_{h,(r',t')}$ and $\beta^{(l)}_{r,(h',t')}$ are the weights in iteration $l$, representing how important each context node is for $h$ and $r$, respectively. We introduce the scoring function $\psi(\cdot)$ to calculate them:

$$ \alpha^{(l)}_{h,(r',t')} = \frac{\exp(\psi(e_h^{(l)}, e_{r'}^{(l)}, e_{t'}^{(l)}))}{\sum_{(r'',t'') \in \mathcal{C}_{ent}(h)} \exp(\psi(e_h^{(l)}, e_{r''}^{(l)}, e_{t''}^{(l)}))} \quad (3) $$

$$ \beta^{(l)}_{r,(h',t')} = \frac{\exp(\psi(e_{h'}^{(l)}, e_r^{(l)}, e_{t'}^{(l)}))}{\sum_{(h'',t'') \in \mathcal{C}_{rel}(r)} \exp(\psi(e_{h''}^{(l)}, e_r^{(l)}, e_{t''}^{(l)}))} \quad (4) $$

After Eq. (1) and Eq. (2) are iteratively executed $L$ times, for any $h, t \in \mathcal{E}$ and $r \in \mathcal{R}$ we obtain the final context-enhanced embeddings $e_h^{(L)}$, $e_r^{(L)}$, $e_t^{(L)}$. To perform relation prediction, we compute the probability of the relation $r$ given the head entity $h$ and tail entity $t$ using a softmax function:

$$ p(r \mid h, t) = \frac{\exp(\psi(e_h^{(L)}, e_r^{(L)}, e_t^{(L)}))}{\sum_{r' \in \mathcal{R}} \exp(\psi(e_h^{(L)}, e_{r'}^{(L)}, e_t^{(L)}))} \quad (5) $$

where $\mathcal{R}$ is the set of relations and $\psi(\cdot)$ is the same scoring function used in Eq. (3) and Eq. (4). Then, we train the model by minimizing the following loss function:

$$ \mathcal{L} = -\frac{1}{|\mathcal{D}|} \sum_{i=1}^{|\mathcal{D}|} \log p(r_i \mid h_i, t_i) \quad (6) $$

where $\mathcal{D}$ is the training set and $(h_i, r_i, t_i) \in \mathcal{D}$ is one of the training triplets.
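A minimal NumPy sketch of one aggregation iteration (Eqs. (1)-(4)) may make the procedure concrete. It assumes the contexts built by build_contexts above; phi_ent, phi_rel, and psi are the pluggable encoders and scoring function, all names are illustrative, and for brevity the sketch feeds the current iteration's embeddings to both the encoders and the scoring function:

    import numpy as np

    def softmax(scores):
        scores = scores - scores.max()  # numerical stability
        exp = np.exp(scores)
        return exp / exp.sum()

    def aggregate_once(e_ent, e_rel, c_ent, c_rel, phi_ent, phi_rel, psi):
        # e_ent / e_rel: dicts mapping entities / relations to d-dim vectors.
        new_ent, new_rel = {}, {}
        for h, ctx in c_ent.items():
            # Eq. (3): softmax weights over the entity context of h.
            alpha = softmax(np.array([psi(e_ent[h], e_rel[r], e_ent[t])
                                      for r, t in ctx]))
            # Eq. (1): residual aggregation of encoded context nodes.
            new_ent[h] = e_ent[h] + sum(a * phi_ent(e_rel[r], e_ent[t])
                                        for a, (r, t) in zip(alpha, ctx))
        for r, ctx in c_rel.items():
            # Eq. (4): softmax weights over the relation context of r.
            beta = softmax(np.array([psi(e_ent[h], e_rel[r], e_ent[t])
                                     for h, t in ctx]))
            # Eq. (2): residual aggregation.
            new_rel[r] = e_rel[r] + sum(b * phi_rel(e_ent[h], e_ent[t])
                                        for b, (h, t) in zip(beta, ctx))
        return new_ent, new_rel

Running aggregate_once $L$ times yields the context-enhanced embeddings $e^{(L)}$ used in Eq. (5).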
4.2 Implementations of LightCAKE

LightCAKE is a generic framework: we can substitute the scoring function $\psi(\cdot)$ of different KGE models into Eq. (3), Eq. (4), and Eq. (5), and we can design different $\phi_{ent}(\cdot)$ and $\phi_{rel}(\cdot)$ to encode context. To keep the framework lightweight, we apply TransE [2] and DistMult [17], the simplest and most representative of the additive and multiplicative models respectively, to LightCAKE.

LightCAKE-TransE. The scoring function of TransE [2] is:

$$ \psi_{TransE}(e_h, e_r, e_t) = -\|e_h + e_r - e_t\| = -\|e_t - e_r - e_h\| \quad (7) $$

where $\|\cdot\|$ is the L2 norm. Eq. (7) can be decomposed into the two following steps:

$$ e_{(h,r,t)} = V_{TransE}(e_h, e_r, e_t) = e_t - e_r - e_h \quad (8) $$

$$ score = S_{TransE}(e_{(h,r,t)}) = -\|e_{(h,r,t)}\| \quad (9) $$

where $V_\cdot: \mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$ and $S_\cdot: \mathbb{R}^d \to \mathbb{R}$, $e_{(h,r,t)}$ denotes the embedding of a triplet $(h, r, t)$, and $score$ denotes the score of the triplet. In Eq. (8), TransE uses addition and subtraction to encode triplets; in particular, the operation between $e_r$ and $e_t$ is subtraction, and the operation between $e_h$ and $e_t$ is also subtraction. So we design $\phi_{ent}(e_{r'}, e_{t'}) = e_{t'} - e_{r'}$ and $\phi_{rel}(e_{h'}, e_{t'}) = e_{t'} - e_{h'}$ to encode context. The aggregation functions of LightCAKE-TransE can then be formalized as:

$$ e_h^{(l+1)} = e_h^{(l)} + \sum_{(r',t') \in \mathcal{C}_{ent}(h)} \alpha^{(l)}_{h,(r',t')} (e_{t'} - e_{r'}) \quad (10) $$

$$ e_r^{(l+1)} = e_r^{(l)} + \sum_{(h',t') \in \mathcal{C}_{rel}(r)} \beta^{(l)}_{r,(h',t')} (e_{t'} - e_{h'}) \quad (11) $$

Lastly, substituting $\psi_{TransE}(e_h, e_r, e_t)$ from Eq. (7) into Eq. (3), Eq. (4), and Eq. (5) gives the complete LightCAKE-TransE.
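Under the same assumptions as the sketch in Sect. 4.1, the TransE-specific pieces amount to only a few lines (again illustrative, not the authors' released code):

    import numpy as np

    def psi_transe(e_h, e_r, e_t):
        # Eq. (7): negative L2 norm of the translated triplet embedding.
        return -np.linalg.norm(e_t - e_r - e_h)

    def phi_ent_transe(e_r, e_t):
        # Entity context encoder, mirroring the subtraction in V_TransE.
        return e_t - e_r

    def phi_rel_transe(e_h, e_t):
        # Relation context encoder.
        return e_t - e_h

These three functions can be passed directly as psi, phi_ent, and phi_rel to the aggregate_once sketch above.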
LightCAKE-DistMult. The scoring function of DistMult [17] is:

$$ \psi_{DistMult}(e_h, e_r, e_t) = \langle e_h, e_r, e_t \rangle \quad (12) $$

where $\langle \cdot \rangle$ denotes the generalized dot product. Eq. (12) can be decomposed into the two following steps:

$$ e_{(h,r,t)} = V_{DistMult}(e_h, e_r, e_t) = e_h \odot e_r \odot e_t \quad (13) $$

$$ score = S_{DistMult}(e_{(h,r,t)}) = \sum_i e_{(h,r,t)}[i] \quad (14) $$

where $\odot$ denotes the element-wise product and $e_{(h,r,t)}[i]$ denotes the $i$-th element of the embedding $e_{(h,r,t)}$. In Eq. (13), DistMult uses multiplication to encode triplets; in particular, the operation between $e_r$ and $e_t$ is multiplication, and the operation between $e_h$ and $e_t$ is also multiplication. So we design $\phi_{ent}(e_{r'}, e_{t'}) = e_{t'} \odot e_{r'}$ and $\phi_{rel}(e_{h'}, e_{t'}) = e_{t'} \odot e_{h'}$ to encode context. The aggregation functions of LightCAKE-DistMult can then be formalized as:

$$ e_h^{(l+1)} = e_h^{(l)} + \sum_{(r',t') \in \mathcal{C}_{ent}(h)} \alpha^{(l)}_{h,(r',t')} (e_{t'} \odot e_{r'}) \quad (15) $$

$$ e_r^{(l+1)} = e_r^{(l)} + \sum_{(h',t') \in \mathcal{C}_{rel}(r)} \beta^{(l)}_{r,(h',t')} (e_{t'} \odot e_{h'}) \quad (16) $$

Lastly, substituting $\psi_{DistMult}(e_h, e_r, e_t)$ from Eq. (12) into Eq. (3), Eq. (4), and Eq. (5) gives the complete LightCAKE-DistMult.

Notably, no extra trainable parameters are introduced in LightCAKE-TransE and LightCAKE-DistMult, making them lightweight and efficient.
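The DistMult counterparts follow the same pattern, swapping subtraction for element-wise products (again an illustrative sketch under the same assumptions):

    import numpy as np

    def psi_distmult(e_h, e_r, e_t):
        # Eq. (12): generalized dot product <e_h, e_r, e_t>.
        return float(np.sum(e_h * e_r * e_t))

    def phi_ent_distmult(e_r, e_t):
        # Entity context encoder, mirroring the element-wise product in V_DistMult.
        return e_t * e_r

    def phi_rel_distmult(e_h, e_t):
        # Relation context encoder.
        return e_t * e_h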
5 Experiments

5.1 Datasets

We evaluate LightCAKE on four popular benchmark datasets: WN18RR [3], FB15K-237 [10], NELL995 [16], and DDB14 [13]. WN18RR is extracted from WordNet, containing conceptual-semantic and lexical relations among English words. FB15K-237 is extracted from Freebase, a large-scale KG with general human knowledge. NELL995 is extracted from the 995th iteration of the NELL system and contains general knowledge. DDB14 is extracted from the Disease Database, a medical database containing terminologies and concepts as well as their relationships. The statistics of the datasets are summarized in Table 1.
Table 1. Statistics of the four datasets. avg. $|\mathcal{C}_{ent}(h)|$ and avg. $|\mathcal{C}_{rel}(r)|$ represent the average sizes of the entity contexts and relation contexts, respectively.

    Dataset            FB15K-237   WN18RR   NELL995   DDB14
    avg. |C_ent(h)|        –          –        –         –
    avg. |C_rel(r)|        –          –        –         –

5.2 Baselines

To prove the effectiveness of LightCAKE, we compare LightCAKE-TransE and LightCAKE-DistMult with six baselines, including (1) the original TransE and DistMult without aggregating entity context and relation context; (2) three state-of-the-art KGE models: ComplEx, SimplE, and RotatE; and (3) a classic GNN-based KGE model: R-GCN. Brief descriptions of the baselines are as follows:
TransE [2]: TransE is one of the most widely used KGE models; it translates the head embedding into the tail embedding by adding the relation embedding to it.

DistMult [17]: DistMult is a popular tensor-factorization-based model which uses a bilinear score function to compute the scores of knowledge triplets.

ComplEx [11]: ComplEx is an extension of DistMult which embeds entities and relations into complex vectors instead of real-valued ones.

SimplE [4]: SimplE is a simple, interpretable, fully expressive tensor factorization model for knowledge graph completion.

RotatE [9]: RotatE defines each relation as a rotation from the head entity to the tail entity in the complex vector space.

R-GCN [8]: R-GCN is a variant of the graph neural network; it can deal with highly multi-relational knowledge graph data and aggregate context information into entities.

To simplify, we use L-TransE to represent LightCAKE-TransE and L-DistMult to represent LightCAKE-DistMult.

5.3 Experimental Setup

We use Adam [5] as the optimizer with a learning rate of 5e-3. We set the embedding dimension of entities and relations to 256, the l2 penalty coefficient to 1e-7, the batch size to 512, the total number of iterations L to 4, and a maximum of 20 epochs. Moreover, we use early stopping for training, and all the trainable parameters are randomly initialized.

We evaluate all methods in the setting of relation prediction, i.e., for a given entity pair (h, t) in the test set, we rank the ground-truth relation type r against all other candidate relation types. We compare our models with the baselines using the following metrics: (1) Mean Reciprocal Rank (MRR): the mean of the reciprocals of the predicted ranks; (2) Mean Rank (MR): the mean of the predicted ranks; (3) Hit@3: the proportion of test cases whose ground-truth relation is ranked in the top 3 predictions.
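As a quick illustrative sketch (names hypothetical), these metrics can be computed from the 1-indexed rank of the ground-truth relation for each test pair:

    import numpy as np

    def relation_prediction_metrics(ranks):
        # ranks: 1-indexed rank of the true relation for each test pair (h, t).
        ranks = np.asarray(ranks, dtype=float)
        return {
            "MRR": float(np.mean(1.0 / ranks)),   # Mean Reciprocal Rank
            "MR": float(np.mean(ranks)),          # Mean Rank (lower is better)
            "Hit@3": float(np.mean(ranks <= 3)),  # fraction ranked in the top 3
        }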
Table 2. Results of relation prediction (bold: best; underline: runner-up). The results of ComplEx, SimplE, and RotatE are taken from [13]. Note that the trainable parameters in L-TransE and L-DistMult are only the entity embeddings and relation embeddings, so for a fair comparison we only choose those three traditional baselines from [13] with a small number of parameters. In addition, in order to compare context-aware KGE and context-independent KGE in the same experimental environment and prove the validity of LightCAKE, we implemented TransE and DistMult ourselves.
    Method       WN18RR                 FB15K-237              NELL995                 DDB14
                 MRR    MR↓    Hit@3    MRR    MR↓    Hit@3    MRR    MR↓     Hit@3    MRR    MR↓    Hit@3
    ComplEx      0.840  2.053  0.880    0.924  1.494  0.970    0.703  23.040  0.765    0.953  1.287  0.968
    SimplE       0.730  3.259  0.755    –      –      –        –      –       –        –      –      –
    L-TransE     0.813  1.648  0.933    0.943  2.281  0.962    0.793  9.325   0.831    0.964  1.184  0.969
    DistMult     0.865  1.743  0.922    0.935  1.920  0.979    0.712  22.340  0.744    0.937  1.334  0.958
    L-DistMult   –      –      –        –      –      –        –      –       –        –      –      –

5.4 Results

The results on all datasets are reported in Table 2. We can observe that: (1) compared with the original TransE and DistMult, our proposed L-TransE and L-DistMult consistently perform better on all datasets, proving that LightCAKE can greatly improve the performance of context-independent KGE models; (2) compared with all six KGE baselines, the proposed L-TransE and L-DistMult achieve substantial improvements or state-of-the-art performance on all datasets, showing their effectiveness.

5.5 Ablation Study

LightCAKE utilizes both entity context and relation context. How does each context affect the performance of LightCAKE? To answer this question, we propose model variants to conduct ablation studies on L-TransE and L-DistMult, including: (1) the original TransE and DistMult, which consider neither entity context nor relation context; (2) L_rel-TransE and L_rel-DistMult, which aggregate only the relation context and discard the entity context; (3) L_ent-TransE and L_ent-DistMult, which aggregate only the entity context and discard the relation context.

The experimental results in terms of MRR on the WN18RR and FB15K237 datasets are reported in Fig. 3 (a), (b), (d), and (e). L-TransE and L-DistMult achieve the best performance compared with their corresponding model variants, demonstrating that integrating both entity context and relation context is most effective for KGE. Also, L_rel-TransE and L_ent-TransE are both better than TransE, and L_rel-DistMult and L_ent-DistMult are both better than DistMult, indicating that entity context and relation context are both helpful for KGE. L_ent-TransE is better than L_rel-TransE, and L_ent-DistMult is better than L_rel-DistMult, showing that entity context contributes more to improving the model performance than relation context.
Fig. 3. The performance of the model variants of (a) L-TransE and (d) L-DistMult on the WN18RR dataset; the performance of the model variants of (b) L-TransE and (e) L-DistMult on the FB15K237 dataset; the performance of (c) L-TransE and (f) L-DistMult with various numbers of iterations L on the WN18RR dataset.

5.6 Impact of the Number of Iterations

In this section, we investigate the sensitivity of the parameter L, i.e., the number of iterations. We report the MRR on the WN18RR dataset, with L ranging from 1 to 5. The results of L-TransE and L-DistMult are shown in Fig. 3 (c) and (f). We can observe that as the number of iterations grows, the performance rises first and then starts to decrease slightly, which may be because when further contexts are involved, more uncorrelated information is integrated into the embeddings. So properly setting L can help improve the performance of our method.

5.7 Efficiency Analysis

We evaluate the efficiency of LightCAKE by comparing it with DistMult and R-GCN. We investigate the differences between DistMult, R-GCN, and L-DistMult in terms of entity context, relation context, parameter quantities (space complexity), and MRR on the WN18RR dataset. The results are shown in Table 3. We can observe that the parameter count of L-DistMult is far smaller than that of R-GCN, because R-GCN uses complicated matrix transformations to encode context information, while L-DistMult only uses multiplication on embeddings. Also, both DistMult and L-DistMult achieve better results than R-GCN on the relation prediction task, which may be because R-GCN overfits due to its large number of parameters. In summary, L-DistMult is lighter, more efficient, and more robust.

Table 3. Efficiency analysis. Here, d is the embedding dimension, L is the number of iterations, and |E| and |R| indicate the total numbers of entities and relations, respectively.

    Models          Entity Context   Relation Context   Space Complexity          MRR
    DistMult [17]   ✗                ✗                  O(|E|d + |R|d)            0.865
    R-GCN [8]       ✓                ✗                  O(L(d² + |E|d + |R|d))    0.823
    L-DistMult      ✓                ✓                  O(L(|E|d + |R|d))         0.955

6 Conclusion

In this paper, we propose LightCAKE to learn context-aware knowledge graph embeddings. LightCAKE considers both the entity context and the relation context, and extensive experiments show its superior performance compared with state-of-the-art KGE models. In addition, LightCAKE is very lightweight and efficient in aggregating context information. Future research will explore more possible context encoders, i.e., $\phi_{ent}$ and $\phi_{rel}$, and more possible scoring functions for Eq. (3), Eq. (4), and Eq. (5), to make LightCAKE more general and powerful.

Acknowledgments.
This research was supported by the Natural Science Foundation of China under Grant No. 61836013, the Ministry of Science and Technology Innovation Methods Special Work Project under Grant No. 2019IM020100, the Beijing Natural Science Foundation (4212030), and the Beijing Nova Program of Science and Technology under Grant No. Z191100001119090. Zhiyuan Ning and Ziyue Qiao contributed equally to this work. Yi Du is the corresponding author.
References
1. Bansal, T., Juan, D.C., Ravi, S., McCallum, A.: A2N: Attending to neighbors for knowledge graph inference. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4387-4392 (2019)
2. Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: Advances in Neural Information Processing Systems 26, pp. 2787-2795 (2013)
3. Dettmers, T., Minervini, P., Stenetorp, P., Riedel, S.: Convolutional 2D knowledge graph embeddings. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, pp. 1811-1818 (2018)
4. Kazemi, S.M., Poole, D.: SimplE embedding for link prediction in knowledge graphs. In: Advances in Neural Information Processing Systems, pp. 4284-4295 (2018)
5. Kingma, D.P., Ba, J.L.: Adam: A method for stochastic optimization. In: International Conference on Learning Representations (2015)
6. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations (2017)
7. Lin, Y., Liu, Z., Sun, M., Liu, Y., Zhu, X.: Learning entity and relation embeddings for knowledge graph completion. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 2181-2187 (2015)
8. Schlichtkrull, M., Kipf, T.N., Bloem, P., van den Berg, R., Titov, I., Welling, M.: Modeling relational data with graph convolutional networks. In: European Semantic Web Conference, pp. 593-607. Springer (2018)
9. Sun, Z., Deng, Z.H., Nie, J.Y., Tang, J.: RotatE: Knowledge graph embedding by relational rotation in complex space. In: International Conference on Learning Representations (2019)
10. Toutanova, K., Chen, D.: Observed versus latent features for knowledge base and text inference. In: Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, pp. 57-66 (2015)
11. Trouillon, T., Welbl, J., Riedel, S., Gaussier, É., Bouchard, G.: Complex embeddings for simple link prediction. In: International Conference on Machine Learning, pp. 2071-2080. PMLR (2016)
12. Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. In: International Conference on Learning Representations (2018)
13. Wang, H., Ren, H., Leskovec, J.: Entity context and relational paths for knowledge graph completion. arXiv preprint arXiv:2002.06757 (2020)
14. Wang, Q., Mao, Z., Wang, B., Guo, L.: Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering 29(12), 2724-2743 (2017)
15. Wang, Z., Zhang, J., Feng, J., Chen, Z.: Knowledge graph embedding by translating on hyperplanes. In: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (2014)
16. Xiong, W., Hoang, T., Wang, W.Y.: DeepPath: A reinforcement learning method for knowledge graph reasoning. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (2017)
17. Yang, B., Yih, W., He, X., Gao, J., Deng, L.: Embedding entities and relations for learning and inference in knowledge bases. In: International Conference on Learning Representations (2015)