Generalized Relation Learning with Semantic Correlation Awareness for Link Prediction
Yao Zhang, Xu Zhang, Jun Wang, Hongru Liang, Wenqiang Lei, Zhe Sun, Adam Jatowt, Zhenglu Yang
TKLNDST, CS, Nankai University, China; Ludong University, China; National University of Singapore, Singapore; Computational Engineering Applications Unit, RIKEN, Japan; Kyoto University, Japan
{yaozhang, xuzhang, lianghr, junwang}@mail.nankai.edu.cn, [email protected], [email protected], [email protected], [email protected]

Abstract
Developing link prediction models to automatically complete knowledge graphs has recently been the focus of significant research interest. Current methods for the link prediction task face two natural problems: 1) the relation distributions in knowledge graphs are usually unbalanced, and 2) many unseen relations occur in practical situations. These two problems limit the training effectiveness and practical applications of existing link prediction models. We advocate a holistic understanding of KGs and propose in this work a unified Generalized Relation Learning framework, GRL, to address the above two problems, which can be plugged into existing link prediction models. GRL conducts generalized relation learning that is aware of semantic correlations between relations, which serve as a bridge to connect semantically similar relations. After training with GRL, the closeness of semantically similar relations in vector space and the discrimination of dissimilar relations are improved. We perform comprehensive experiments on six benchmarks to demonstrate the superior capability of GRL in the link prediction task. In particular, GRL is found to enhance existing link prediction models, making them insensitive to unbalanced relation distributions and capable of learning unseen relations.
Introduction
Knowledge graphs (KGs), representing facts in semantic graph structures, have been applied to multiple artificial intelligence tasks, e.g., recommendation (Wang et al. 2019a), dialogue generation (Moon et al. 2019), and question answering (Christmann et al. 2019). In KGs, facts are formed as triples, (head entity, relation, tail entity), where the head entity is linked to the tail entity via the relation. New knowledge emerges continuously, and hence the incompleteness of KGs has triggered wide research interest in the link prediction task, which requires predicting missing links in KGs (Kazemi and Poole 2018). Mainstream link prediction models (Bordes et al. 2013; Dettmers et al. 2018) learn the embeddings of entities and relations, and then use a score function to estimate the validity of triples.

* Corresponding authors.

Figure 1: (a) The unbalanced relation distribution in the FB15K-237 dataset, where relations are sorted according to their frequency. (b) Many unseen relations. Three film-related relations are respectively categorized into the many-shot, few-shot, and zero-shot classes as marked.

However, we believe the embedding learning used by mainstream link prediction models results in two key problems:

• Unbalanced relation distribution.
As shown in Figure 1, the relation distribution in an off-the-shelf KG learning resource (i.e., FB15K-237 (Toutanova and Chen 2015)) is quite unbalanced. For example, the frequencies of the two relations /film/film/language and /film/film/edited differ greatly. Mainstream link prediction models assume enough training instances for all relations and pay less attention to few-shot relations, disregarding the fact that few-shot relation learning may influence the learning performance to a high degree.

• Existence of unseen relations.
Real-world KGs tend to be open and evolve quickly, and accordingly there is a large number of zero-shot relations unseen in off-the-shelf learning resources, for example, the relation /film/film/sequel in Figure 1. Unseen relations are beyond the capacity of mainstream link prediction models, as there are no training instances to learn their embeddings. This problem may restrict the use of these models in downstream tasks.

Recently, some efforts have addressed the above problems. Xiong et al. (2018), Shi and Weninger (2018), and Chen et al. (2019a) adopted meta-learning or metric-based approaches to train on limited samples and perform fast learning on new few-shot relations. These studies show promise in few-shot relation learning; however, they have difficulty tackling unbalanced relation distributions, mainly due to the excessive time cost required for training numerous relations. More recently, Chen et al. (2019b) and Qin et al. (2020) predicted unseen relations by extracting information from textual descriptions, successfully completing the unseen relation prediction task. However, these models are not appropriate for the link prediction task, since textual descriptions tend to be noisy and also cannot build a bridge between seen and unseen relations. In general, an ideal link prediction model should be able to jointly learn many-, few-, and zero-shot relations.

Regarding joint relation learning, we notice that semantic correlations, which denote the similarities of relations in semantics, can serve as a bridge to connect the learning of many-, few-, and zero-shot relations. Take Figure 1 as an instance. The many-shot relation "/film/film/language", few-shot relation "/film/film/edited", and zero-shot relation "/film/film/sequel" are all related to "film". Based on the assumption that semantically similar relations should be located near each other in embedding space (Yang et al. 2015), it makes sense to exploit semantic correlations, such as the one in the above-mentioned example, to accomplish joint relation learning. Inspired by this, we propose a Generalized Relation Learning framework (abbreviated to GRL) based on learning semantic correlations. GRL can be plugged into a mainstream link prediction model to make it (1) insensitive to unbalanced relation distributions and (2) capable of learning zero-shot relations.

Specifically, GRL is plugged into a link prediction model after the embedding learning stage. To optimize the relation embedding, GRL extracts rich semantic correlations through an attention mechanism, fuses different relations, and minimizes a classification-aware loss so that these semantic correlations are implicitly embedded in the relation embeddings. Then, the closeness of semantically similar relations in vector space and the discrimination of dissimilar relations can be improved. In this way, few-shot relations can learn knowledge from semantically similar many-shot relations; for zero-shot relations, their most semantically similar relation can also be predicted. In our experiments, we improve two base models (DistMult (Yang et al. 2015) and ConvE (Dettmers et al. 2018)) by incorporating the proposed GRL framework on all relation classes, i.e., many-, few-, and zero-shot relations.
Our work is an important step towards a holistic understanding of KGs and a generalized solution of relation learning for the link prediction task. Our contributions are as follows:

• We carefully consider two key problems of the embedding learning used by mainstream link prediction models and highlight the necessity of jointly learning many-, few-, and zero-shot relations.

• We propose the GRL framework, which leverages the rich semantic correlations between relations to make link prediction models insensitive to unbalanced relation distributions and capable of learning zero-shot relations.

• We perform experiments on six benchmarks to evaluate the link prediction capability of GRL, and show that GRL lets the base link prediction models perform well across many-, few-, and zero-shot relations.
Related Work
Since KGs are populated by automatic text processing, they are often incomplete, and it is usually infeasible to manually add all the relevant facts to them. Hence, much research has approached the task of predicting missing links in KGs.
Mainstream link prediction models widely use embedding-based methods to map entities and relations into a continuous low-dimensional vector space and use a score function to predict whether triples are valid. They can be broadly classified as translation based (Bordes et al. 2013; Wang et al. 2014; Lin et al. 2015; Ji et al. 2016), multiplication based (Nickel, Tresp, and Kriegel 2011; Yang et al. 2015; Trouillon et al. 2016), and neural network based (Dettmers et al. 2018; Schlichtkrull et al. 2018). These models rest on the implicit assumption that all relations are distributed within the dataset in a balanced way. Hence, they perform poorly in few-shot relation learning scenarios because they neglect the imbalanced distributions, and they cannot properly handle zero-shot relations because they keep only the knowledge of existing relations and learn no information about unseen relations.
Few-shot relation learning models adopt meta-learning (Chen et al. 2019a; Lv et al. 2019) and metric-based (Xiong et al. 2018; Wang et al. 2019b) methods to learn knowledge from only a few samples. However, few-shot learning models can be quite computationally expensive because they need to spend extra time retraining on each few-shot relation (meta-learning) or need to compare the few-shot relations one by one (metric-based). In practice, moreover, the many-shot and few-shot scenarios are not explicitly distinguished.
Zero-shot relation learning models aim to learn relations that are unseen in the training set. Researchers have proposed several models that deal with zero-shot relations by leveraging information from textual descriptions (Chen et al. 2019b; Qin et al. 2020). These models perform well at predicting zero-shot relations, but are not appropriate for the link prediction task because textual descriptions can be noisy and a bridge connecting seen and unseen relations is missing.

In this work, we focus on jointly learning many-, few-, and zero-shot relations without requiring extra textual knowledge. Recently, some computer vision works (Ye et al. 2019; Shi et al. 2019) have attempted to approach the generalized image classification task. Nonetheless, they are not designed for coping with graph structures, e.g., KGs. We leverage in this work the rich semantic correlations between relations as a bridge to connect the learning of many-, few-, and zero-shot relations. Zhang et al. (2019) integrated the rich semantic correlations between specific hierarchical relations into relation extraction. That method, however, performs well only on hierarchical relations, and it predicts relations from text; hence it does not cope with the link prediction task.

Figure 2: The illustration of GRL, which consists of the intuitive explanation (a), the base model (b), and the detailed architecture (c). The base model denotes the mainstream link prediction model. GRL is plugged in after the embedding component of the base model and contains three components: attention, fusion, and classifier.
Method
Figure 2 provides an illustration of the proposed framework GRL. The figure consists of three parts: the intuitive explanation of GRL in Figure 2 (a), the base model in Figure 2 (b), and the detailed architecture in Figure 2 (c).

The intuitive explanation of GRL shows how to utilize the semantic correlations between many-shot and few-shot relations so that relation embedding learning can benefit from semantically similar relations. We devise three modules, i.e., Attention, Fusion, and Classifier, to embed and fuse the rich semantic correlations among many-shot and few-shot relations in the training phase, and to select the most similar relation embedding for zero-shot relations in the testing phase. In this way, GRL can improve the performance on all relation classes, i.e., many-, few-, and zero-shot relations.

The base model denotes an existing mainstream link prediction model consisting of an embedding component and a score function component. GRL can be plugged between the embedding and score function components to make the model (1) insensitive to imbalanced relation distributions and (2) capable of handling zero-shot relations.

Before delving into the model description, we first formally represent a KG as a collection of triples $\mathcal{T} = \{(e_h, r, e_t) \mid e_h \in \mathcal{E}, e_t \in \mathcal{E}, r \in \mathcal{R}\}$, where $\mathcal{E}$ and $\mathcal{R}$ are the entity and relation sets, respectively. Each directed link in a KG represents a triple (i.e., $e_h$ and $e_t$ are represented as nodes and $r$ as the labeled edge between them). The link prediction task is to predict whether a given triple $(e_h, r, e_t)$ is valid or not. In particular, for zero-shot relations, we emphasize that we mainly focus on predicting the validity of a triple with a zero-shot relation, rather than predicting the zero-shot relations themselves, i.e., the relation prediction task (Chen et al. 2019b; Qin et al. 2020). However, GRL also has the ability to predict the most semantically similar relation of a given zero-shot relation through learning from the many- and few-shot relations, not from a text description.

Base Model
We select a mainstream link prediction model as the base model and apply GRL to it. The base model can be seen as a multi-layer neural network consisting of an embedding component and a score function component. Given an input triple $(e_h, r, e_t)$, the embedding component maps the head and tail entities $(e_h, e_t)$ and the relation $r$ to their distributed embedding representations $(\mathbf{e}_h, \mathbf{r}, \mathbf{e}_t)$ through the entity and relation embedding layers, respectively. After the embedding representations are obtained, the score function component is adopted to calculate the likelihood of $(\mathbf{e}_h, \mathbf{r}, \mathbf{e}_t)$ being a valid fact. The following binary cross entropy loss is used to train the model parameters:

$$\mathcal{L}_s = -\frac{1}{N}\sum_{i=1}^{N} \big( t_i \log p(s_i) + (1 - t_i) \log(1 - p(s_i)) \big), \quad (1)$$

where $s_i$ is the score of the $i$-th input triple, $t_i$ is the ground-truth label ($t_i$ is 1 if the input triple is valid and 0 otherwise), and $N$ is the number of input triples.
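For concreteness, below is a minimal PyTorch sketch of such a base model with a DistMult-style score function and the score-aware loss of Eq. (1). The class and function names are ours, for illustration only, and do not come from the authors' released code.

```python
import torch
import torch.nn as nn

class BaseModel(nn.Module):
    """Embedding component plus a DistMult-style score function."""
    def __init__(self, num_entities, num_relations, dim=200):
        super().__init__()
        self.entity_emb = nn.Embedding(num_entities, dim)
        self.relation_emb = nn.Embedding(num_relations, dim)

    def forward(self, heads, relations, tails):
        e_h = self.entity_emb(heads)        # (N, dim)
        r = self.relation_emb(relations)    # (N, dim)
        e_t = self.entity_emb(tails)        # (N, dim)
        # DistMult score: sum over e_h * r * e_t
        return (e_h * r * e_t).sum(dim=-1)  # (N,) triple scores s_i

def score_loss(scores, labels):
    """Score-aware loss L_s of Eq. (1): binary cross entropy on triple scores."""
    return nn.functional.binary_cross_entropy_with_logits(scores, labels)
```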
GRL Framework

The loss used by mainstream link prediction models is score-oriented and lacks an in-depth exploration of the rich semantic correlations in KGs. We propose the GRL framework to learn appropriate representations for relations by embedding semantic correlations into a classification-aware optimization. GRL contains three specific modules:

1) Attention Module, which builds the knowledge-aware attention distribution and the relational knowledge vector. The aim of this module is to extract the semantic correlations and the degree of these correlations.

2) Fusion Module, which fuses the relational knowledge vector with the joint vector obtained from the attention module. This module realizes the fusion of different relations according to semantic correlations.

3) Classifier Module, which calculates the classification-aware loss so that the rich semantic correlations are implicitly embedded in the relation embeddings. Thanks to it, both the compactness of semantically similar relations and the discrimination of dissimilar relations can be enhanced.

The following is a detailed introduction to each module.
Attention Module
Joint Block.
The classification-aware loss is calculated from the relation classification results based on the head and tail entities of the given triple $(e_h, r, e_t)$. Inspired by (Qin et al. 2020), the joint vector of the head and tail entities has the ability to represent the potential relation between them. The head and tail entity representations (i.e., $\mathbf{e}_h$ and $\mathbf{e}_t$) are joined at the joint block, for which we adopt three different alternatives:

$$\mathbf{j} = \begin{cases} \mathbf{e}_h - \mathbf{e}_t, & \text{sub} \\ \mathbf{e}_h \otimes \mathbf{e}_t, & \text{multiply} \\ \mathbf{W}_j[\mathbf{e}_h; \mathbf{e}_t] + \mathbf{b}_j, & \text{concat} \end{cases} \quad (2)$$

where $\otimes$ denotes the element-wise multiplication operator, and $\mathbf{W}_j$ and $\mathbf{b}_j$ are learnable parameters.
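A minimal sketch of the joint block of Eq. (2), assuming the alternative is selected by a configuration flag; the module and attribute names are illustrative.

```python
import torch
import torch.nn as nn

class JointBlock(nn.Module):
    """Join head/tail embeddings into a joint vector j (Eq. 2)."""
    def __init__(self, dim=200, mode="concat"):
        super().__init__()
        self.mode = mode
        if mode == "concat":
            # Implements W_j [e_h; e_t] + b_j
            self.linear = nn.Linear(2 * dim, dim)

    def forward(self, e_h, e_t):
        if self.mode == "sub":
            return e_h - e_t                 # subtraction variant
        if self.mode == "multiply":
            return e_h * e_t                 # element-wise product variant
        return self.linear(torch.cat([e_h, e_t], dim=-1))  # concat variant
```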
Relation Memory Block.

Using a memory block to store class information is widely used in image classification (Snell, Swersky, and Zemel 2017; Karlinsky et al. 2019; Liu et al. 2019). Following these studies, we design a relation memory block to store all relation information by sharing parameters with the relation embedding layer:

$$M = \{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_{K-1}, \mathbf{r}_K\}, \quad (3)$$

where $M \in \mathbb{R}^{K \times dim}$ and $K$ is the number of relation classes. As training progresses, the relation embedding layer and the relation memory block are updated synchronously.
Relational Knowledge.

To realize the classification-aware optimization objective, we extract useful relational knowledge from the relation memory block to enrich the joint vector. The semantic correlation degree between different relations may vary; thus, we adopt an attention mechanism to customize specific relational knowledge for each joint vector. Concretely, the relational knowledge vector $\mathbf{rk}$ is computed as a weighted sum of the relation representations in the relation memory block $M$, i.e., $\mathbf{rk} = \boldsymbol{\alpha}_{sim} M$, where $\boldsymbol{\alpha}_{sim} \in \mathbb{R}^K$ is the knowledge-aware attention distribution.
Attention Distribution.

The knowledge-aware attention distribution $\boldsymbol{\alpha}_{sim}$ describes the similarity between the joint vector and each relation representation in the relation memory block. We estimate $\boldsymbol{\alpha}_{sim}$ as

$$\boldsymbol{\alpha}_{sim} = softmax(\mathbf{j} M^{\top}), \quad (4)$$

where $softmax$ is the activation function and $M^{\top}$ is the transpose of $M$. Note that the attention value of the ground-truth relation is masked with 0.
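A sketch of Eqs. (3)-(4) under our reading: the relation memory $M$ shares parameters with the relation embedding layer, the knowledge-aware attention compares the joint vector against every stored relation, and the relational knowledge vector is their weighted sum. Pushing the ground-truth relation's logit to negative infinity before the softmax (so its attention weight becomes 0) is one plausible realization of the masking described above.

```python
import torch
import torch.nn.functional as F

def relational_knowledge(j, M, gt_relation=None):
    """j: (N, dim) joint vectors; M: (K, dim) relation memory.
    Returns rk (N, dim) and alpha_sim (N, K)."""
    logits = j @ M.t()                               # similarity j M^T
    if gt_relation is not None:
        # Zero out attention on the ground-truth relation (training only).
        logits = logits.scatter(1, gt_relation.unsqueeze(1), float("-inf"))
    alpha_sim = F.softmax(logits, dim=-1)            # Eq. (4)
    rk = alpha_sim @ M                               # weighted sum over M
    return rk, alpha_sim
```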
Fusion Module
In this module, the joint vector and the relational knowledge vector are fused. Intuitively, the proportion of fusion differs for each joint vector. Inspired by the pointer-generator network (See, Liu, and Manning 2017), which facilitates copying words from the source text during new word generation, we propose a soft switch, the fusion probability $p_f \in [0, 1]$, to adaptively adjust the fusion proportion between the joint vector and the relational knowledge vector. The fusion probability $p_f$ is estimated from the joint vector as $p_f = sigmoid(FC(\mathbf{j}))$, where $FC$ is a fully connected neural network and $sigmoid$ is the activation function. Finally, we obtain the following fusion vector $\mathbf{f}$ over the joint vector $\mathbf{j}$ and relational knowledge vector $\mathbf{rk}$:

$$\mathbf{f} = (1 - p_f)\,\mathbf{j} + p_f\,\mathbf{rk}. \quad (5)$$
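A sketch of the soft switch and Eq. (5); using a single linear layer to produce the scalar fusion probability is an assumption, since the width of the fully connected network is not specified.

```python
import torch
import torch.nn as nn

class FusionModule(nn.Module):
    """Adaptively mix joint vector j and relational knowledge rk (Eq. 5)."""
    def __init__(self, dim=200):
        super().__init__()
        self.fc = nn.Linear(dim, 1)   # FC in p_f = sigmoid(FC(j))

    def forward(self, j, rk):
        p_f = torch.sigmoid(self.fc(j))   # (N, 1) fusion probability
        return (1 - p_f) * j + p_f * rk   # fusion vector f
```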
Classifier Module

Classification-aware Loss.
The fusion vector $\mathbf{f}$ is mapped to a class probability distribution through the classifier block as

$$D \sim softmax(\mathbf{f}^{\top} \mathbf{W}_c), \quad (6)$$

where $\mathbf{W}_c \in \mathbb{R}^{dim \times K}$ is the classification weight matrix and $softmax$ is the activation function. Given the ground-truth relation $r_i$ of the $i$-th input $(e_{h_i}, r_i, e_{t_i})$, we adopt cross entropy to compute the classification-aware loss:

$$\mathcal{L}_c = -\frac{1}{N}\sum_{i=1}^{N} \log p(r_i \mid (e_{h_i}, e_{t_i})), \quad (7)$$

where $p(r_i \mid (e_{h_i}, e_{t_i})) \in D_i$ is the probability of the ground-truth relation $r_i$.
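A sketch of Eqs. (6)-(7); `F.cross_entropy` applies the softmax and the negative log-likelihood in one call, matching the classification-aware loss.

```python
import torch
import torch.nn.functional as F

def classification_loss(f, W_c, gt_relation):
    """f: (N, dim) fusion vectors; W_c: (dim, K) classification weights;
    gt_relation: (N,) ground-truth relation indices."""
    logits = f @ W_c                             # D ~ softmax(f^T W_c), Eq. (6)
    return F.cross_entropy(logits, gt_relation)  # Eq. (7)
```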
Most Similar Relation.

Existing mainstream link prediction models have achieved impressive performance, yet they can only learn the patterns observed in closed datasets, which limits their scalability for handling rapidly evolving KGs. Specifically, when a zero-shot relation $r_z$ (i.e., one not existing in the training set) occurs between an entity pair $(e_h, e_t)$, it is almost impossible for existing models to distinguish whether this new triple $(e_h, r_z, e_t)$ is valid or not. Every $r_z$ will be identified as an 'unknown' vector $\mathbf{u}$ by the embedding component, and the newly constructed triple representation $(\mathbf{e}_h, \mathbf{u}, \mathbf{e}_t)$ will receive a low score. To alleviate this defect, GRL selects the most semantically similar relation as a replacement, enhancing the learning ability of the base model on zero-shot relations. We argue that the relation corresponding to the maximum similarity in $\boldsymbol{\alpha}_{sim}$ best reflects the semantic relation between the two entities. Therefore, we use the vector of the most similar relation $\mathbf{r}_{ms}$ to replace the vector $\mathbf{u}$ and evaluate the newly constructed triple representation $(\mathbf{e}_h, \mathbf{r}_{ms}, \mathbf{e}_t)$.
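A sketch of the replacement step for zero-shot relations: at test time, alpha_sim is computed without the ground-truth mask, and the embedding of its argmax relation stands in for the 'unknown' vector u when scoring the triple.

```python
import torch

def most_similar_relation(alpha_sim, M):
    """alpha_sim: (N, K) attention over seen relations; M: (K, dim) memory.
    Returns r_ms, the embedding used in place of the unseen relation."""
    ms_index = alpha_sim.argmax(dim=-1)   # index of the most similar relation
    r_ms = M[ms_index]                    # (N, dim) replacement embedding
    return r_ms, ms_index
```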
Learning Scheme

We follow the definition of the score-aware loss in existing base models and propose a classification-aware loss to train the model. The overall optimization follows the joint learning paradigm and is defined as a weighted combination of the constituent losses: $\mathcal{L} = \mathcal{L}_s + \lambda \mathcal{L}_c$, where $\lambda$ is a hyper-parameter that balances the importance of the score-aware loss and the classification-aware loss during optimization.
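In code, the joint objective is a one-liner; lambda = 0.1 follows the experimental configuration reported below.

```python
def total_loss(l_s, l_c, lam=0.1):
    """Weighted combination of score-aware and classification-aware losses."""
    return l_s + lam * l_c
```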
Experiments and Results

Datasets
We select two categories of datasets to comprehensively evaluate GRL; their statistics are shown in Table 1:

• Imbalanced datasets: YAGO3-10 (Mahdisoltani, Biega, and Suchanek 2015), FB15K-237 (Toutanova and Chen 2015), NELL-995 (Xiong, Hoang, and Wang 2017), Kinship (Lin, Socher, and Xiong 2018), and WN18 (Bordes et al. 2013). These datasets contain both many-shot and few-shot relations.

• Few-shot dataset: NELL-ONE (Xiong et al. 2018), which is specially constructed for the few-shot learning task in KGs. Relations with fewer than 500 but more than 50 training triples are selected as testing data.

Table 1: Statistics of datasets. |E| and |R| represent the cardinalities of the entity and relation sets.

Dataset     |E|   |R|  Train  Valid  Test
YAGO3-10    123k  37   1M     5k     5k
FB15K-237   15k   237  273k   18k    20k
NELL-995    75k   200  150k   543    4k
Kinship     104   25   9k     1k     1k
WN18        41k   18   141k   5k     5k
NELL-ONE    69k   358  190k   1k     2k
Baselines
We adopt two embedding-based models, DistMult (Yang et al. 2015) and ConvE (Dettmers et al. 2018), as the base models for our proposed framework, and compare the two enhanced models with the following popular link prediction models: RESCAL (Nickel, Tresp, and Kriegel 2011), TransE (Bordes et al. 2013), DistMult (Yang et al. 2015), ComplEx (Trouillon et al. 2016), ConvE (Dettmers et al. 2018), ConvKB (Nguyen et al. 2018), D4-STE and D4-Gumbel (Xu and Li 2019), and TuckER (Balazevic, Allen, and Hospedales 2019). Besides the above general models, we test two additional models, GMatching (Xiong et al. 2018) and CogKR (Du et al. 2019), which are designed specifically for few-shot relation learning.
Experimental Configuration
We implement the base models and our proposed framework in PyTorch (Paszke et al. 2017). Throughout the experiments, we optimize the hyperparameters by grid search for the best mean reciprocal rank (MRR) on the validation set. We use Adam to optimize all parameters with an initial learning rate of 0.003. The dimensions of entity and relation embeddings are both set to 200. The loss weight $\lambda$ is set to 0.1. According to the frequency of relations, we take the top 20% and bottom 80% of relations as the many-shot and few-shot relation classes, respectively. The experimental results of our model are averaged across three training repetitions, and standard deviations (sd) are also reported.
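For reference, the reported configuration can be summarized as a plain config dict; the values are taken from the text above, while the dict itself is only an illustrative convenience, not the authors' code.

```python
CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 0.003,
    "entity_dim": 200,
    "relation_dim": 200,
    "loss_weight_lambda": 0.1,
    "many_shot_fraction": 0.20,  # top 20% most frequent relations
    "few_shot_fraction": 0.80,   # bottom 80% of relations
    "num_runs": 3,               # results averaged over 3 repetitions
}
```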
Experiment I: Link Prediction

Setting
We follow the evaluation protocol of (Dettmers et al. 2018): each input $(e_h, r, e_t)$ is converted to two queries, a tail query $(e_h, r, ?)$ and a head query $(?, r, e_t)$; then, the rank of the correct entity is recorded among all entities for each query, excluding other correct entities observed in any of the train/valid/test sets for the same query. We use the filtered HITS@1, 5, 10, and MRR as evaluation metrics.
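A sketch of the filtered ranking protocol under our reading of (Dettmers et al. 2018): for each query, other known correct answers are masked out before computing the rank of the target entity. `known_tails` is an assumed precomputed set of all correct tails for the query across train/valid/test.

```python
import torch

def filtered_rank(scores, target, known_tails):
    """scores: (num_entities,) model scores for all candidate entities."""
    scores = scores.clone()
    other = torch.tensor([t for t in known_tails if t != target],
                         dtype=torch.long)
    scores[other] = float("-inf")  # filter other correct entities
    # Rank = 1 + number of candidates scoring strictly higher than the target.
    return int((scores > scores[target]).sum().item()) + 1

def metrics(ranks):
    ranks = torch.tensor(ranks, dtype=torch.float)
    return {"MRR": (1.0 / ranks).mean().item(),
            "HITS@1": (ranks <= 1).float().mean().item(),
            "HITS@10": (ranks <= 10).float().mean().item()}
```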
Results

Table 2 records the results on the five imbalanced datasets, which reflect the general performance of the compared models on the link prediction task. It shows that the two base models (DistMult and ConvE) are generally improved by incorporating the proposed GRL framework: GRL improves DistMult by an average of 3.84% and ConvE by an average of 1.08% under the MRR metric. In particular, the enhanced model ConvE+GRL generally outperforms the compared models on YAGO3-10, FB15K-237, Kinship, and WN18, and the enhanced model DistMult+GRL also performs well on NELL-995.

Table 2: Link prediction results (mean ± sd) of the compared models (%); the best results are marked in bold (pairwise t-test at 5% significance level). Each dataset reports MRR, HITS@10, and HITS@1.

            YAGO3-10         FB15K-237        NELL-995         Kinship          WN18
            MRR  @10  @1     MRR  @10  @1     MRR  @10  @1     MRR  @10  @1     MRR  @10  @1
ComplEx     36.0 55.0 26.0   24.7 42.8 15.8   48.2 60.6 39.9   82.3 97.1 73.3   94.1 94.7 93.6
R-GCN       -    -    -      24.8 41.7 15.3   12.0 18.8 8.2    10.9 23.9 3.0    81.4
ConvE       52.0 66.0 45.0   31.6 49.1 23.9   49.1 61.3 40.3   83.3

We also evaluate the performance of GRL in learning many-shot and few-shot relations, and show the MRR results of DistMult, DistMult+GRL, ConvE, and ConvE+GRL on YAGO3-10 and NELL-995 (cf. Table 3). The results indicate that GRL achieves consistent improvements on both the many-shot and few-shot sub-groups. We assume this may be because handling many-shot relations can also benefit from useful implicit information in few-shot relations, even though there are already numerous training samples for many-shot relations. From this aspect, it is sensible for mainstream link prediction models to rely on GRL regarding the imbalanced relation issue.

Table 3: Link prediction MRR results with the increment (%) on the many-shot and few-shot sub-groups, and on the entire test set.

                YAGO3-10            NELL-995
                Many  Few   All     Many  Few   All
DistMult        38.1  26.7  34.0    52.6  41.9  48.5
DistMult+GRL    44.8  34.2  41.2    57.3  48.8  54.3
(Increment)     ↑6.7  ↑7.5  ↑7.2    ↑4.7  ↑6.9  ↑5.8

Experiment II: Few-shot Relation Learning
Setting
To further evaluate the performance of GRL in the few-shot relation learning case, which is tricky for a link prediction model, especially when training instances for a relation are very scarce, we test the approaches on the NELL-ONE dataset, wherein each test relation has only one instance in the training set. We follow the evaluation protocol and metrics of (Xiong et al. 2018).
Results
Table 4 shows that GRL consistently improves both base models, by 4.2% and 8.6% average MRR scores. Especially for ConvE, incorporating GRL helps it outperform the other approaches on three metrics. CogKR, a path learning based model, performs best under HITS@1. The reason might be that the testing queries are easy to complete by finding KG paths on few-shot relation datasets such as NELL-ONE. Although there is only one training instance for each testing query, GRL can effectively embed the few-shot relations by learning from the semantically similar relations in the many-shot class.

Table 4: Few-shot relation learning results (mean ± sd) on the NELL-ONE dataset (%); the results marked by '†' or '∗' are taken from (Xiong et al. 2018; Du et al. 2019). MRR, HITS@10, HITS@5, and HITS@1 are reported.

Figure 3: Zero-shot relation learning results: (a) the average score of the testing triples, and (b) the average similarity between the zero-shot relation and the predicted relation.
Experiment III: Zero-shot Relation Learning
Setting
To evaluate the performance of GRL on zero-shot relations, we construct a testing set containing 500 triples whose relations are unseen in the training phase. The testing triples are randomly sampled from the FB15K dataset (Bordes et al. 2013), and the training set is FB15K-237, which ensures the authenticity of the triples. We adopt a fundamental testing protocol that quantitatively measures the scores of triples with zero-shot relations.

Most existing zero-shot relation studies depend on textual descriptions, while the zero-shot learning addressed in this work does not require this information. Therefore, we select the GMatching model (Xiong et al. 2018) for comparison, which can predict similar relations by learning a matching metric without any additional information. We use the classical method TransE (Bordes et al. 2013) to learn the relation embeddings on the FB15K dataset and calculate the similarity between the zero-shot relation and the predicted relation.
Results
Figure 3 (a) shows the average score of the testing triples with zero-shot relations. Note that we use the fusion vector as the zero-shot relation embedding. We can see that the two base models (DistMult and ConvE) cannot obtain a good average score because all zero-shot relations are identified as an 'unknown' relation. When GRL is plugged in, the two enhanced models (DistMult+GRL and ConvE+GRL) are both boosted in scoring valid triples, proving that the GRL framework can effectively improve the base models' ability to validate triples with zero-shot relations. Figure 3 (b) shows the performance on predicting zero-shot relations. The base models perform worse due to their superficial way of embedding zero-shot relations, as mentioned before. When equipped with GRL, the enhanced models perform better than GMatching, indicating that learning from the semantic correlations between unseen and seen relations is comparably as good as learning from neighbor information.
Further Analysis of GRL
Ablation Study
Study of Fusion Probability
To assess the effect of the fusion vector, we compare three variants from the fusion probability perspective based on ConvE, as shown in Table 5 (2)-(4). The three variants are: using only the joint vector (i.e., $p_f = 0$), using only the relational knowledge vector (i.e., $p_f = 1$), and using the joint and relational knowledge vectors with equal weight (i.e., $p_f = 0.5$). Compared with the three variants, adaptively fusing the joint and relational knowledge vectors (i.e., ConvE+GRL) performs best, which suggests that the semantic correlations in the relational knowledge vectors help the base model learn more appropriate representations of relations and thus boost the general performance. Moreover, the adaptive fusion probability improves the flexibility of the fusion operator.

Table 5: Ablation study (MRR, %).

                            YAGO3-10  NELL-ONE
(1) ConvE                   52.0      17.0
(2) ConvE+GRL (p_f = 0)     52.6      23.3
(3) ConvE+GRL (p_f = 0.5)   53.9      24.7
(4) ConvE+GRL (p_f = 1)     52.2      20.3
(5) ConvE+Direct            51.2      10.5
(6) ConvE+GRL               55.4      25.6

Direct Fusion vs. GRL
We now test a direct fusion method that fuses the relational knowledge vector with the relation representation as the updated relation representation, without considering the classification-aware loss. Table 5 (5) shows the MRR performance of ConvE when enhanced by this direct method. The rich semantic correlations in KGs cannot be adequately learned by the direct method because it simply makes use of the superficial semantic correlations rather than embedding them into the relation vectors. Moreover, the direct method makes embedding learning more confusing, especially on few-shot relation data such as NELL-ONE.
Case Study
Visualization of Knowledge-aware Attention
The proposed framework GRL is able to make the base model fully learn the semantic correlations between relations. To verify this, we display the attention distributions of the base model (ConvE) and the enhanced model (ConvE+GRL) on FB15K-237 in Figure 4, showing the average attention distribution over the 237 relation classes, where each row represents a type of relation. The base model learns little about the semantic correlations between relations, while the enhanced model (ConvE+GRL) captures the semantic correlations well. The attention distribution of few-shot relations is more discrete than that of many-shot relations due to insufficient training data.
Visualization of Relation Embedding
Figure 4: Case study: knowledge-aware attention shown as a heat map.

Figure 5: Case study: t-SNE visualization of relation embeddings in FB15K-237 (better viewed in color). Semantically similar relations get closer after plugging in GRL.

In addition, Figure 5 shows a t-SNE (Maaten and Hinton 2008) plot of all relations of FB15K-237 in embedding space. To provide more insight, we highlight the relations associated with "film": stars and triangles represent the many-shot and few-shot relations, respectively. We can see that the many-shot and few-shot relations are more compact for the enhanced model than for the base model.
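A sketch of how such a t-SNE plot can be produced with scikit-learn; `relation_emb` and `names` are assumed inputs (the learned relation embedding matrix and the relation name strings), and the "film" keyword highlighting mirrors the case study.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def plot_relations(relation_emb, names, keyword="film"):
    """Project relation embeddings to 2D and highlight keyword matches."""
    coords = TSNE(n_components=2, random_state=0).fit_transform(relation_emb)
    hit = np.array([keyword in n for n in names])
    plt.scatter(coords[~hit, 0], coords[~hit, 1], c="lightgray", s=10)
    plt.scatter(coords[hit, 0], coords[hit, 1], c="red", s=30, marker="*")
    plt.title("t-SNE of relation embeddings")
    plt.show()
```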
Conclusion and Future Work
In this work, we study two natural problems in the link prediction task: 1) unbalanced relation distributions, and 2) unseen relations. To address them, we focus on generalized relation learning and propose a framework, GRL, that uses semantic correlations among relations as a bridge to connect semantically similar relations. Through extensive experiments on six datasets, we demonstrate the effectiveness of GRL and provide a comprehensive insight into generalized relation learning for KGs. There are a few loose ends for further investigation. We will consider combining external text information with the semantic knowledge of KGs to facilitate relation learning. We will also deploy GRL in downstream applications that involve generalized relation learning scenarios to gain more insights.
Acknowledgments
This work was supported in part by the Ministry of Education of Humanities and Social Science project under grant 16YJC790123 and the Natural Science Foundation of Shandong Province under grant ZR2019MA049.
References
Balazevic, I.; Allen, C.; and Hospedales, T. 2019. TuckER: Tensor Factorization for Knowledge Graph Completion. In Proceedings of EMNLP-IJCNLP, 5184–5193.

Bordes, A.; Usunier, N.; Garcia-Durán, A.; Weston, J.; and Yakhnenko, O. 2013. Translating Embeddings for Modeling Multi-relational Data. In Proceedings of NeurIPS, 2787–2795.

Chen, M.; Zhang, W.; Zhang, W.; Chen, Q.; and Chen, H. 2019a. Meta Relational Learning for Few-Shot Link Prediction in Knowledge Graphs. In Proceedings of EMNLP-IJCNLP, 4216–4225.

Chen, W.; Zhu, H.; Han, X.; Liu, Z.; and Sun, M. 2019b. Quantifying Similarity between Relations with Fact Distribution. In Proceedings of ACL, 2882–2894.

Christmann, P.; Saha Roy, R.; Abujabal, A.; Singh, J.; and Weikum, G. 2019. Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion. In Proceedings of CIKM, 729–738.

Dettmers, T.; Minervini, P.; Stenetorp, P.; and Riedel, S. 2018. Convolutional 2D Knowledge Graph Embeddings. In Proceedings of AAAI, 1811–1818.

Du, Z.; Zhou, C.; Ding, M.; Yang, H.; and Tang, J. 2019. Cognitive Knowledge Graph Reasoning for One-shot Relational Learning. arXiv preprint arXiv:1906.05489.

Ji, G.; Liu, K.; He, S.; and Zhao, J. 2016. Knowledge Graph Completion with Adaptive Sparse Transfer Matrix. In Proceedings of AAAI, 985–991.

Karlinsky, L.; Shtok, J.; Harary, S.; Schwartz, E.; Aides, A.; Feris, R.; Giryes, R.; and Bronstein, A. M. 2019. RepMet: Representative-Based Metric Learning for Classification and Few-Shot Object Detection. In Proceedings of CVPR, 5197–5206.

Kazemi, S. M.; and Poole, D. 2018. SimplE Embedding for Link Prediction in Knowledge Graphs. In Proceedings of NeurIPS, 4284–4295.

Lin, X. V.; Socher, R.; and Xiong, C. 2018. Multi-Hop Knowledge Graph Reasoning with Reward Shaping. In Proceedings of EMNLP, 3243–3253.

Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; and Zhu, X. 2015. Learning Entity and Relation Embeddings for Knowledge Graph Completion. In Proceedings of AAAI, 2181–2187.

Liu, Z.; Miao, Z.; Zhan, X.; Wang, J.; Gong, B.; and Yu, S. X. 2019. Large-Scale Long-Tailed Recognition in an Open World. In Proceedings of CVPR, 2537–2546.

Lv, X.; Gu, Y.; Han, X.; Hou, L.; Li, J.; and Liu, Z. 2019. Adapting Meta Knowledge Graph Information for Multi-Hop Reasoning over Few-Shot Relations. In Proceedings of EMNLP-IJCNLP, 3374–3379.

Maaten, L. v. d.; and Hinton, G. 2008. Visualizing Data Using t-SNE. Journal of Machine Learning Research 9: 2579–2605.

Mahdisoltani, F.; Biega, J.; and Suchanek, F. M. 2015. YAGO3: A Knowledge Base from Multilingual Wikipedias. In Proceedings of CIDR.

Moon, S.; Shah, P.; Kumar, A.; and Subba, R. 2019. OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs. In Proceedings of ACL, 845–854.

Nguyen, D. Q.; Nguyen, T. D.; Nguyen, D. Q.; and Phung, D. 2018. A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network. In Proceedings of NAACL, 327–333.

Nickel, M.; Tresp, V.; and Kriegel, H.-P. 2011. A Three-Way Model for Collective Learning on Multi-Relational Data. In Proceedings of ICML, 809–816.

Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; and Lerer, A. 2017. Automatic Differentiation in PyTorch. In Proceedings of NeurIPS Workshop.

Qin, P.; Wang, X.; Chen, W.; Zhang, C.; Xu, W.; and Wang, W. Y. 2020. Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs. In Proceedings of AAAI, 8673–8680.

Schlichtkrull, M.; Kipf, T. N.; Bloem, P.; Van Den Berg, R.; Titov, I.; and Welling, M. 2018. Modeling Relational Data with Graph Convolutional Networks. In Proceedings of ESWC, 593–607.

See, A.; Liu, P. J.; and Manning, C. D. 2017. Get To The Point: Summarization with Pointer-Generator Networks. In Proceedings of ACL, 1073–1083.

Shi, B.; and Weninger, T. 2018. Open-World Knowledge Graph Completion. In Proceedings of AAAI, 1957–1964.

Shi, X.; Salewski, L.; Schiegg, M.; Akata, Z.; and Welling, M. 2019. Relational Generalized Few-Shot Learning. arXiv preprint arXiv:1907.09557.

Snell, J.; Swersky, K.; and Zemel, R. 2017. Prototypical Networks for Few-shot Learning. In Proceedings of NeurIPS, 4077–4087.

Toutanova, K.; and Chen, D. 2015. Observed Versus Latent Features for Knowledge Base and Text Inference. In Proceedings of CVSC, 57–66.

Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; and Bouchard, G. 2016. Complex Embeddings for Simple Link Prediction. In Proceedings of ICML, 2071–2080.

Wang, H.; Zhang, F.; Zhao, M.; Li, W.; Xie, X.; and Guo, M. 2019a. Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation. In Proceedings of WWW, 2000–2010.

Wang, Z.; Lai, K.; Li, P.; Bing, L.; and Lam, W. 2019b. Tackling Long-Tailed Relations and Uncommon Entities in Knowledge Graph Completion. In Proceedings of EMNLP-IJCNLP, 250–260.

Wang, Z.; Zhang, J.; Feng, J.; and Chen, Z. 2014. Knowledge Graph Embedding by Translating on Hyperplanes. In Proceedings of AAAI, 1112–1119.

Xiong, W.; Hoang, T.; and Wang, W. Y. 2017. DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning. In Proceedings of EMNLP, 564–573.

Xiong, W.; Yu, M.; Chang, S.; Guo, X.; and Wang, W. Y. 2018. One-Shot Relational Learning for Knowledge Graphs. In Proceedings of EMNLP, 1980–1990.

Xu, C.; and Li, R. 2019. Relation Embedding with Dihedral Group in Knowledge Graph. In Proceedings of ACL, 263–272.

Yang, B.; Yih, W.-t.; He, X.; Gao, J.; and Deng, L. 2015. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In Proceedings of ICLR.

Ye, H.-J.; Hu, H.; Zhan, D.-C.; and Sha, F. 2019. Learning Classifier Synthesis for Generalized Few-Shot Learning. arXiv preprint arXiv:1906.02944.

Zhang, N.; Deng, S.; Sun, Z.; Wang, G.; Chen, X.; Zhang, W.; and Chen, H. 2019. Long-tail Relation Extraction via Knowledge Graph Embeddings and Graph Convolution Networks. In Proceedings of NAACL.