MöbiusE: Knowledge Graph Embedding on Möbius Ring
Yao Chen, Jiangang Liu, Zhe Zhang, Shiping Wen, Wenjun Xiong
Yao Chen^a,∗, Jiangang Liu^a, Zhe Zhang^a, Shiping Wen^b, Wenjun Xiong^a

^a Department of Computer Science, Southwestern University of Finance and Economics, China
^b Centre for Artificial Intelligence, University of Technology Sydney, Sydney, Australia
Abstract
In this work, we propose a novel Knowledge Graph Embedding (KGE) strategy, called MöbiusE, in which entities and relations are embedded on the surface of a Möbius ring. The proposal of such a strategy is inspired by the classic TorusE, in which the addition of two arbitrary elements is subject to a modulus operation. In this sense, TorusE naturally guarantees the critical boundedness of embedding vectors in KGE. However, the nonlinear property of the addition operation on a Torus ring is derived solely from the modulus operation, which to some extent restricts the expressiveness of TorusE. As a further generalization of TorusE, MöbiusE also uses a modulus operation to preserve the closedness of its addition operation, but the coordinates on a Möbius ring interact with each other in the following way: any vector moving along a parametric trace on the surface of a Möbius ring ends up on the opposite side after one full cycle. Hence, MöbiusE exhibits far more nonlinear representativeness than TorusE, and in turn generates more precise embedding results. In our experiments, MöbiusE outperforms TorusE and other classic embedding strategies in several key indicators.
Keywords:
Möbius ring, Torus ring, Knowledge graph, Embedding
1. Introduction
Graph or network structures can be used to model the intrinsic information behind data [1]. As a typical example, a knowledge graph (KG) is a collection of facts about the real world; related examples include DBpedia [2], YAGO [3] and Freebase [4]. These KG databases can be applied in many realistic engineering tasks, such as knowledge inference, question answering, and sample labeling. Driven by realistic applications, an intriguing and challenging problem for KG is: how to infer unknown knowledge (or missing data) from the existing KG database.

∗ Corresponding author. Email address: [email protected] (Yao Chen)

Preprint submitted to Journal of LaTeX Templates, January 8, 2021

For any KG, the basic element of knowledge is represented by a triplet (h, r, t), which is composed of two entities h, t and one relation r. In detail, h is called the head, t is called the tail, and (h, r, t) means that head h maps to tail t via the operation of relation r. For example, the entity h = U.S.A projects to t = DonaldTrump under the relation r = ThePresidentOf, and the entity h = U.S.A projects to t = MikePence under the relation r = TheVicePresidentOf. Based on these two triplets, and considering the fact that "the president and vice president of a country at the same time are colleagues", one predicts the existence of the triplet (DonaldTrump, IsColleagueOf, MikePence) if such a triplet is not contained in the given database. For realistic applications such as question answering, the core task is to predict those triplets which are not listed in the given database; such a task is called link prediction in KG.

There exist many types of methods for link prediction in KG, among which translation-based methods treat each entity and relation as an embedded vector in some constructed algebraic space, so that each true triplet can be translated into a simple constraint described by an algebraic expression f(h, r, t). Using such a method, if some (h, r, t) is not present in the KG but the corresponding algebraic constraint index f(h, r, t) is very small, then it is reasonable to believe that (h, r, t) is a missing fact. Due to the choice of different algebraic spaces, many different translation-based models have been obtained. TransE [5] is the first translation-based model for link prediction tasks; it uses the Euclidean space R^n as the embedding space, and the constraint quantity for each triplet is set as simply as ‖h + r − t‖. TransR [6] builds the constraint equation by mapping the entities to a subspace derived from the relation vector. TransH [7] improves TransE in dealing with reflexive/ONE-TO-MANY/MANY-TO-ONE/MANY-TO-MANY cases in KG; such an improvement is achieved by treating each relation as a translating operation on a hyperplane. DistMult [8] utilizes a simple bilinear formulation to model the constraint generated by triplets; in this way the composition of relations is characterized by matrix multiplication, which makes such a model good at capturing relational semantics. ComplEx [9] uses the standard dot product between embeddings and demonstrates the capability of complex-valued embeddings instead of real-valued numbers. TorusE [10] chooses a compact Lie group, a torus, as the embedding space, and is more scalable to large-size knowledge graphs because of its lower computational complexity.
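As a toy illustration of the translation principle behind these models (our own example with made-up two-dimensional vectors, not code from any of the cited papers), a TransE-style score simply measures how far h + r lands from t:

```python
import math

# A toy illustration (ours) of translation-based scoring: a triplet (h, r, t)
# is plausible when the head embedding translated by the relation embedding
# lands near the tail embedding.  The names and 2-D vectors are made up.
emb = {
    "USA":         (0.25, 0.50),
    "Trump":       (0.75, 0.90),
    "PresidentOf": (0.50, 0.40),
}

def transe_score(h, r, t):
    """TransE-style constraint index ||h + r - t|| (Euclidean norm); smaller is better."""
    return math.sqrt(sum((emb[h][i] + emb[r][i] - emb[t][i]) ** 2 for i in range(2)))

# (USA, PresidentOf, Trump): 0.25 + 0.50 = 0.75 and 0.50 + 0.40 = 0.90,
# so the score is (almost) zero and the triplet is judged plausible.
print(transe_score("USA", "PresidentOf", "Trump"))
```

A triplet whose translated head lands far from its tail (e.g. the reversed triplet) receives a large score and is judged implausible.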
Other types of embedding methods include TransA [11], a locally and temporally adaptive translation-based approach; ConvE [12], a multi-layer convolutional network embedding model; TransF, an embedding strategy with flexibility in each triplet constraint; and QuaternionE [13], an embedding method that treats each relation as a quaternion in hypercomplex space.

Among the above embedding methods, the basic operation behind TorusE is a simple modulus operation, and this operation automatically regularizes the calculated results to a given bounded area. In this way, there is no need for an extra regularization operation in TorusE. Inspired by this property of TorusE, we move beyond the one-dimensional modulus operation on a Lie group and propose MöbiusE, which takes a Möbius ring (whose basic dimension is two) as the embedding space. MöbiusE has the following major benefits for embedding tasks. First, MöbiusE provides much more flexibility, since each point on the surface of a Möbius ring has two different expressions. Second, the expressiveness of MöbiusE is much greater than that of TorusE, due to the more complex distance function built on the Möbius ring. Third, MöbiusE automatically truncates the training vectors to the constraint space, as TorusE does. Finally, MöbiusE subsumes TorusE, and inherits all the attractive properties of TorusE, such as its ability to model symmetry/antisymmetry, inversion, and composition.

The rest of the paper is organized as follows: Section 2 revisits the idea of TorusE and proposes the basic idea of MöbiusE; in particular, we discuss how to define the distance function on a Möbius ring. Section 3 provides the experimental results of MöbiusE on the datasets FB15K and WN18. Section 4 discusses related works and compares MöbiusE with other embedding methods. Section 5 concludes this paper. Finally, Section 6 lists the proofs of several key propositions on the Möbius ring.
2. Embedding on Möbius Ring
A KG is described by a set of triplets ∆, where each l = (h, r, t) ∈ ∆ contains h, r, and t as head entity, relation, and tail entity, respectively. Denote E = {h, t | (h, r, t) ∈ ∆} and R = {r | (h, r, t) ∈ ∆} as the sets of entities and relations of ∆. Let f be the scoring function; the task of KGE is to find vectors h, r, t corresponding to h, r, t which minimize

I = Σ_{l=(h,r,t) ∈ ∆} Σ_{(h′,r,t′) ∈ ∆′_l} [γ + f(h, r, t) − f(h′, r, t′)]_+ ,   (1)

where

H_l = {(e, r, t) | e ∈ E, e ≠ h},  l = (h, r, t),   (2)
T_l = {(h, r, e) | e ∈ E, e ≠ t},  l = (h, r, t),   (3)
∆′_l = (H_l ∪ T_l) − ∆,   (4)

l = (h, r, t) ∈ ∆, and γ is a margin hyperparameter. The function [·]_+ is defined by [x]_+ = max(0, x) for x ∈ R.

Model | Scoring function | Constraint
TransE [5] | ‖h + r − t‖ | h, r, t ∈ R^n
TransH [7] | ‖h − w_r^T h w_r + d_r − (t − w_r^T t w_r)‖ | h, t, d_r, w_r ∈ R^n
RESCAL [14] | h^T M_r t | ‖h‖ ≤ 1, ‖t‖ ≤ 1, ‖M_r‖_F ≤ 1
TorusE [10] | ‖h + r − t‖ | h, r, t ∈ T^n
DistMult [8] | −h^T diag(r) t | h, r, t ∈ R^n
ComplEx [9] | −Re(h^T diag(r) t̄) | h, r, t ∈ C^n
MöbiusE (ours) | dist(h ⊕ r, t) | h, r, t ∈ M^{q/p}_n

Table 1: Scoring functions of typical KGE models.

Suppose l = (h, r, t) is a positive triplet; the trueness of l assessed by f is given by

rank_l = the rank of f(l) in the sequence {f(l)} ∪ f(∆′_l) sorted in ascending order.   (5)

In general, we use the value rank_l to evaluate the trueness of a candidate triplet.

In order to proceed with the following description, we introduce two critical functions m_k(·) and d_k(·). For any u ∈ R and k ∈ Z_+, m_k(u) is defined by

m_k(u) ∈ [0, k) and u ≡ m_k(u) mod k.   (6)

(m_1(u) is equivalent to [u] in [10].) d_k(u) is defined as

d_k(u) = min(m_k(u), k − m_k(u)).   (7)

Based on these definitions, for any u ∈ R it is not difficult to verify that m_k(u) = m_k(u ± k), m_k(−u) = k − m_k(u), d_k(−u) = d_k(u), and d_k(u) = d_k(u ± k). In particular, d_k(u) can be reformulated as d_k(u) = min_{i∈Z} |u + ik|.

2.1. Embedding on the Torus ring

Before introducing T^n, we first review the definition of the Torus ring T^1. As shown in Fig. 1, T^1 is a one-dimensional ring. In particular, any two points A and B on T^1 can be uniquely defined by their angles θ = 2πx and ω = 2πy with x, y ∈ [0, 1); these variables are depicted in Fig. 1. The addition operation ⊕ of two points A and B is defined by x ⊕ y = m_1(x + y). The distance between A and B can be viewed as the minimal value among the rotating angles from A to B and from B to A, i.e., arc AB and arc BA as shown in Fig. 1. Describing this idea in math, we have

dist(x, y) = min{m_1(x − y), m_1(y − x)} = d_1(x − y).

This distance function (and all the following distance functions) satisfies

dist(x, x) = 0,  dist(x, y) = dist(y, x) ≥ 0,  dist(x, z) ≤ dist(x, y) + dist(y, z),   (8)

where x, y, z ∈ [0, 1).
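The two functions m_k and d_k and the induced distance on T^1 are straightforward to implement; the following sketch (our own code, not from the paper) checks the identities listed above numerically for an arbitrary sample value:

```python
# A small sketch (ours) of the modulus functions m_k and d_k defined in
# (6)-(7) and of the induced distance on the Torus ring T^1, together with
# numeric checks of the identities stated in the text.
def m(u, k=1.0):
    """m_k(u): the representative of u modulo k lying in [0, k)."""
    return u % k

def d(u, k=1.0):
    """d_k(u) = min(m_k(u), k - m_k(u)): distance of u to the nearest multiple of k."""
    return min(m(u, k), k - m(u, k))

def dist_T1(x, y):
    """Distance on T^1 = [0, 1): the shorter of the two rotation arcs, d_1(x - y)."""
    return d(x - y, 1.0)

u, k = 2.37, 1.5  # an arbitrary sample value and modulus
assert abs(m(u, k) - m(u + k, k)) < 1e-12      # m_k(u) = m_k(u + k)
assert abs(m(-u, k) - (k - m(u, k))) < 1e-12   # m_k(-u) = k - m_k(u)
assert abs(d(-u, k) - d(u, k)) < 1e-12         # d_k is even
assert abs(d(u, k) - min(abs(u + i * k) for i in range(-5, 6))) < 1e-12  # d_k(u) = min_i |u + ik|
assert abs(dist_T1(0.9, 0.1) - 0.2) < 1e-12    # wrap-around: 0.9 -> 0.1 is 0.2, not 0.8
```

Note that Python's `%` operator already returns a representative in [0, k) for negative arguments, which is exactly the convention required by (6).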
Figure 1: Illustration of the Torus ring T^1.

The Torus ring T^2 can be viewed as the independent stacking of two Torus rings T^1. Given x = (x_1, x_2), y = (y_1, y_2) ∈ [0, 1) × [0, 1) as two points on the Torus ring T^2, the addition operation on T^2 is defined by x ⊕ y = (m_1(x_1 + y_1), m_1(x_2 + y_2)). The distance function on T^2 is given as

dist(x, y) = ‖z‖_τ,  z = (dist(x_1, y_1), dist(x_2, y_2)),

where ‖·‖_τ is some vector norm. The Torus ring can be extended to the n-dimensional case: let x = (x_i), y = (y_i) ∈ [0, 1)^n be two points on T^n; then

x ⊕ y = (m_1(x_1 + y_1), ..., m_1(x_n + y_n)),   (9)
dist(x, y) = ‖z‖_τ,   (10)

where z = (d_1(x_1 − y_1), d_1(x_2 − y_2), ..., d_1(x_n − y_n)). Summarizing the above discussion, the objective function (1) becomes TorusE when the scoring function f(·) is set as

f(h, r, t) = dist(h ⊕ r, t),  h, r, t ∈ T^n,

where dist(·) is given by (10) and ⊕ is given by (9).

2.2. Möbius ring M

As an intuitive illustration of the Möbius ring, we plot the surface of the Möbius ring M in Fig. 3. In this figure, let the radius from the center of the hole to the center of the tube be R and the radius of the tube be r; then the parametric equation of the Möbius ring is

x(θ, ω) = (R + r cos(θ/2 + ω)) cos θ,
y(θ, ω) = (R + r cos(θ/2 + ω)) sin θ,   (11)
z(θ, ω) = r sin(θ/2 + ω).

Letting θ = 2πx_1 and ω = 2πx_2, any point on M can be uniquely defined by (x_1, x_2). In particular, the period of x_1 is 2 and the period of x_2 is 1; hence we set (x_1, x_2) ∈ [0, 2) × [0, 1). Given x = (x_1, x_2), y = (y_1, y_2) ∈ [0, 2) × [0, 1) as two points on a Möbius ring, the addition operation ⊕ on M is defined by

x ⊕ y = (m_2(x_1 + y_1), m_1(x_2 + y_2)),   (12)

i.e., the addition on M is formulated by modulus addition in each dimension, with modulus 2 and 1, respectively. The distance function dist(·) between x and y is defined as

dist(x, y) = min(w_1, w_2),   (13)
w_1 = d_2(y_1 − x_1) + d_1(y_2 − x_2),   (14)
w_2 = d_2(y_1 − x_1 + 1) + d_1(y_2 − x_2 + 1/2).   (15)

One can further verify that the above definition satisfies the basic properties in (8). Moreover, the above distance satisfies:

Proposition 1.
For the distance defined in (13) on M and any two points x = (x_1, x_2), y = (y_1, y_2) in [0, 2) × [0, 1), it holds that 0 ≤ dist(x, y) ≤ 3/4, where the upper bound is attained when x_1 − y_1 = k ± 1/2 and x_2 − y_2 = k′ ± 1/4 for some integers k and k′.

We give the proof of
Proposition 1 as follows. Given x = (x_1, x_2), y = (y_1, y_2) ∈ M and the corresponding points A and B on M, transforming A to B (or B to A) requires one of the following two operations:

a). If we transform A to B, then we add y_1 − x_1 + 2k to the first component of x, and add y_2 − x_2 + k′ to the second component of x for some integers k and k′. If we transform B to A, then we add x_1 − y_1 + 2k to the first component of y, and add x_2 − y_2 + k′ to the second component of y for some integers k and k′. Hence, when we measure the distance in the first dimension using modulus 2 and considering that x_1, y_1 ∈ [0, 2), such a distance is d_2(y_1 − x_1). Similarly, if we measure the distance between x and y in the second dimension using modulus 1 and considering that x_2, y_2 ∈ [0, 1), the obtained distance is d_1(y_2 − x_2). By adding the distances in the two dimensions, we obtain the distance in this case as d_2(y_1 − x_1) + d_1(y_2 − x_2), which conforms with (14).

b). If we transform A to B, then we add y_1 − x_1 + 2k + 1 to the first component of x, and add y_2 − x_2 + k′ + 1/2 to the second component of x for some integers k and k′. If we transform B to A, then we add x_1 − y_1 + 2k + 1 to the first component of y, and add x_2 − y_2 + k′ + 1/2 to the second component of y for some integers k and k′. To make an intuitive calculation, we compute the values of θ/2 + ω and θ for A in (11) as

θ/2 + ω = πx_1 + 2πx_2,  θ = 2πx_1;

when we add y_1 − x_1 + 2k + 1 to x_1 and add y_2 − x_2 + k′ + 1/2 to x_2, they become

θ = 2πy_1 + (4k + 2)π,
θ/2 + ω = π(y_1 + 2k + 1) + 2π(y_2 + k′ + 1/2) = πy_1 + 2πy_2 + (2k + 2k′ + 2)π,

which coincides with the coordinates of B on M (ignoring differences by multiples of 2π). A similar calculation can be conducted for the case of transforming B to A, and hence we omit the details. To sum up, when we measure the distance in the first dimension using modulus 2 and considering that x_1, y_1 ∈ [0, 2), such a distance is d_2(y_1 − x_1 + 1); when we measure the distance between x and y in the second dimension using modulus 1 and considering that x_2, y_2 ∈ [0, 1), the obtained distance is d_1(y_2 − x_2 + 1/2). By adding the distances in the two dimensions, we obtain the distance in this case as d_2(y_1 − x_1 + 1) + d_1(y_2 − x_2 + 1/2), which conforms with (15).

Summarizing the above discussion, the distance between x, y ∈ M should be the minimum over the two cases, which is equivalent to (13).

In fact, the Möbius ring M degenerates to the Torus ring in a special case, as shown in the following proposition, whose proof is given in Appendix 6.3.

Proposition 2.
If the first dimension of M is set to zero, then M is equivalent to T^1.

For a comparison between the Torus ring and the Möbius ring, we list the parametric equation of the Torus ring T^1 as

x(θ, ω) = (R + r cos ω) cos θ,
y(θ, ω) = (R + r cos ω) sin θ,   (16)
z(θ, ω) = r sin ω.

As we can see in (16), the periods of θ and ω are both 2π; hence the following points on T^1 are equivalent:

(θ, ω) ⇔ (θ + 2kπ, ω + 2k′π),  k, k′ ∈ Z.

However, in the Möbius parametrization (11), the periods of θ and ω are 4π and 2π, respectively. In particular, the following points can be viewed as equivalent points on the Möbius ring:

(θ, ω) ⇔ (θ + 4kπ, ω + 2k′π) ⇔ (θ + (4k + 2)π, ω + (2k′ + 1)π),  k, k′ ∈ Z,

which can be verified in (11). As shown in Fig. 2 and Fig. 3, the parametric curves on the Torus ring and the Möbius ring are also quite different: fixing ω in (16) generates a cycle, but fixing ω in (11) generates a twisted cycle.

Figure 2: Parametric curve on the Torus ring: ω fixed in (16).
Figure 3: Parametric curve on the Möbius ring: ω fixed in (11).

2.3. The generalized Möbius ring M^{q/p}

The simple Möbius ring M can be extended to the more general case M^{q/p}. The Möbius ring M^{q/p} is a space constructed on [0, q) × [0, p), where p, q are co-prime positive integers; any x, y ∈ M^{q/p} satisfy

x ⊕ y = (m_q(x_1 + y_1), m_p(x_2 + y_2)),   (17)
dist(x, y) = min_{0 ≤ j < pq} { d_q(y_1 − x_1 + j/p) + d_p(y_2 − x_2 + j/q) }.   (18)

The reason why dist(x, y) is defined as (18) is similar to that of (13) and is hence omitted here. The distance dist(·, ·) satisfies the properties in (8) as well, and in particular there is:

Proposition 3.
The distance defined in (18) on M^{q/p} satisfies

dist(x, y) ∈ [0, 1/(2p) + 1/(2q)],

where the upper bound is attained, e.g., when x − y = (1/(2p), 1/(2q)).

The proof of the above proposition is similar to that of
Proposition 1 and is hence omitted here. The proof of the triangle inequality dist(x, z) ≤ dist(x, y) + dist(y, z) on M^{q/p} in (8) is given in Appendix 6.1.

In order to increase the embedding dimension for KGE, we define

M^{q/p}_n = M^{q/p} ⊎ M^{q/p} ⊎ ··· ⊎ M^{q/p}  (n copies)   (19)

as the direct sum of n Möbius rings M^{q/p}. For each u = (u_1, u_2, ..., u_n), v = (v_1, v_2, ..., v_n) ∈ M^{q/p}_n, there is u_i, v_i ∈ M^{q/p} for each 1 ≤ i ≤ n, and

u ⊕ v = (u_1 ⊕ v_1, ..., u_n ⊕ v_n),   (20)
dist(u, v) = ‖z‖_τ,   (21)

where z = (dist(u_1, v_1), ..., dist(u_n, v_n)) and ‖·‖_τ is some vector norm. For embedding on the Möbius ring, the scoring function f(h, r, t) in (1) is set as

f(h, r, t) = dist(h ⊕ r, t),  h, r, t ∈ M^{q/p}_n,

where the operation ⊕ is defined in (20) and the distance function is defined in (21), as listed in Tab. 1.

In what follows, we list a key proposition for the Möbius ring M^{q/p}, whose proof is intuitive and hence omitted here.

Proposition 4. The solutions of the equation

dist(x, 0) = 0   (22)

in the region [0, q) × [0, p) of M^{q/p} are x = (j/p, j/q) with 0 ≤ j < pq.

In what follows, we call the solutions of equation (22) the zero points of M^{q/p}. For any given KG described by a set of triplets ∆, one of the difficulties for KGE comes from the cycle structures in the given KG data. A cycle structure in a KG means the existence of entities h_i and relations r_i (0 ≤ i ≤ m) such that (h_i, r_i, h_{i+1}) ∈ ∆ or (h_{i+1}, r_i, h_i) ∈ ∆ for 0 ≤ i ≤ m − 1, and (h_m, r_m, h_0) ∈ ∆ or (h_0, r_m, h_m) ∈ ∆. In the case that the given set ∆ contains no cycle structure, the geometric structure of ∆ becomes a tree, and the embedding task for this KG, without considering the negative samples, can be implemented via a simple iteration strategy. However, realistic KG data contain a great number of logic cycles, and an appropriate KGE strategy should choose an algebraic structure which is capable of fitting the constraints generated by this tremendous number of cycles.

Considering the properties in (8), the constraint of such a cycle, with the entities eliminated, can be described by an equation of relations on M^{q/p}:

dist(γ_1 r̂_1 + γ_2 r̂_2 + ··· + γ_s r̂_s, 0) = 0,   (23)

where γ_i ∈ Z and r̂_i ∈ {r_0, r_1, ..., r_m}. In general, the number of cycles in a given KG dataset is much larger than the number of relations; hence the set of equations (23) rarely admits an exact solution in Euclidean space. Intuitively, since the number of zero points of M^{q/p} in the region [0, q) × [0, p) is pq, the set of equations (23) derived from all the cycles in the given KG data has a better chance of admitting an exact solution. In contrast, the number of zero points in T^2 = [0, 1) × [0, 1) is only 1 for TorusE, which is much smaller than that of M^{q/p}. In the above sense, MöbiusE enjoys more powerful expressiveness than TorusE.
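The distance construction above can be checked numerically. The sketch below is our own code (not the authors'): it implements (13)–(15) on M^{2/1}, together with our reading of the garbled generalized distance (18) using the branch shifts (j/p, j/q), and verifies the properties claimed in this section for M^{2/1}.

```python
import random

# A numeric sketch (ours) of the distance (13)-(15) on the Mobius ring M = M^{2/1}
# and of our reconstruction of the generalized distance (18) on M^{q/p},
# with spot checks of the properties claimed in Section 2 for M^{2/1}.

def d(u, k):
    """d_k(u) = min(m_k(u), k - m_k(u))."""
    m = u % k
    return min(m, k - m)

def dist_M(x, y):
    """Distance (13)-(15) on M^{2/1}: minimum over the two sheet identifications."""
    w1 = d(y[0] - x[0], 2) + d(y[1] - x[1], 1)            # (14)
    w2 = d(y[0] - x[0] + 1, 2) + d(y[1] - x[1] + 0.5, 1)  # (15)
    return min(w1, w2)

def dist_Mqp(x, y, q, p):
    """Our reading of (18) on M^{q/p}: minimum over the pq branch shifts (j/p, j/q)."""
    return min(d(y[0] - x[0] + j / p, q) + d(y[1] - x[1] + j / q, p)
               for j in range(p * q))

random.seed(0)
for _ in range(1000):
    x = (random.uniform(0, 2), random.uniform(0, 1))
    y = (random.uniform(0, 2), random.uniform(0, 1))
    z = (random.uniform(0, 2), random.uniform(0, 1))
    # (18) reduces to (13)-(15) when q = 2, p = 1
    assert abs(dist_M(x, y) - dist_Mqp(x, y, 2, 1)) < 1e-9
    # basic properties (8): identity, symmetry, triangle inequality
    assert dist_M(x, x) < 1e-9
    assert abs(dist_M(x, y) - dist_M(y, x)) < 1e-9
    assert dist_M(x, z) <= dist_M(x, y) + dist_M(y, z) + 1e-9
    # Proposition 1: the distance never exceeds 3/4
    assert dist_M(x, y) <= 0.75 + 1e-9

# The bound 3/4 is attained, e.g., at x - y = (1/2, 1/4)
assert abs(dist_M((0.5, 0.25), (0.0, 0.0)) - 0.75) < 1e-9
# (0, 0) and (1, 1/2) are identified on M (cf. the equivalences under (11))
assert dist_M((0.0, 0.0), (1.0, 0.5)) < 1e-9
```

The 3/4 bound of Proposition 1 can also be seen directly from the identities d_2(u) + d_2(u + 1) = 1 and d_1(v) + d_1(v + 1/2) = 1/2: they imply w_1 + w_2 = 3/2 on M^{2/1}, so min(w_1, w_2) can never exceed 3/4.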
3. Experiments
The performance of the proposed MöbiusE is tested on two datasets extracted from realistic knowledge graphs: FB15K and WN18 [5]. The basic properties of these two datasets are given in Tab. 2.

We conduct the link prediction task using the same method reported in [5]. For each test triplet (which is a positive sample), we replace its head (or tail) to generate corrupted triplets; the score of each corrupted triplet is then calculated by the scoring function f(·), and the ranking of the test triplet is obtained according to these scores, i.e., the definition of rank_l in (5). It should be noted from (4) that the set of generated corrupted triplets excludes the positive triplets ∆; we call such a set "filtered" [10], and the ranking values in our experiments are all obtained on the filtered set.

We choose the stochastic gradient descent algorithm to optimize the objective (1). To generate the negative samples in (1), we employ the "Bern" method [7] for negative sampling, because the datasets contain only positive triplets.

To evaluate our model, we use Mean Rank (MR), Mean Reciprocal Rank (MRR) and HIT@m as evaluation indicators [5]. MR is calculated by mean_{l∈∆}(rank_l), with ∆ the set of triplet data and rank_l defined in (5); MRR is calculated by mean_{l∈∆}(1/rank_l); HIT@m is defined by |{l ∈ ∆ : rank_l ≤ m}| / |∆|.

We conduct a grid search over the margin γ in (1) and the learning rate α of stochastic gradient descent to find a set of optimal hyperparameters for each dataset. Finally, the learning rate α is set to 0.0005 for both WN18 and FB15K, and the margin γ is set to 2,000 for WN18 and 500 for FB15K.

We conduct our experiments on the augmented Möbius ring M^{q/p}_n as defined in (19), and we choose three different types of Möbius rings for embedding, i.e., M^{2/1}_n, M^{3/1}_n, and M^{3/2}_n.
In choosing the distance function on M^{q/p}_n, we fix the vector norm ‖·‖_τ in dist(·) in (21). We call the corresponding MöbiusE models MöbiusE(2,1), MöbiusE(3,1) and MöbiusE(3,2) in Tab. 3 and Tab. 4. The value of n in M^{q/p}_n is set to 5,000, which means that each embedding vector contains 10,000 parameters to be trained. The dimensions of the other models in Tab. 3 and Tab. 4 are all set to 10,000.

As shown in Tab. 3, MöbiusE(3,1) outperforms the other models in 4 of the 5 critical indicators. In Tab. 4, MöbiusE(2,1) also outperforms the other models in 4 of the 5 critical indicators.
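The three evaluation indicators above depend only on the filtered ranks; the following sketch (ours, with hypothetical rank values) mirrors their definitions:

```python
# A small sketch (ours, not the authors' evaluation code) of the indicators
# defined in Section 3, computed from a list of filtered ranks rank_l.
def mean_rank(ranks):
    """MR: the mean of rank_l over the test set."""
    return sum(ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    """MRR: the mean of 1 / rank_l over the test set."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hit_at(ranks, m):
    """HIT@m: the fraction of test triplets ranked within the top m."""
    return sum(1 for r in ranks if r <= m) / len(ranks)

ranks = [1, 2, 1, 10, 4]           # hypothetical filtered ranks of five test triplets
print(mean_rank(ranks))            # 3.6
print(mean_reciprocal_rank(ranks)) # (1 + 1/2 + 1 + 1/10 + 1/4) / 5, close to 0.57
print(hit_at(ranks, 3))            # 3 of the 5 ranks are <= 3 -> 0.6
```

Note that a lower MR is better, while higher MRR and HIT@m are better, which is why the tables report all three.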
4. Related Works
The key to different types of KGE strategies is the choice of embedding spaces, and in turn different addition operations and different scoring functions, i.e., different f(·) in (1).

Model | MRR | MR | HIT@10 | HIT@3 | HIT@1
TransE | 0.414 | - | 0.688 | 0.534 | 0.247
TransR | 0.218 | - | 0.582 | 0.404 | 0.218
RESCAL | 0.890 | - | 0.928 | 0.904 | 0.842
DistMult | 0.797 | 655 | 0.946 | - | -
ComplEx | 0.941 | - | 0.947 | 0.945 | 0.936
TorusE | | | | |
MöbiusE(3,1) | | | | |

Table 3: Link prediction results on WN18.
As the first work in KGE, TransE [5] uses basic algebraic addition to generate the scoring function, and regularizes the obtained vectors via a normalization condition. In TransE, the left (right) entity and the relation uniquely define the right (left) entity. However, a realistic KG may have the following properties: a) ONE-TO-MANY mapping: given any fixed relation r, the left (right) entity corresponding to a right (left) entity is not unique; b) MANY-TO-ONE mapping: given any fixed entities h and t, the relation between h and t may not be unique. The original TransE could not resolve these two drawbacks.

As an improvement of TransE to overcome drawback a), TransR [6] maps each relation r to a vector and a matrix, and the corresponding scoring function is derived from the image of such a relation matrix. In this sense, different entities may be mapped to the same vector in the image space, and drawback a) of TransE is resolved. Similar to TransR, TransH [7] uses a single vector to obtain the image of each entity. As a further generalization of TransR and TransH, TransD [15] uses a dynamic matrix, determined both by the relation and the entities, to generate the above mapping matrix. As another solution to the drawbacks of TransE, TransM [16] introduces a relation-based adjustment factor into the scoring function of TransE; such a factor is much smaller in the MANY-TO-MANY case than in the ONE-TO-ONE case, and hence the penalizing effect works.

Introducing flexibility into the scoring function to some extent enhances the generalization capability of KGE. As an implementation of such an idea, TransF [17] uses a flexible scoring function in KGE, in which the matching degree is described by the inner products of the desired entities and the given relation. Increasing the complexity of the scoring function enhances the nonlinear representative capability of KGE: RESCAL [14] and DistMult [8] use a matrix-induced inner product to represent the scoring function in KGE, and ComplEx [9] applies a similar construction but embeds both relations and entities into a complex-valued space. TransA [11] combines the ideas of TransE and RESCAL, and incorporates the residue vector of TransE into the vector-based norm of RESCAL; such a strategy can adaptively find the loss function according to the structure of the knowledge graph.

MöbiusE is inspired by TorusE [10]; however, MöbiusE differs from TorusE in the following aspects. The minimal dimension of a Torus ring is 1, but the minimal dimension of a Möbius ring is 2. The addition on a Torus ring chooses a unique value for the modulus operation (generally 1), but the addition on a Möbius ring chooses different values (generally p and q in M^{q/p}). The distance function on a Möbius ring is strongly nonlinear (see (18) and (21)), which is much more complicated than that of a Torus ring. These major differences guarantee the strong expressiveness of MöbiusE.
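To make the contrast between these scoring functions concrete, the following sketch (ours, with made-up low-dimensional vectors) evaluates the TransE, DistMult, and ComplEx scores from Table 1 on toy embeddings; the ℓ1 norm for TransE is chosen only for illustration:

```python
# Toy vectors (ours) illustrating the scoring functions of Table 1 for the
# models discussed above; all numbers are made up for the example.
h = [0.3, -0.2]
r = [0.5, 0.1]
t = [0.8, -0.1]

# TransE: ||h + r - t||_1  (translation residual; smaller = more plausible)
transe = sum(abs(h[i] + r[i] - t[i]) for i in range(2))

# DistMult: -h^T diag(r) t  (bilinear product, negated as in Table 1 so that
# the scoring function is minimized for plausible triplets)
distmult = -sum(h[i] * r[i] * t[i] for i in range(2))

# ComplEx: -Re(h^T diag(r) conj(t)) with complex-valued embeddings
hc = [0.3 + 0.1j, -0.2 + 0.4j]
rc = [0.5 - 0.2j, 0.1 + 0.3j]
tc = [0.8 + 0.0j, -0.1 - 0.5j]
complx = -sum(hc[i] * rc[i] * tc[i].conjugate() for i in range(2)).real

print(transe, distmult, complx)
```

The conjugation in the ComplEx score is what breaks the symmetry between head and tail, allowing antisymmetric relations to be modeled, which the real-valued DistMult product cannot do.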
5. Conclusions
A novel KGE strategy has been proposed by taking advantage of the intertwined rotating property of the Möbius ring. As a first step, we defined the basic addition operation on the Möbius ring and discussed its related properties. Next, in order to obtain an appropriate scoring function based on the Möbius ring, we built a distance function on the Möbius ring and constructed the corresponding distance-induced scoring function. Finally, a complete KGE strategy was obtained via the above constructions, which outperforms several key KGE strategies in our experiments.
6. Appendix

6.1. Proof of dist(x, z) ≤ dist(x, y) + dist(y, z) on M^{q/p}

According to the definition of dist(·, ·) on M^{q/p}, we only need to prove dist(x + y, 0) ≤ dist(x, 0) + dist(y, 0). Note that for any i_1, i_2, i′_1, i′_2, j, j′ ∈ Z, there is

d_q(y_1 + x_1 + j/p) + d_p(y_2 + x_2 + j/q)
≤ |y_1 + x_1 + j/p + i_1 q| + |y_2 + x_2 + j/q + i_2 p|
≤ |x_1 + (j + j′)/p + (i_1 + i′_1)q| + |x_2 + (j + j′)/q + (i_2 + i′_2)p| + |y_1 − j′/p − i′_1 q| + |y_2 − j′/q − i′_2 p|,

based on which, and on the arbitrariness of i_1 and i_2, there is

dist(x + y, 0) = min_{0 ≤ j < pq} { d_q(y_1 + x_1 + j/p) + d_p(y_2 + x_2 + j/q) }
≤ min_{0 ≤ j < pq} { d_q(x_1 + (j + j′)/p) + d_p(x_2 + (j + j′)/q) } + |y_1 − j′/p − i′_1 q| + |y_2 − j′/q − i′_2 p|
= dist(x, 0) + |y_1 − j′/p − i′_1 q| + |y_2 − j′/q − i′_2 p|.

Due to the arbitrariness of j′, i′_1 and i′_2, there is dist(x + y, 0) ≤ dist(x, 0) + dist(y, 0).

6.2. Proof of Proposition 1

We define a function g(α, β) = min{g_1, g_2} with g_1 = d_2(α) + d_1(β) and g_2 = d_2(α + 1) + d_1(β + 1/2), from which we know that α has a period of 2 and β has a period of 1. Next, we divide the interval [−1, 1) into 4 subintervals I_{α,i} = [−1 + i/2, −1/2 + i/2), and divide [−1/2, 1/2) into 4 subintervals I_{β,j} = [−1/2 + j/4, −1/4 + j/4), with i, j = 0, 1, 2, 3. Then the value of sup_{(α,β) ∈ I_{α,i} × I_{β,j}} g can be calculated, and all these 16 values are 3/4, which proves Proposition 1.

6.3. Proof of Proposition 2

Let x = (0, x_2) and y = (0, y_2). Then dist(x, y) = min(d_1(y_2 − x_2), 1 + d_1(y_2 − x_2 + 1/2)) = d_1(y_2 − x_2); hence M can be viewed as T^1 by constraining the first dimension to zero.

References

[1] J. Lu, J. Xuan, G. Zhang, X. Luo, Structural property-aware multilayer network embedding for latent factor analysis, Pattern Recognition 76 (2018) 228–241.
[2] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. Ives, DBpedia: a nucleus for a web of open data, in: Proceedings of the 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference, 2007, pp. 722–735.
[3] F. M. Suchanek, G. Kasneci, G. Weikum, Yago: a core of semantic knowledge, in: Proceedings of the 16th International Conference on World Wide Web, 2007, pp. 697–706.
[4] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, J. Taylor, Freebase: a collaboratively created graph database for structuring human knowledge, in: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 2008, pp. 1247–1250.
[5] A. Bordes, N. Usunier, A. García-Durán, J. Weston, O. Yakhnenko, Translating embeddings for modeling multi-relational data, in: Advances in Neural Information Processing Systems, 2013, pp. 2787–2795.
[6] Y. Lin, Z. Liu, M. Sun, Y. Liu, X. Zhu, Learning entity and relation embeddings for knowledge graph completion, in: Proceedings of the 29th AAAI Conference on Artificial Intelligence, 2015, pp. 2181–2187.
[7] Z. Wang, J. Zhang, J. Feng, Z. Chen, Knowledge graph embedding by translating on hyperplanes, in: Proceedings of the 28th AAAI Conference on Artificial Intelligence, 2014, pp. 1112–1119.
[8] B. Yang, W. T. Yih, X. He, J. Gao, L. Deng, Embedding entities and relations for learning and inference in knowledge bases, in: Proceedings of the 3rd International Conference on Learning Representations, 2015.
[9] T. Trouillon, J. Welbl, S. Riedel, E. Gaussier, G. Bouchard, Complex embeddings for simple link prediction, in: Proceedings of the 33rd International Conference on Machine Learning, 2016, pp. 2071–2080.
[10] T. Ebisu, R. Ichise, TorusE: Knowledge graph embedding on a Lie group, in: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018, pp. 1819–1826.
[11] Y. Jia, Y. Wang, X. Jin, H. Lin, X. Cheng, Knowledge graph embedding: A locally and temporally adaptive translation-based approach, ACM Transactions on the Web 12 (2018).
[12] T. Dettmers, P. Minervini, P. Stenetorp, S. Riedel, Convolutional 2D knowledge graph embeddings, in: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018, pp. 1811–1818.
[13] S. Zhang, Y. Tay, L. Yao, Q. Liu, Quaternion knowledge graph embeddings, in: Advances in Neural Information Processing Systems, 2019, pp. 2731–2741.
[14] M. Nickel, V. Tresp, H. P. Kriegel, A three-way model for collective learning on multi-relational data, in: Proceedings of the 28th International Conference on Machine Learning, 2011, pp. 809–816.
[15] G. Ji, S. He, L. Xu, K. Liu, J. Zhao, Knowledge graph embedding via dynamic mapping matrix, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 2015, pp. 687–696.
[16] M. Fan, Q. Zhou, E. Chang, T. F. Zheng, Transition-based knowledge graph embedding with relational mapping properties, in: Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, 2014, pp. 328–337.
[17] J. Feng, M. Huang, M. Wang, M. Zhou, Y. Hao, X. Zhu, Knowledge graph embedding by flexible translation, in: Proceedings of the 15th International Conference on Principles of Knowledge Representation and Reasoning, 2016, pp. 557–560.