MöbiusE: Knowledge Graph Embedding on Möbius Ring
Yao Chen, Jiangang Liu, Zhe Zhang, Shiping Wen, Wenjun Xiong
Yao Chen^a,∗, Jiangang Liu^a, Zhe Zhang^a, Shiping Wen^b, Wenjun Xiong^a

^a Department of Computer Science, Southwestern University of Finance and Economics, China
^b Centre for Artificial Intelligence, University of Technology Sydney, Sydney, Australia
Abstract
In this work, we propose a novel Knowledge Graph Embedding (KGE) strategy, called MöbiusE, in which entities and relations are embedded on the surface of a Möbius ring. The proposal of such a strategy is inspired by the classic TorusE, in which the addition of two arbitrary elements is subject to a modulus operation. In this sense, TorusE naturally guarantees the critical boundedness of embedding vectors in KGE. However, the nonlinear property of the addition operation on a Torus ring is derived solely from the modulus operation, which to some extent restricts the expressiveness of TorusE. As a further generalization of TorusE, MöbiusE also uses a modulus operation to preserve the closedness of its addition operation, but the coordinates on a Möbius ring interact with each other in the following way: any vector moving along a parametric trace on the surface of a Möbius ring ends up on the opposite side after one full cycle. Hence, MöbiusE exhibits far more nonlinear representativeness than TorusE, and in turn generates more precise embedding results. In our experiments, MöbiusE outperforms TorusE and other classic embedding strategies in several key indicators.
Keywords:
Möbius ring, Torus ring, Knowledge graph, Embedding
1. Introduction
Graph or network structures can be used to model the intrinsic information behind data [1]. As a typical example, a knowledge graph (KG) is a collection of facts about the real world; related examples include DBpedia [2], YAGO [3] and Freebase [4]. These KG databases can be applied in many realistic engineering tasks, such as knowledge inference, question answering, and sample labeling. Driven by realistic applications, an intriguing and challenging problem for KG is: how to infer unknown knowledge (or missing data) from the existing KG database.

∗ Corresponding author. Email address: [email protected] (Yao Chen)

Preprint submitted to Journal of LaTeX Templates, January 8, 2021

For any KG, the basic element of knowledge is represented by a triplet (h, r, t), which is composed of two entities h, t and one relation r. In detail, h is called the head, t is called the tail, and (h, r, t) means that head h maps to tail t via the operation of relation r. For example, the entity h = U.S.A projects to t = DonaldTrump under the relation r = ThePresidentOf, and the entity h = U.S.A projects to t = MikePence under the relation r = TheVicePresidentOf. Based on these two triplets, and considering the fact that "the president and vice president of a country at the same time are colleagues", one predicts the existence of the triplet (DonaldTrump, IsColleagueOf, MikePence) if such a triplet is not contained in the given database. For realistic applications such as question answering, the core task is to predict those triplets which are not listed in the given database; such a task is called link prediction in KG.

There exist many types of methods for link prediction in KG, among which translation-based methods treat each entity and relation as an embedded vector in some constructed algebraic space, so that each true triplet can be translated into a simple constraint described by an algebraic expression f(h, r, t). Using such a method, if some (h, r, t) is not present in the KG but the corresponding algebraic constraint index f(h, r, t) is very small, then it is reasonable to believe that (h, r, t) is a missing fact. Due to the choice of different algebraic spaces, many different translation-based models have been obtained. TransE [5] is the first translation-based model for link prediction tasks; it uses the Euclidean space R^n as the embedding space, and the constraint quantity for each triplet is set as simply as ‖h + r − t‖. TransR [6] builds the constraint equation by mapping the entities to a subspace derived from the relation vector. TransH [7] improves TransE in dealing with reflexive/ONE-TO-MANY/MANY-TO-ONE/MANY-TO-MANY cases in KG; such an improvement is achieved by treating each relation as a translating operation on a hyperplane. DistMult [8] utilizes a simple bilinear formulation to model the constraint generated by triplets; in this way the composition of relations is characterized by matrix multiplication, which makes such a model good at capturing relational semantics. ComplEx [9] uses the standard dot product between embeddings and demonstrates the capability of complex-valued embeddings instead of real-valued numbers. TorusE [10] chooses a compact Lie group, a torus, as the embedding space, and is more scalable to large-size knowledge graphs because of its lower computational complexity.
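As a toy illustration of the translation principle behind these models (our own example with made-up two-dimensional vectors, not code from any of the cited papers), a TransE-style score simply measures how far h + r lands from t:

```python
import math

# A toy illustration (ours) of translation-based scoring: a triplet (h, r, t)
# is plausible when the head embedding translated by the relation embedding
# lands near the tail embedding.  The names and 2-D vectors are made up.
emb = {
    "USA":         (0.25, 0.50),
    "Trump":       (0.75, 0.90),
    "PresidentOf": (0.50, 0.40),
}

def transe_score(h, r, t):
    """TransE-style constraint index ||h + r - t|| (Euclidean norm); smaller is better."""
    return math.sqrt(sum((emb[h][i] + emb[r][i] - emb[t][i]) ** 2 for i in range(2)))

# (USA, PresidentOf, Trump): 0.25 + 0.50 = 0.75 and 0.50 + 0.40 = 0.90,
# so the score is (almost) zero and the triplet is judged plausible.
print(transe_score("USA", "PresidentOf", "Trump"))
```

A triplet whose translated head lands far from its tail (e.g. the reversed triplet) receives a large score and is judged implausible.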
Other types of embedding methods include TransA [11], a locally and temporally adaptive translation-based approach; ConvE [12], a multi-layer convolutional network embedding model; TransF, an embedding strategy with flexibility in each triplet constraint; and QuaternionE [13], an embedding method that treats each relation as a quaternion in hypercomplex space.

Among the above embedding methods, the basic operation behind TorusE is a simple modulus operation, and this operation automatically regularizes the calculated results to a given bounded area. In this way, there is no need for an extra regularization operation in TorusE. Inspired by this property of TorusE, we move beyond the one-dimensional modulus operation on a Lie group and propose MöbiusE, which takes a Möbius ring (whose basic dimension is two) as the embedding space. MöbiusE has the following major benefits for embedding tasks. First, MöbiusE provides much more flexibility, since each point on the surface of a Möbius ring has two different expressions. Second, the expressiveness of MöbiusE is much greater than that of TorusE, due to the more complex distance function built on the Möbius ring. Third, MöbiusE automatically truncates the training vectors to the constraint space, as TorusE does. Finally, MöbiusE subsumes TorusE, and inherits all the attractive properties of TorusE, such as its ability to model symmetry/antisymmetry, inversion, and composition.

The rest of the paper is organized as follows: Section 2 revisits the idea of TorusE and proposes the basic idea of MöbiusE; in particular, we discuss how to define the distance function on a Möbius ring. Section 3 provides the experimental results of MöbiusE on the datasets FB15K and WN18. Section 4 discusses related works and compares MöbiusE with other embedding methods. Section 5 concludes this paper. Finally, Section 6 lists the proofs of several key propositions on the Möbius ring.
2. Embedding on Möbius Ring
A KG is described by a set of triplets ∆, where each l = (h, r, t) ∈ ∆ contains h, r, and t as head entity, relation, and tail entity, respectively. Denote E = {h, t | (h, r, t) ∈ ∆} and R = {r | (h, r, t) ∈ ∆} as the sets of entities and relations of ∆. Let f be the scoring function; the task of KGE is to find vectors h, r, t corresponding to h, r, t which minimize

I = Σ_{l=(h,r,t) ∈ ∆} Σ_{(h′,r,t′) ∈ ∆′_l} [γ + f(h, r, t) − f(h′, r, t′)]_+ ,   (1)

where

H_l = {(e, r, t) | e ∈ E, e ≠ h},  l = (h, r, t),   (2)
T_l = {(h, r, e) | e ∈ E, e ≠ t},  l = (h, r, t),   (3)
∆′_l = (H_l ∪ T_l) − ∆,   (4)

l = (h, r, t) ∈ ∆, and γ is a margin hyperparameter. The function [·]_+ is defined by [x]_+ = max(0, x) for x ∈ R.

Model | Scoring function | Constraint
TransE [5] | ‖h + r − t‖ | h, r, t ∈ R^n
TransH [7] | ‖h − w_r^T h w_r + d_r − (t − w_r^T t w_r)‖ | h, t, d_r, w_r ∈ R^n
RESCAL [14] | h^T M_r t | ‖h‖ ≤ 1, ‖t‖ ≤ 1, ‖M_r‖_F ≤ 1
TorusE [10] | ‖h + r − t‖ | h, r, t ∈ T^n
DistMult [8] | −h^T diag(r) t | h, r, t ∈ R^n
ComplEx [9] | −Re(h^T diag(r) t̄) | h, r, t ∈ C^n
MöbiusE (ours) | dist(h ⊕ r, t) | h, r, t ∈ M^{q/p}_n

Table 1: Scoring functions of typical KGE models.

Suppose l = (h, r, t) is a positive triplet; the trueness of l assessed by f is given by

rank_l = the rank of f(l) in the sequence {f(l)} ∪ f(∆′_l) sorted in ascending order.   (5)

In general, we use the value rank_l to evaluate the trueness of a candidate triplet.

In order to proceed with the following description, we introduce two critical functions m_k(·) and d_k(·). For any u ∈ R and k ∈ Z_+, m_k(u) is defined by

m_k(u) ∈ [0, k) and u ≡ m_k(u) mod k.   (6)

(m_1(u) is equivalent to [u] in [10].) d_k(u) is defined as

d_k(u) = min(m_k(u), k − m_k(u)).   (7)

Based on these definitions, for any u ∈ R it is not difficult to verify that m_k(u) = m_k(u ± k), m_k(−u) = k − m_k(u), d_k(−u) = d_k(u), and d_k(u) = d_k(u ± k). In particular, d_k(u) can be reformulated as d_k(u) = min_{i∈Z} |u + ik|.

2.1. Embedding on the Torus ring

Before introducing T^n, we first review the definition of the Torus ring T^1. As shown in Fig. 1, T^1 is a one-dimensional ring. In particular, any two points A and B on T^1 can be uniquely defined by their angles θ = 2πx and ω = 2πy with x, y ∈ [0, 1); these variables are depicted in Fig. 1. The addition operation ⊕ of two points A and B is defined by x ⊕ y = m_1(x + y). The distance between A and B can be viewed as the minimal value among the rotating angles from A to B and from B to A, i.e., arc AB and arc BA as shown in Fig. 1. Describing this idea in math, we have

dist(x, y) = min{m_1(x − y), m_1(y − x)} = d_1(x − y).

This distance function (and all the following distance functions) satisfies

dist(x, x) = 0,  dist(x, y) = dist(y, x) ≥ 0,  dist(x, z) ≤ dist(x, y) + dist(y, z),   (8)

where x, y, z ∈ [0, 1).
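The two functions m_k and d_k and the induced distance on T^1 are straightforward to implement; the following sketch (our own code, not from the paper) checks the identities listed above numerically for an arbitrary sample value:

```python
# A small sketch (ours) of the modulus functions m_k and d_k defined in
# (6)-(7) and of the induced distance on the Torus ring T^1, together with
# numeric checks of the identities stated in the text.
def m(u, k=1.0):
    """m_k(u): the representative of u modulo k lying in [0, k)."""
    return u % k

def d(u, k=1.0):
    """d_k(u) = min(m_k(u), k - m_k(u)): distance of u to the nearest multiple of k."""
    return min(m(u, k), k - m(u, k))

def dist_T1(x, y):
    """Distance on T^1 = [0, 1): the shorter of the two rotation arcs, d_1(x - y)."""
    return d(x - y, 1.0)

u, k = 2.37, 1.5  # an arbitrary sample value and modulus
assert abs(m(u, k) - m(u + k, k)) < 1e-12      # m_k(u) = m_k(u + k)
assert abs(m(-u, k) - (k - m(u, k))) < 1e-12   # m_k(-u) = k - m_k(u)
assert abs(d(-u, k) - d(u, k)) < 1e-12         # d_k is even
assert abs(d(u, k) - min(abs(u + i * k) for i in range(-5, 6))) < 1e-12  # d_k(u) = min_i |u + ik|
assert abs(dist_T1(0.9, 0.1) - 0.2) < 1e-12    # wrap-around: 0.9 -> 0.1 is 0.2, not 0.8
```

Note that Python's `%` operator already returns a representative in [0, k) for negative arguments, which is exactly the convention required by (6).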
Figure 1: Illustration of the Torus ring T^1.

The Torus ring T^2 can be viewed as the independent stacking of two Torus rings T^1. Given x = (x_1, x_2), y = (y_1, y_2) ∈ [0, 1) × [0, 1) as two points on the Torus ring T^2, the addition operation on T^2 is defined by x ⊕ y = (m_1(x_1 + y_1), m_1(x_2 + y_2)). The distance function on T^2 is given as

dist(x, y) = ‖z‖_τ,  z = (dist(x_1, y_1), dist(x_2, y_2)),

where ‖·‖_τ is some vector norm. The Torus ring can be extended to the n-dimensional case: let x = (x_i), y = (y_i) ∈ [0, 1)^n be two points on T^n; then

x ⊕ y = (m_1(x_1 + y_1), ..., m_1(x_n + y_n)),   (9)
dist(x, y) = ‖z‖_τ,   (10)

where z = (d_1(x_1 − y_1), d_1(x_2 − y_2), ..., d_1(x_n − y_n)). Summarizing the above discussion, the objective function (1) becomes TorusE when the scoring function f(·) is set as

f(h, r, t) = dist(h ⊕ r, t),  h, r, t ∈ T^n,

where dist(·) is given by (10) and ⊕ is given by (9).

2.2. Möbius ring M

As an intuitive illustration of the Möbius ring, we plot the surface of the Möbius ring M in Fig. 3. In this figure, let the radius from the center of the hole to the center of the tube be R and the radius of the tube be r; then the parametric equation of the Möbius ring is

x(θ, ω) = (R + r cos(θ/2 + ω)) cos θ,
y(θ, ω) = (R + r cos(θ/2 + ω)) sin θ,   (11)
z(θ, ω) = r sin(θ/2 + ω).

Letting θ = 2πx_1 and ω = 2πx_2, any point on M can be uniquely defined by (x_1, x_2). In particular, the period of x_1 is 2 and the period of x_2 is 1; hence we set (x_1, x_2) ∈ [0, 2) × [0, 1). Given x = (x_1, x_2), y = (y_1, y_2) ∈ [0, 2) × [0, 1) as two points on a Möbius ring, the addition operation ⊕ on M is defined by

x ⊕ y = (m_2(x_1 + y_1), m_1(x_2 + y_2)),   (12)

i.e., the addition on M is formulated by modulus addition in each dimension, with modulus 2 and 1, respectively. The distance function dist(·) between x and y is defined as

dist(x, y) = min(w_1, w_2),   (13)
w_1 = d_2(y_1 − x_1) + d_1(y_2 − x_2),   (14)
w_2 = d_2(y_1 − x_1 + 1) + d_1(y_2 − x_2 + 1/2).   (15)

One can further verify that the above definition satisfies the basic properties in (8). Moreover, the above distance satisfies:

Proposition 1.
For the distance defined in (13) on M and any two points x = (x_1, x_2), y = (y_1, y_2) in [0, 2) × [0, 1), it holds that 0 ≤ dist(x, y) ≤ 3/4, where the upper bound is attained when x_1 − y_1 = k ± 1/2 and x_2 − y_2 = k′ ± 1/4 for some integers k and k′.

We give the proof of
Proposition 1 as follows. Given x = (x_1, x_2), y = (y_1, y_2) ∈ M and the corresponding points A and B on M, transforming A to B (or B to A) requires one of the following two operations:

a). If we transform A to B, then we add y_1 − x_1 + 2k to the first component of x, and add y_2 − x_2 + k′ to the second component of x for some integers k and k′. If we transform B to A, then we add x_1 − y_1 + 2k to the first component of y, and add x_2 − y_2 + k′ to the second component of y for some integers k and k′. Hence, when we measure the distance in the first dimension using modulus 2 and considering that x_1, y_1 ∈ [0, 2), such a distance is d_2(y_1 − x_1). Similarly, if we measure the distance between x and y in the second dimension using modulus 1 and considering that x_2, y_2 ∈ [0, 1), the obtained distance is d_1(y_2 − x_2). By adding the distances in the two dimensions, we obtain the distance in this case as d_2(y_1 − x_1) + d_1(y_2 − x_2), which conforms with (14).

b). If we transform A to B, then we add y_1 − x_1 + 2k + 1 to the first component of x, and add y_2 − x_2 + k′ + 1/2 to the second component of x for some integers k and k′. If we transform B to A, then we add x_1 − y_1 + 2k + 1 to the first component of y, and add x_2 − y_2 + k′ + 1/2 to the second component of y for some integers k and k′. To make an intuitive calculation, we compute the values of θ/2 + ω and θ for A in (11) as

θ/2 + ω = πx_1 + 2πx_2,  θ = 2πx_1;

when we add y_1 − x_1 + 2k + 1 to x_1 and add y_2 − x_2 + k′ + 1/2 to x_2, they become

θ = 2πy_1 + (4k + 2)π,
θ/2 + ω = π(y_1 + 2k + 1) + 2π(y_2 + k′ + 1/2) = πy_1 + 2πy_2 + (2k + 2k′ + 2)π,

which coincides with the coordinates of B on M (ignoring differences by multiples of 2π). A similar calculation can be conducted for the case of transforming B to A, and hence we omit the details. To sum up, when we measure the distance in the first dimension using modulus 2 and considering that x_1, y_1 ∈ [0, 2), such a distance is d_2(y_1 − x_1 + 1); when we measure the distance between x and y in the second dimension using modulus 1 and considering that x_2, y_2 ∈ [0, 1), the obtained distance is d_1(y_2 − x_2 + 1/2). By adding the distances in the two dimensions, we obtain the distance in this case as d_2(y_1 − x_1 + 1) + d_1(y_2 − x_2 + 1/2), which conforms with (15).

Summarizing the above discussion, the distance between x, y ∈ M should be the minimum over the two cases, which is equivalent to (13).

In fact, the Möbius ring M degenerates to the Torus ring in a special case, as shown in the following proposition, whose proof is given in Appendix 6.3.

Proposition 2.
If the first dimension of M is set to zero, then M is equivalent to T^1.

For a comparison between the Torus ring and the Möbius ring, we list the parametric equation of the Torus ring T^1 as

x(θ, ω) = (R + r cos ω) cos θ,
y(θ, ω) = (R + r cos ω) sin θ,   (16)
z(θ, ω) = r sin ω.

As we can see in (16), the periods of θ and ω are both 2π; hence the following points on T^1 are equivalent:

(θ, ω) ⇔ (θ + 2kπ, ω + 2k′π),  k, k′ ∈ Z.

However, in the Möbius parametrization (11), the periods of θ and ω are 4π and 2π, respectively. In particular, the following points can be viewed as equivalent points on the Möbius ring:

(θ, ω) ⇔ (θ + 4kπ, ω + 2k′π) ⇔ (θ + (4k + 2)π, ω + (2k′ + 1)π),  k, k′ ∈ Z,

which can be verified in (11). As shown in Fig. 2 and Fig. 3, the parametric curves on the Torus ring and the Möbius ring are also quite different: fixing ω in (16) generates a cycle, but fixing ω in (11) generates a twisted cycle.

Figure 2: Parametric curve on the Torus ring: ω fixed in (16).
Figure 3: Parametric curve on the Möbius ring: ω fixed in (11).

2.3. The generalized Möbius ring M^{q/p}

The simple Möbius ring M can be extended to the more general case M^{q/p}. The Möbius ring M^{q/p} is a space constructed on [0, q) × [0, p), where p, q are co-prime positive integers; any x, y ∈ M^{q/p} satisfy

x ⊕ y = (m_q(x_1 + y_1), m_p(x_2 + y_2)),   (17)
dist(x, y) = min_{0 ≤ j < pq} { d_q(y_1 − x_1 + j/p) + d_p(y_2 − x_2 + j/q) }.   (18)

The reason why dist(x, y) is defined as (18) is similar to that of (13) and is hence omitted here. The distance dist(·, ·) satisfies the properties in (8) as well, and in particular there is:

Proposition 3.
The distance defined in (18) on M^{q/p} satisfies

dist(x, y) ∈ [0, 1/(2p) + 1/(2q)],

where the upper bound is attained, e.g., when x − y = (1/(2p), 1/(2q)).

The proof of the above proposition is similar to that of
Proposition 1 and is hence omitted here. The proof of the triangle inequality dist(x, z) ≤ dist(x, y) + dist(y, z) on M^{q/p} in (8) is given in Appendix 6.1.

In order to increase the embedding dimension for KGE, we define

M^{q/p}_n = M^{q/p} ⊎ M^{q/p} ⊎ ··· ⊎ M^{q/p}  (n copies)   (19)

as the direct sum of n Möbius rings M^{q/p}. For each u = (u_1, u_2, ..., u_n), v = (v_1, v_2, ..., v_n) ∈ M^{q/p}_n, there is u_i, v_i ∈ M^{q/p} for each 1 ≤ i ≤ n, and

u ⊕ v = (u_1 ⊕ v_1, ..., u_n ⊕ v_n),   (20)
dist(u, v) = ‖z‖_τ,   (21)

where z = (dist(u_1, v_1), ..., dist(u_n, v_n)) and ‖·‖_τ is some vector norm. For embedding on the Möbius ring, the scoring function f(h, r, t) in (1) is set as

f(h, r, t) = dist(h ⊕ r, t),  h, r, t ∈ M^{q/p}_n,

where the operation ⊕ is defined in (20) and the distance function is defined in (21), as listed in Tab. 1.

In what follows, we list a key proposition for the Möbius ring M^{q/p}, whose proof is intuitive and hence omitted here.

Proposition 4. The solutions of the equation

dist(x, 0) = 0   (22)

in the region [0, q) × [0, p) of M^{q/p} are x = (j/p, j/q) with 0 ≤ j < pq.

In what follows, we call the solutions of equation (22) the zero points of M^{q/p}. For any given KG described by a set of triplets ∆, one of the difficulties for KGE comes from the cycle structures in the given KG data. A cycle structure in a KG means the existence of entities h_i and relations r_i (0 ≤ i ≤ m) such that (h_i, r_i, h_{i+1}) ∈ ∆ or (h_{i+1}, r_i, h_i) ∈ ∆ for 0 ≤ i ≤ m − 1, and (h_m, r_m, h_0) ∈ ∆ or (h_0, r_m, h_m) ∈ ∆. In the case that the given set ∆ contains no cycle structure, the geometric structure of ∆ becomes a tree, and the embedding task for this KG, without considering the negative samples, can be implemented via a simple iteration strategy. However, realistic KG data contain a great number of logic cycles, and an appropriate KGE strategy should choose an algebraic structure which is capable of fitting the constraints generated by this tremendous number of cycles.

Considering the properties in (8), the constraint of such a cycle, with the entities eliminated, can be described by an equation of relations on M^{q/p}:

dist(γ_1 r̂_1 + γ_2 r̂_2 + ··· + γ_s r̂_s, 0) = 0,   (23)

where γ_i ∈ Z and r̂_i ∈ {r_0, r_1, ..., r_m}. In general, the number of cycles in a given KG dataset is much larger than the number of relations; hence the set of equations (23) rarely admits an exact solution in Euclidean space. Intuitively, since the number of zero points of M^{q/p} in the region [0, q) × [0, p) is pq, the set of equations (23) derived from all the cycles in the given KG data has a better chance of admitting an exact solution. In contrast, the number of zero points in T^2 = [0, 1) × [0, 1) is only 1 for TorusE, which is much smaller than that of M^{q/p}. In the above sense, MöbiusE enjoys more powerful expressiveness than TorusE.
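The distance construction above can be checked numerically. The sketch below is our own code (not the authors'): it implements (13)–(15) on M^{2/1}, together with our reading of the garbled generalized distance (18) using the branch shifts (j/p, j/q), and verifies the properties claimed in this section for M^{2/1}.

```python
import random

# A numeric sketch (ours) of the distance (13)-(15) on the Mobius ring M = M^{2/1}
# and of our reconstruction of the generalized distance (18) on M^{q/p},
# with spot checks of the properties claimed in Section 2 for M^{2/1}.

def d(u, k):
    """d_k(u) = min(m_k(u), k - m_k(u))."""
    m = u % k
    return min(m, k - m)

def dist_M(x, y):
    """Distance (13)-(15) on M^{2/1}: minimum over the two sheet identifications."""
    w1 = d(y[0] - x[0], 2) + d(y[1] - x[1], 1)            # (14)
    w2 = d(y[0] - x[0] + 1, 2) + d(y[1] - x[1] + 0.5, 1)  # (15)
    return min(w1, w2)

def dist_Mqp(x, y, q, p):
    """Our reading of (18) on M^{q/p}: minimum over the pq branch shifts (j/p, j/q)."""
    return min(d(y[0] - x[0] + j / p, q) + d(y[1] - x[1] + j / q, p)
               for j in range(p * q))

random.seed(0)
for _ in range(1000):
    x = (random.uniform(0, 2), random.uniform(0, 1))
    y = (random.uniform(0, 2), random.uniform(0, 1))
    z = (random.uniform(0, 2), random.uniform(0, 1))
    # (18) reduces to (13)-(15) when q = 2, p = 1
    assert abs(dist_M(x, y) - dist_Mqp(x, y, 2, 1)) < 1e-9
    # basic properties (8): identity, symmetry, triangle inequality
    assert dist_M(x, x) < 1e-9
    assert abs(dist_M(x, y) - dist_M(y, x)) < 1e-9
    assert dist_M(x, z) <= dist_M(x, y) + dist_M(y, z) + 1e-9
    # Proposition 1: the distance never exceeds 3/4
    assert dist_M(x, y) <= 0.75 + 1e-9

# The bound 3/4 is attained, e.g., at x - y = (1/2, 1/4)
assert abs(dist_M((0.5, 0.25), (0.0, 0.0)) - 0.75) < 1e-9
# (0, 0) and (1, 1/2) are identified on M (cf. the equivalences under (11))
assert dist_M((0.0, 0.0), (1.0, 0.5)) < 1e-9
```

The 3/4 bound of Proposition 1 can also be seen directly from the identities d_2(u) + d_2(u + 1) = 1 and d_1(v) + d_1(v + 1/2) = 1/2: they imply w_1 + w_2 = 3/2 on M^{2/1}, so min(w_1, w_2) can never exceed 3/4.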
3. Experiments
The performance of the proposed MöbiusE is tested on two datasets extracted from realistic knowledge graphs: FB15K and WN18 [5]. The basic properties of these two datasets are given in Tab. 2.

We conduct the link prediction task using the same method reported in [5]. For each test triplet (which is a positive sample), we replace its head (or tail) to generate corrupted triplets; the score of each corrupted triplet is then calculated by the scoring function f(·), and the ranking of the test triplet is obtained according to these scores, i.e., the definition of rank_l in (5). It should be noted from (4) that the set of generated corrupted triplets excludes the positive triplets ∆; we call such a set "filtered" [10], and the ranking values in our experiments are all obtained on the filtered set.

We choose the stochastic gradient descent algorithm to optimize the objective (1). To generate the negative samples in (1), we employ the "Bern" method [7] for negative sampling, because the datasets contain only positive triplets.

To evaluate our model, we use Mean Rank (MR), Mean Reciprocal Rank (MRR) and HIT@m as evaluation indicators [5]. MR is calculated by mean_{l∈∆}(rank_l), with ∆ the set of triplet data and rank_l defined in (5); MRR is calculated by mean_{l∈∆}(1/rank_l); HIT@m is defined by |{l ∈ ∆ : rank_l ≤ m}| / |∆|.

We conduct a grid search over the margin γ in (1) and the learning rate α of stochastic gradient descent to find a set of optimal hyperparameters for each dataset. Finally, the learning rate α is set to 0.0005 for both WN18 and FB15K, and the margin γ is set to 2,000 for WN18 and 500 for FB15K.

We conduct our experiments on the augmented Möbius ring M^{q/p}_n as defined in (19), and we choose three different types of Möbius rings for embedding, i.e., M^{2/1}_n, M^{3/1}_n, and M^{3/2}_n.
In choosing the distance function on M^{q/p}_n, we fix the vector norm ‖·‖_τ in dist(·) in (21). We call the corresponding MöbiusE models MöbiusE(2,1), MöbiusE(3,1) and MöbiusE(3,2) in Tab. 3 and Tab. 4. The value of n in M^{q/p}_n is set to 5,000, which means that each embedding vector contains 10,000 parameters to be trained. The dimensions of the other models in Tab. 3 and Tab. 4 are all set to 10,000.

As shown in Tab. 3, MöbiusE(3,1) outperforms the other models in 4 of the 5 critical indicators. In Tab. 4, MöbiusE(2,1) also outperforms the other models in 4 of the 5 critical indicators.
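The three evaluation indicators above depend only on the filtered ranks; the following sketch (ours, with hypothetical rank values) mirrors their definitions:

```python
# A small sketch (ours, not the authors' evaluation code) of the indicators
# defined in Section 3, computed from a list of filtered ranks rank_l.
def mean_rank(ranks):
    """MR: the mean of rank_l over the test set."""
    return sum(ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    """MRR: the mean of 1 / rank_l over the test set."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hit_at(ranks, m):
    """HIT@m: the fraction of test triplets ranked within the top m."""
    return sum(1 for r in ranks if r <= m) / len(ranks)

ranks = [1, 2, 1, 10, 4]           # hypothetical filtered ranks of five test triplets
print(mean_rank(ranks))            # 3.6
print(mean_reciprocal_rank(ranks)) # (1 + 1/2 + 1 + 1/10 + 1/4) / 5, close to 0.57
print(hit_at(ranks, 3))            # 3 of the 5 ranks are <= 3 -> 0.6
```

Note that a lower MR is better, while higher MRR and HIT@m are better, which is why the tables report all three.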
4. Related Works
The key to different types of KGE strategies is the choice of embedding spaces, and in turn different addition operations and different scoring functions, i.e., different f(·) in (1).

Model | MRR | MR | HIT@10 | HIT@3 | HIT@1
TransE | 0.414 | - | 0.688 | 0.534 | 0.247
TransR | 0.218 | - | 0.582 | 0.404 | 0.218
RESCAL | 0.890 | - | 0.928 | 0.904 | 0.842
DistMult | 0.797 | 655 | 0.946 | - | -
ComplEx | 0.941 | - | 0.947 | 0.945 | 0.936
TorusE | | | | |
MöbiusE(3,1) | | | | |

Table 3: Link prediction results on WN18.
As the first work in KGE, TransE [5] uses basic algebraic addition to generate the scoring function, and regularizes the obtained vectors via a normalization condition. In TransE, the left (right) entity and the relation uniquely define the right (left) entity. However, a realistic KG may have the following properties: a) ONE-TO-MANY mapping: given any fixed relation r, the left (right) entity corresponding to a right (left) entity is not unique; b) MANY-TO-ONE mapping: given any fixed entities h and t, the relation between h and t may not be unique. The original TransE could not resolve these two drawbacks.

As an improvement of TransE to overcome drawback a), TransR [6] maps each relation r to a vector and a matrix, and the corresponding scoring function is derived from the image of such a relation matrix. In this sense, different entities may be mapped to the same vector in the image space, and drawback a) of TransE is resolved. Similar to TransR, TransH [7] uses a single vector to obtain the image of each entity. As a further generalization of TransR and TransH, TransD [15] uses a dynamic matrix, determined both by the relation and the entities, to generate the above mapping matrix. As another solution to the drawbacks of TransE, TransM [16] introduces a relation-based adjustment factor into the scoring function of TransE; such a factor is much smaller in the MANY-TO-MANY case than in the ONE-TO-ONE case, and hence the penalizing effect works.

Introducing flexibility into the scoring function to some extent enhances the generalization capability of KGE. As an implementation of such an idea, TransF [17] uses a flexible scoring function in KGE, in which the matching degree is described by the inner products of the desired entities and the given relation. Increasing the complexity of the scoring function enhances the nonlinear representative capability of KGE: RESCAL [14] and DistMult [8] use a matrix-induced inner product to represent the scoring function in KGE, and ComplEx [9] applies a similar construction but embeds both relations and entities into a complex-valued space. TransA [11] combines the ideas of TransE and RESCAL, and incorporates the residue vector of TransE into the vector-based norm of RESCAL; such a strategy can adaptively find the loss function according to the structure of the knowledge graph.

MöbiusE is inspired by TorusE [10]; however, MöbiusE differs from TorusE in the following aspects. The minimal dimension of a Torus ring is 1, but the minimal dimension of a Möbius ring is 2. The addition on a Torus ring chooses a unique value for the modulus operation (generally 1), but the addition on a Möbius ring chooses different values (generally p and q in M^{q/p}). The distance function on a Möbius ring is strongly nonlinear (see (18) and (21)), which is much more complicated than that of a Torus ring. These major differences guarantee the strong expressiveness of MöbiusE.
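To make the contrast between these scoring functions concrete, the following sketch (ours, with made-up low-dimensional vectors) evaluates the TransE, DistMult, and ComplEx scores from Table 1 on toy embeddings; the ℓ1 norm for TransE is chosen only for illustration:

```python
# Toy vectors (ours) illustrating the scoring functions of Table 1 for the
# models discussed above; all numbers are made up for the example.
h = [0.3, -0.2]
r = [0.5, 0.1]
t = [0.8, -0.1]

# TransE: ||h + r - t||_1  (translation residual; smaller = more plausible)
transe = sum(abs(h[i] + r[i] - t[i]) for i in range(2))

# DistMult: -h^T diag(r) t  (bilinear product, negated as in Table 1 so that
# the scoring function is minimized for plausible triplets)
distmult = -sum(h[i] * r[i] * t[i] for i in range(2))

# ComplEx: -Re(h^T diag(r) conj(t)) with complex-valued embeddings
hc = [0.3 + 0.1j, -0.2 + 0.4j]
rc = [0.5 - 0.2j, 0.1 + 0.3j]
tc = [0.8 + 0.0j, -0.1 - 0.5j]
complx = -sum(hc[i] * rc[i] * tc[i].conjugate() for i in range(2)).real

print(transe, distmult, complx)
```

The conjugation in the ComplEx score is what breaks the symmetry between head and tail, allowing antisymmetric relations to be modeled, which the real-valued DistMult product cannot do.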
5. Conclusions
A novel KGE strategy has been proposed by taking advantage of the intertwined rotating property of the Möbius ring. As a first step, we defined the basic addition operation on the Möbius ring and discussed its related properties. Next, in order to obtain an appropriate scoring function based on the Möbius ring, we built a distance function on the Möbius ring and constructed the corresponding distance-induced scoring function. Finally, a complete KGE strategy was obtained via the above constructions, which outperforms several key KGE strategies in our experiments.
6. Appendix

6.1. Proof of dist(x, z) ≤ dist(x, y) + dist(y, z) on M^{q/p}

According to the definition of dist(·, ·) on M^{q/p}, we only need to prove dist(x + y, 0) ≤ dist(x, 0) + dist(y, 0). Note that for any i_1, i_2, i′_1, i′_2, j, j′ ∈ Z, there is

d_q(y_1 + x_1 + j/p) + d_p(y_2 + x_2 + j/q)
≤ |y_1 + x_1 + j/p + i_1 q| + |y_2 + x_2 + j/q + i_2 p|
≤ |x_1 + (j + j′)/p + (i_1 + i′_1)q| + |x_2 + (j + j′)/q + (i_2 + i′_2)p| + |y_1 − j′/p − i′_1 q| + |y_2 − j′/q − i′_2 p|,

based on which, and on the arbitrariness of i_1 and i_2, there is

dist(x + y, 0) = min_{0 ≤ j < pq} { d_q(y_1 + x_1 + j/p) + d_p(y_2 + x_2 + j/q) }
≤ min_{0 ≤ j < pq} { d_q(x_1 + (j + j′)/p) + d_p(x_2 + (j + j′)/q) } + |y_1 − j′/p − i′_1 q| + |y_2 − j′/q − i′_2 p|
= dist(x, 0) + |y_1 − j′/p − i′_1 q| + |y_2 − j′/q − i′_2 p|.

Due to the arbitrariness of j′, i′_1 and i′_2, there is dist(x + y, 0) ≤ dist(x, 0) + dist(y, 0).

6.2. Proof of Proposition 1

We define a function g(α, β) = min{g_1, g_2} with g_1 = d_2(α) + d_1(β) and g_2 = d_2(α + 1) + d_1(β + 1/2), from which we know that α has a period of 2 and β has a period of 1. Next, we divide the interval [−1, 1) into 4 subintervals I_{α,i} = [−1 + i/2, −1/2 + i/2), and divide [−1/2, 1/2) into 4 subintervals I_{β,j} = [−1/2 + j/4, −1/4 + j/4), with i, j = 0, 1, 2, 3. Then the value of sup_{(α,β) ∈ I_{α,i} × I_{β,j}} g can be calculated, and all these 16 values are 3/4, which proves Proposition 1.

6.3. Proof of Proposition 2

Let x = (0, x_2) and y = (0, y_2). Then dist(x, y) = min(d_1(y_2 − x_2), 1 + d_1(y_2 − x_2 + 1/2)) = d_1(y_2 − x_2); hence M can be viewed as T^1 by constraining the first dimension to zero.

References

[1] J. Lu, J. Xuan, G. Zhang, X. Luo, Structural property-aware multilayer network embedding for latent factor analysis, Pattern Recognition 76 (2018) 228–241.
[2] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. Ives, DBpedia: a nucleus for a web of open data, in: Proceedings of the 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference, 2007, pp. 722–735.
[3] F. M. Suchanek, G. Kasneci, G. Weikum, Yago: a core of semantic knowledge, in: Proceedings of the 16th International Conference on World Wide Web, 2007, pp. 697–706.
[4] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, J. Taylor, Freebase: a collaboratively created graph database for structuring human knowledge, in: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 2008, pp. 1247–1250.
[5] A. Bordes, N. Usunier, A. García-Durán, J. Weston, O. Yakhnenko, Translating embeddings for modeling multi-relational data, in: Advances in Neural Information Processing Systems, 2013, pp. 2787–2795.
[6] Y. Lin, Z. Liu, M. Sun, Y. Liu, X. Zhu, Learning entity and relation embeddings for knowledge graph completion, in: Proceedings of the 29th AAAI Conference on Artificial Intelligence, 2015, pp. 2181–2187.
[7] Z. Wang, J. Zhang, J. Feng, Z. Chen, Knowledge graph embedding by translating on hyperplanes, in: Proceedings of the 28th AAAI Conference on Artificial Intelligence, 2014, pp. 1112–1119.
[8] B. Yang, W. T. Yih, X. He, J. Gao, L. Deng, Embedding entities and relations for learning and inference in knowledge bases, in: Proceedings of the 3rd International Conference on Learning Representations, 2015.
[9] T. Trouillon, J. Welbl, S. Riedel, E. Gaussier, G. Bouchard, Complex embeddings for simple link prediction, in: Proceedings of the 33rd International Conference on Machine Learning, 2016, pp. 2071–2080.
[10] T. Ebisu, R. Ichise, TorusE: Knowledge graph embedding on a Lie group, in: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018, pp. 1819–1826.
[11] Y. Jia, Y. Wang, X. Jin, H. Lin, X. Cheng, Knowledge graph embedding: A locally and temporally adaptive translation-based approach, ACM Transactions on the Web 12 (2018).
[12] T. Dettmers, P. Minervini, P. Stenetorp, S. Riedel, Convolutional 2D knowledge graph embeddings, in: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018, pp. 1811–1818.
[13] S. Zhang, Y. Tay, L. Yao, Q. Liu, Quaternion knowledge graph embeddings, in: Advances in Neural Information Processing Systems, 2019, pp. 2731–2741.
[14] M. Nickel, V. Tresp, H. P. Kriegel, A three-way model for collective learning on multi-relational data, in: Proceedings of the 28th International Conference on Machine Learning, 2011, pp. 809–816.
[15] G. Ji, S. He, L. Xu, K. Liu, J. Zhao, Knowledge graph embedding via dynamic mapping matrix, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 2015, pp. 687–696.
[16] M. Fan, Q. Zhou, E. Chang, T. F. Zheng, Transition-based knowledge graph embedding with relational mapping properties, in: Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, 2014, pp. 328–337.
[17] J. Feng, M. Huang, M. Wang, M. Zhou, Y. Hao, X. Zhu, Knowledge graph embedding by flexible translation, in: Proceedings of the 15th International Conference on Principles of Knowledge Representation and Reasoning, 2016, pp. 557–560.