5* Knowledge Graph Embeddings with Projective Transformations
Mojtaba Nayyeri
SDA Research, University of Bonn, Germany
[email protected]

Sahar Vahdati
Department of Computer Science, University of Oxford, UK
[email protected]

Can Aykul
Department of Computer Science, University of Bonn, Germany
[email protected]

Jens Lehmann
SDA Research, University of Bonn and Fraunhofer IAIS, Bonn, Germany
[email protected]
Abstract
Performing link prediction using knowledge graph embedding (KGE) models is a popular approach for knowledge graph completion. Such link predictions are performed by measuring the likelihood of links in the graph via a transformation function that maps nodes via edges into a vector space. Since the complex structure of the real world is reflected in multi-relational knowledge graphs, the transformation functions need to be able to represent this complexity. However, most of the existing transformation functions in embedding models have been designed in Euclidean geometry and only cover one or two simple transformations. Therefore, they are prone to underfitting and limited in their ability to embed complex graph structures. The area of projective geometry, however, fully covers inversion, reflection, translation, rotation, and homothety transformations. We propose a novel KGE model, which supports those transformations and subsumes other state-of-the-art models. The model has several favorable theoretical properties and outperforms existing approaches on widely used link prediction benchmarks.
1 Introduction

Knowledge graphs (KGs) have been successful in a range of AI tasks, including question answering, data integration, and recommender systems. The main characteristic of KGs lies in their graph-based knowledge representation structure in the form of (head, relation, tail) triples, where head and tail are entities (nodes) with a relation (edge) between them. The usage of this graph structure addresses many of the previous challenges of machine learning for heterogeneous data with complex structure. However, KGs are usually incomplete, which directly affects the performance of learning models on various downstream learning tasks. One approach to deal with the knowledge graph incompleteness problem is to predict the missing links based on the existing ones. This can be done via knowledge graph embeddings (KGEs). Every KGE model uses a transformation function to map entities of the graph through relations in a vector space, and it scores the plausibility of triples via a score function. The performance of KGE models heavily relies on the design of their score function, which in turn defines the type of transformation they support. Such transformations determine the extent to which a model is able to learn complex motifs and patterns formed by combinations of the nodes and edges in the KG.
A systematic analysis of existing KGEs shows that most of them have been designed in Euclidean geometry and usually support a single transformation type, often translation or rotation. This limits their ability to embed complex graph structures. A brief overview of state-of-the-art KGE models and their support for different transformation types is given in Table 1. While all existing models cover at most two transformation types, projective geometry provides a uniform way to simultaneously represent five transformation types, namely translation, rotation, homothety, inversion, and reflection. The combination of such transformation types results in various transformation functions (parabolic, circular, elliptic, hyperbolic, and loxodromic). Consequently, projective transformations subsume all five possible transformation functions.

Table 1: Supported transformations of KGEs.

Models   | Tran. | Rot. | Hom. | Inv. | Refl.
TransE   |  yes  |      |      |      |
RotatE   |       | yes  |      |      |
ComplEx  |       | yes  | yes  |      |
QuatE    |       | yes  | yes  |      |
5*E      |  yes  | yes  | yes  | yes  | yes

Our core contribution is a new five-star embedding model, i.e. a model that simultaneously supports these five transformation types and, consequently, variously shaped transformation functions. Furthermore, we formally show that this model, dubbed 5*E, (a) is fully expressive (as defined in [20]); (b) subsumes the KGE models DistMult, RotatE, pRotatE, TransE, and ComplEx; and (c) allows learning composition, inverse, reflexive, and symmetric relation patterns. Our evaluation on standard link prediction benchmarks shows that 5*E outperforms existing models.
2 Preliminaries

2.1 Knowledge Graphs and Embeddings

A KG is a multi-relational directed graph KG = (E, R, T), where E and R are the sets of nodes (entities) and edges (relations between entities), respectively. The set T = {(h, r, t)} ⊆ E × R × E contains all triples of the form (head, relation, tail), e.g. (Paris, CapitalOf, France). In order to apply learning methods to KGs, models are employed to transform KGs into a vector space. Knowledge graph embeddings (KGEs) are one of the most widely used techniques and are based on learning vector representations of the entities (E) and relations (R) of a KG. Specifically, a vector representation (h, r, t) is learned per triple (h, r, t), where h, t ∈ V^{d_e}, r ∈ V^{d_r}, and V is a vector space. TransE [3] considers V = R, ComplEx [19] and RotatE [18] use V = C (complex space), and QuatE [23] uses V = H (quaternion space). In this paper, we choose a projective space to embed the graph, i.e. V = CP¹ (the complex projective line, which is introduced below).

Most KGE models are defined via a relation-specific transformation function g_r : V^{d_e} → V^{d_e} which maps head entities to tail entities, i.e. g_r(h) = t. On top of such a transformation function, the score function f : V^{d_e} × V^{d_r} × V^{d_e} → R is defined to measure the plausibility of triples: f(h, r, t) = p(g_r(h), t). Generally, any score function takes one of two forms: p(g_r(h), t) = −‖g_r(h) − t‖ or p(g_r(h), t) = ⟨g_r(h), t⟩.

2.2 Projective Geometry

Projective geometry uses homogeneous coordinates, which represent N-dimensional coordinates with N + 1 numbers (i.e. one additional parameter). For example, a point [X, Y] in 2D Cartesian coordinates becomes [x, y, k] in homogeneous coordinates, where X = x/k and Y = y/k. In the case of 1-dimensional real numbers, [X] becomes [x, y], where X = x/y. The key elements of projective geometry are as follows:

A projective line is the space in which a projective geometry is defined. A projective geometry requires a point at infinity in order to satisfy the axiom that "two parallel lines intersect at infinity". Therefore, an extended line P(K) (where K is a line) is realized as K together with a point at infinity (which topologically is a circle). More concretely, the projective line is the set {[x : 1] ∈ P(K) | x ∈ K} with one additional member [1 : 0], denoting the point at infinity. The projective line is real (RP¹) when K = R. In the case K = C, where C is the complex space, the set denotes the complex projective line CP¹.

The Riemann sphere is an extended complex plane with a point at infinity. More precisely, it is built on the plane of complex numbers wrapped around a sphere whose poles denote 0 and ∞. In projective geometry, every complex line is a Riemann sphere. The Riemann sphere is employed as a tool for projective transformations, as shown in Figure 1.

Figure 1: Default transformation functions on the Riemann sphere (first row) and the Möbius flow of each transformation after projection onto the complex plane (second row): (a) circular, (b) elliptic, (c) hyperbolic, (d) loxodromic, (e) parabolic.

A projective transformation is a mapping of the Riemann sphere to itself. Let [x : y] be the homogeneous coordinates of a point in CP¹. A projective transformation in CP¹ is expressed by a matrix multiplication [15, 16] τ : CP¹ → CP¹, such that

\[ \tau([x : y]) = \mathbf{M} \begin{pmatrix} x \\ y \end{pmatrix}, \qquad \mathbf{M} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \tag{1} \]

where the matrix M must be invertible (det(M) ≠ 0).
By identifying CP¹ with Ĉ = C ∪ {∞}, a projective transformation is represented by a fractional expression through a sequence of homogenization, transformation, and dehomogenization:

\[ x \rightarrow \begin{pmatrix} x \\ 1 \end{pmatrix} \rightarrow \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ 1 \end{pmatrix} = \begin{pmatrix} ax + b \\ cx + d \end{pmatrix} \rightarrow \frac{ax + b}{cx + d}, \tag{2} \]

where the mapping ϑ : Ĉ → Ĉ is defined as

\[ \vartheta(x) = \frac{ax + b}{cx + d}, \qquad ad - bc \neq 0. \tag{3} \]

The resulting mapping introduced in Equation 3 describes all Möbius transformations.
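To make the correspondence between the two views concrete, the following minimal sketch (ours, for illustration; not the paper's code) applies the same transformation both as a 2 × 2 matrix on homogeneous coordinates (Equation 2) and as the fractional map of Equation 3, and checks that the two agree:

```python
import numpy as np

def mobius_frac(x, M):
    """Fractional form, Eq. (3): x -> (a*x + b) / (c*x + d)."""
    (a, b), (c, d) = M
    assert not np.isclose(a * d - b * c, 0), "matrix must be invertible"
    return (a * x + b) / (c * x + d)

def mobius_homogeneous(x, M):
    """Projective form, Eq. (2): homogenize, multiply, dehomogenize."""
    u, v = np.asarray(M, dtype=complex) @ np.array([x, 1.0], dtype=complex)
    return u / v  # v == 0 corresponds to the point at infinity [1 : 0]

M = [[1 + 2j, 0.5], [1j, 3.0]]   # an arbitrary invertible 2x2 complex matrix
x = 0.7 - 0.2j                   # a finite point on the complex projective line
assert np.isclose(mobius_frac(x, M), mobius_homogeneous(x, M))
```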
The Möbius group is the set of all Möbius transformations. It is the projective linear group PGL(2, C), i.e., the group of all invertible 2 × 2 complex matrices up to a scalar factor, and is also denoted Aut(Ĉ), as it is the automorphism group of the Riemann sphere Ĉ, or equivalently of CP¹.

Every Möbius transformation has at most two fixed points γ₁, γ₂ on the Riemann sphere, obtained by solving ϑ(γ) = γ [15], which gives

\[ \gamma_{1,2} = \frac{(a - d) \pm \sqrt{\Delta}}{2c}, \qquad \Delta = (a - d)^2 + 4bc. \tag{4} \]

Depending on the number of fixed points, Möbius transformations form parabolic and circular (one fixed point) or elliptic, hyperbolic, and loxodromic (two fixed points) transformation functions (see Figure 1, upper row, and Table 2 for the detailed conditions). All transformations in each group form a subgroup which is isomorphic to the group of matrices given in the row "Iso" of Table 2.

Table 2: Types of Möbius transformations and their conditions.

Function  | Parabolic          | Circular          | Elliptic                  | Hyperbolic              | Loxodromic
Condition | tr²(M) = 4 (Δ = 0) | tr(M) = 0 (Δ ≠ 0) | 0 ≤ tr²(M) < 4            | 4 < tr²(M) < ∞          | tr²(M) ∉ [0, 4]
Iso       | (1 a; 0 1)         | (i 0; 0 −i)       | (e^{iθ/2} 0; 0 e^{−iθ/2}) | (e^{θ/2} 0; 0 e^{−θ/2}) | (√k 0; 0 1/√k)
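The classification in Table 2 is easy to operationalize. The sketch below (our illustration, under the usual convention of normalizing so that det(M) = 1) computes the fixed points of Equation 4 and classifies a transformation by tr²(M):

```python
import numpy as np

def fixed_points(M):
    """Fixed points of theta, Eq. (4); assumes c != 0."""
    (a, b), (c, d) = np.asarray(M, dtype=complex)
    delta = (a - d) ** 2 + 4 * b * c
    return ((a - d) + np.sqrt(delta)) / (2 * c), ((a - d) - np.sqrt(delta)) / (2 * c)

def classify(M, tol=1e-9):
    """Classify a Moebius transformation by tr^2(M), cf. Table 2."""
    M = np.asarray(M, dtype=complex)
    M = M / np.sqrt(np.linalg.det(M))   # normalize so that det(M) = 1
    tr2 = np.trace(M) ** 2              # invariant under the sign ambiguity of sqrt
    if abs(tr2.imag) > tol:
        return "loxodromic"             # tr^2(M) is not a real number
    t = tr2.real
    if abs(t - 4) < tol:
        return "parabolic"
    if abs(t) < tol:
        return "circular"
    if 0 < t < 4:
        return "elliptic"
    if t > 4:
        return "hyperbolic"
    return "loxodromic"                 # t < 0, i.e. tr^2(M) not in [0, 4]

theta = np.pi / 3
rotation = [[np.exp(1j * theta / 2), 0], [0, np.exp(-1j * theta / 2)]]
print(classify(rotation))               # -> elliptic
```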
The illustration in the lower row of Figure 1 gives insight into the way the Möbius transformation induces the five transformation types (translation, rotation, inversion, reflection, and homothety). Given a grid, the transformation is performed by (a) a stereographic projection from the complex plane to the Riemann sphere, (b) moving the sphere, and (c) a stereographic projection from the sphere back to the plane. Each transformation has a characteristic constant k = e^{α+iβ} which determines the sparsity/density of the transformation. β is an expansion factor which indicates how strongly the fixed point γ₁ is repulsive and the second fixed point γ₂ is attractive. α is a rotation factor, determining the degree to which a transformation rotates the plane counter-clockwise around γ₁ and clockwise around γ₂.

3 Related Work

KGE models can be classified according to their embedding space. We first cover KGEs operating in Euclidean space and then describe related work for other geometric spaces.
Euclidean Knowledge Graph Embedding Models

A large number of KGE models, such as TransE [3] and its variants [7, 12, 21] as well as RotatE [18], are designed using translational or rotational (Hadamard product) score functions in Euclidean space. The score and loss functions of these models optimize the embedding vectors so as to maximize the plausibility of triples, which is measured by the distance between the rotated/translated head and the tail vectors. Some embedding models, such as DistMult [22], ComplEx [19], QuatE [23], and RESCAL [14], including our proposed model, are designed based on element-wise multiplication of the transformed head and the tail. In this case, the plausibility of triples is measured by the angle between the transformed head and the tail. A third category of KGE models uses neural networks (NNs) as score functions, such as ConvE [4] and NTN [17].
Non-Euclidean Knowledge Graph Embedding Models

The aforementioned KGE models are restricted to Euclidean space, which limits their ability to embed complex structures. Some recent efforts have investigated other spaces for embeddings, often for simpler structures than KGs. For example, hyperbolic space has been extensively studied for scale-free networks. In recent work, learning continuous hierarchies from unstructured similarity scores using the Lorentz model was investigated [13]. In [1], an embedding model dubbed MuRP is proposed that embeds multi-relational KGs on a Poincaré ball [8]. MuRP focuses solely on resolving the problem of embedding KGs with multiple simultaneous hierarchies. Overall, while the advantages of projective geometry are evident in a wide variety of application domains, including computer vision and robotics, to our knowledge no investigation has focused on it within the context of knowledge graph embeddings.
4 The 5*E Model

Our method 5*E inherits the five main pillars of projective transformation, namely translation, rotation, homothety, inversion, and reflection. The pipeline for performing the transformation includes the following steps: (1) an element-wise stereographic projection in order to map the head entity from a complex plane to a point on a Riemann sphere; (2) a relation-specific transformation to move the Riemann sphere to a new position and/or direction; (3) a stereographic projection to project the mapped head from the Riemann sphere back to a complex plane; (4) the complex inner product between the transformed head and the tail.

4.1 Model Formulation

Embedding of Knowledge Graphs on a Complex Projective Line
Let d be the embedding dimension. Given a triple (h, r, t), the head and tail entities h, t ∈ E are embedded into a d-dimensional complex projective line, i.e. h, t ∈ (CP¹)^d. A relation r ∈ R is embedded into a d-dimensional vector r, where each element is a 2 × 2 complex matrix; equivalently, r contains four complex vectors r_a, r_b, r_c, and r_d ∈ C^d. With r_{ai}, r_{bi}, r_{ci}, r_{di}, h_i, t_i we refer to the i-th elements of r_a, r_b, r_c, r_d, h, t, respectively.

Relation-specific Transformation
In Section 2.2, we showed that for a projective transformation on the complex projective line, there exists an equivalent transformation on the Riemann sphere. We present our model formulation from both perspectives, as this allows understanding them more comprehensively.
Möbius Representation of the Transformation:

We use a relation-specific Möbius transformation to map the head entity (h_{ri}) from a source to a target complex plane (Ĉ). The transformation is performed using stereographic projection and the transformation ϑ on/from the Riemann sphere. To do so, we compute h_{ri} via the element-wise transformation

\[ h_{ri} = g_{ri}(h_i) = \vartheta(h_i, r_i) = \frac{r_{ai} h_i + r_{bi}}{r_{ci} h_i + r_{di}}, \qquad r_{ai} r_{di} - r_{bi} r_{ci} \neq 0, \qquad i = 1, \ldots, d. \tag{5} \]

This results in the relation-specific transformed head entity h_r = [h_{r1}, ..., h_{rd}].

Projective Representation of the Transformation:
Using homogeneous coordinates, we can also represent the Möbius transformation from Equation 5 as a projective transformation:

\[ [h_{ri} : 1]^\top = [g_r(h_i) : 1]^\top = \mathbf{M}_{ri} \, [h_i : 1]^\top, \qquad i = 1, \ldots, d, \tag{6} \]

where \mathbf{M}_{ri} = \begin{pmatrix} r_{ai} & r_{bi} \\ r_{ci} & r_{di} \end{pmatrix} and each matrix M_{ri} is invertible, i.e. det(M_{ri}) ≠ 0. The matrix representation of Equation 6 is h_r = R_r [h : 1], where R_r = diag(M_{r1}, ..., M_{rd}) and 1 is a vector with all elements equal to 1.
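The equivalence of the two representations can be checked numerically. The following sketch (illustrative; the vector names follow the text, while building R_r with scipy's block_diag on interleaved pairs is our choice) compares the block-diagonal projective form with the element-wise Möbius form of Equation 5:

```python
import numpy as np
from scipy.linalg import block_diag

d = 3
rng = np.random.default_rng(1)
cplx = lambda n: rng.normal(size=n) + 1j * rng.normal(size=n)
h = cplx(d)
r_a, r_b, r_c, r_d = (cplx(d) for _ in range(4))

# Element-wise Moebius form, Eq. (5).
h_r_mobius = (r_a * h + r_b) / (r_c * h + r_d)

# Projective form, Eq. (6): block-diagonal R_r acting on interleaved [h_i : 1] pairs.
blocks = [np.array([[r_a[i], r_b[i]], [r_c[i], r_d[i]]]) for i in range(d)]
R_r = block_diag(*blocks)
hom = R_r @ np.stack([h, np.ones(d, dtype=complex)], axis=1).reshape(-1)
h_r_proj = hom[0::2] / hom[1::2]   # dehomogenize each pair

assert np.allclose(h_r_mobius, h_r_proj)
```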
Score Function

The plausibility of triples in a KG is measured by the similarity ⟨h_r, t⟩ between the relation-specific transformed head h_r and the tail t. The model aims to minimize the angle between h_r and the tail t, i.e. their product ⟨h_r, t⟩ is maximized for positive triples; for sampled negative triples, it is conversely minimized. Overall, the score function of 5*E is

\[ f(h, r, t) = \mathrm{Re}\left( \langle h_r, \bar{t} \rangle \right), \tag{7} \]

where Re(x) is the function that returns the real part of the complex number x.
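Putting Equations 5 and 7 together, a minimal numpy sketch of the 5*E scoring pipeline looks as follows (our illustration; the random initialization and the dimension are placeholders, not the trained model):

```python
import numpy as np

d = 4                                             # embedding dimension (illustrative)
rng = np.random.default_rng(0)
cplx = lambda n: rng.normal(size=n) + 1j * rng.normal(size=n)

h, t = cplx(d), cplx(d)                           # head / tail entity embeddings
r_a, r_b, r_c, r_d = (cplx(d) for _ in range(4))  # four complex vectors per relation

def transform(h, r_a, r_b, r_c, r_d):
    """Element-wise Moebius transformation of the head, Eq. (5)."""
    assert not np.any(np.isclose(r_a * r_d - r_b * r_c, 0)), "each 2x2 block must be invertible"
    return (r_a * h + r_b) / (r_c * h + r_d)

def score(h, r_a, r_b, r_c, r_d, t):
    """5*E score, Eq. (7): real part of <h_r, conj(t)>."""
    h_r = transform(h, r_a, r_b, r_c, r_d)
    return np.real(np.sum(h_r * np.conj(t)))

print(score(h, r_a, r_b, r_c, r_d, t))
```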
4.2 Theoretical Analysis

We first show that 5*E is a composition of translation, rotation, homothety, inversion, and reflection transformations. We then prove that 5*E is fully expressive and subsumes various popular and state-of-the-art KGE models, namely TransE, DistMult, ComplEx, RotatE, and pRotatE. Further details, including all proofs, are in the supplementary material.
Möbius – Composition of Five Transformations

The Möbius transformation in Equation 5 is a composition of four subsequent transformations ϑ₁, ϑ₂, ϑ₃, and ϑ₄ (with ϑ₂ and ϑ₃ each combining two transformation types), as shown in [10]:

\[ h_{ri} = \vartheta(h_i, r_i) = (\vartheta_4 \circ \vartheta_3 \circ \vartheta_2 \circ \vartheta_1)(h_i, r_i), \tag{8} \]

where ϑ₁(x, r_i) = x + r_{di}/r_{ci} (translation by r_{di}/r_{ci}), ϑ₂(x) = 1/x (inversion and reflection w.r.t. the real axis), ϑ₃(x, r_i) = ((r_{bi} r_{ci} − r_{ai} r_{di})/r_{ci}²) x (homothety and rotation), and ϑ₄(x, r_i) = x + r_{ai}/r_{ci} (translation by r_{ai}/r_{ci}). This shows that 5*E is capable of performing all five transformations simultaneously.
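The decomposition in Equation 8 can be verified numerically. The sketch below (ours, with arbitrary coefficients standing in for one element of a relation embedding) composes the four maps and compares the result with the direct fractional form:

```python
import numpy as np

a, b, c, d = 2 + 1j, -0.3, 1.5j, 0.7 + 0.4j   # arbitrary coefficients with ad - bc != 0, c != 0
x = 0.9 - 1.1j

theta1 = lambda x: x + d / c                   # translation by d/c
theta2 = lambda x: 1 / x                       # inversion and reflection w.r.t. the real axis
theta3 = lambda x: (b * c - a * d) / c**2 * x  # homothety and rotation
theta4 = lambda x: x + a / c                   # translation by a/c

composed = theta4(theta3(theta2(theta1(x))))
direct = (a * x + b) / (c * x + d)
assert np.isclose(composed, direct)            # Eq. (8) holds
```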
Subsumption of Other KGE Models

Definition 1 (from [20]). A model M subsumes a model M′ when any scoring over the triples of a KG measured by model M′ can also be obtained by model M.

We can formally show that 5*E subsumes various state-of-the-art models:
Proposition 1. 5*E with variants of its score function subsumes DistMult, pRotatE, RotatE, TransE, and ComplEx. Specifically, 5*E subsumes DistMult, ComplEx, and pRotatE with its original score function f(h, r, t) = Re(⟨h_r, t̄⟩), and it subsumes RotatE and TransE with the score function f(h, r, t) = −‖h_r − t‖ (inner product changed to distance).

Definition 2 (from [9]). A model M is fully expressive if there exist assignments to the embeddings of the entities and relations that accurately separate correct triples from incorrect ones for any given ground truth.

Corollary 1. The 5*E model is fully expressive.
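As one concrete instance of Proposition 1 (a worked example on our part, not the full proof), constraining the relation parameters reduces 5*E to ComplEx:

```latex
% Setting r_{bi} = r_{ci} = 0 and r_{di} = 1 reduces each Moebius map in Eq. (5)
% to an element-wise complex multiplication, and Eq. (7) becomes the ComplEx score:
h_{ri} = \frac{r_{ai} h_i + 0}{0 \cdot h_i + 1} = r_{ai} h_i
\quad\Longrightarrow\quad
f(h, r, t) = \mathrm{Re}\,\big\langle r_a \circ h,\ \bar{t} \big\rangle .
```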
Inference of Patterns

For relations which exhibit patterns of the form premise → conclusion, where the premise can be a conjunction of several triples, a model is said to be able to infer those patterns if the implication holds for the score function, i.e. if the scores of all triples in the premise are positive, then the score of the conclusion must be positive. We investigated the inference ability of 5*E for specific patterns, including reflexive, symmetric, and inverse relations as well as composition.
Proposition 2. Let r₁, r₂, r₃ ∈ R be relations and let r₃ (e.g. UncleOf) be the composition of r₁ (e.g. BrotherOf) and r₂ (e.g. FatherOf). 5*E infers composition with M_{r₃} = M_{r₂} M_{r₁}.

Proposition 3. Let r₂ ∈ R be the inverse of r₁ ∈ R. 5*E infers this pattern with M_{r₂} = M_{r₁}⁻¹.

Proposition 4. Let r ∈ R be symmetric. 5*E infers the symmetric pattern if M_r = M_r⁻¹.

Proposition 5. Let r ∈ R be a reflexive relation. In dimension d, 5*E infers reflexive patterns with O(2^d) distinct representations of entities if the fixed points are non-identical.

TransE only infers composition and inverse patterns, and RotatE is capable of inferring all the mentioned patterns but is not fully expressive. ComplEx infers these patterns and is fully expressive; however, it has less flexibility in learning complex structures because it uses only rotation and homothety.
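These algebraic conditions are easy to sanity-check numerically. The sketch below (our illustration with arbitrary invertible matrices) verifies the composition, inverse, and symmetry conditions of Propositions 2–4 for a single embedding dimension:

```python
import numpy as np

def mobius(M, x):
    (a, b), (c, d) = np.asarray(M, dtype=complex)
    return (a * x + b) / (c * x + d)

x = 0.4 + 0.8j
M_r1 = np.array([[2 + 1j, 0.5], [0.3j, 1.0]])
M_r2 = np.array([[0.7, -1j], [0.1, 1 + 0.5j]])

# Composition (Proposition 2): the matrix product induces the composed map.
M_r3 = M_r2 @ M_r1
assert np.isclose(mobius(M_r3, x), mobius(M_r2, mobius(M_r1, x)))

# Inverse (Proposition 3): the inverse matrix undoes the relation.
assert np.isclose(mobius(np.linalg.inv(M_r1), mobius(M_r1, x)), x)

# Symmetry (Proposition 4): M = M^{-1}, e.g. the swap matrix x -> 1/x.
M_sym = np.array([[0.0, 1.0], [1.0, 0.0]])
assert np.isclose(mobius(M_sym, mobius(M_sym, x)), x)
```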
Discussion on Other Model Properties

5*E inherits various important properties of projective transformations as well as Möbius transformations. Because the projective linear group PGL(2, C) is isomorphic to the Möbius group, i.e., PGL(2, C) ≅ Aut(Ĉ) [10], the properties mentioned for Equation 6 are also valid for Equation 5. We investigate the inherited properties of 5*E from two perspectives: capturing local similarities of nodes, and capturing structural groups.

Capturing Local Similarities

The similarity of nodes in a KG is local, i.e. nodes in a neighborhood are more likely to be semantically similar [5, 6] than nodes at a higher distance. A projective transformation is a bijective conformal mapping, i.e. it preserves angles locally, but not necessarily lengths; it also preserves orientation after mapping [10]. Therefore, 5*E is capable of capturing similarity by preserving angles locally via a relation-specific transformation of nodes.

Furthermore, the map π : GL(2, C) → Aut(Ĉ) (where GL(2, C) is the general linear group), which transfers the matrix M into a Möbius transformation ϑ, is a group homomorphism. If det(M) = 1, then π : SL(2, C) → Aut(Ĉ) is restricted to a mapping from the special linear group SL(2, C) to Möbius transformations that preserve volume and orientation. In the context of KGs, after a relation-specific transformation (Equation 6, or equivalently Equation 5) of nodes in the head position to nodes in the tail position, the relative distances of nodes can be preserved. From this ability, we expect that 5*E is able to propagate structural similarity from one group of nodes to another.
Capturing Structural Groups

When going beyond SL(2, C) by allowing det(M) ≠ 1, volume and orientation change under the transformation. Therefore, 5*E is more flexible than all current KGEs on KGs with varied graph structures, as those models are not able to change volume and orientation. Additionally, the characteristic of a projective transformation of mapping lines to circles and circles to lines [10] increases the flexibility of the model further. This enables covering variously shaped structural transformations (see Section 5). This strong flexibility is obtained by properly mixing the various transformation types mentioned in Equation 8 and Table 1.
5 Experiments

Experimental Setup

Following the best practices for evaluating embedding models, we consider the most widely used metrics, namely Mean Reciprocal Rank (MRR) and Hits@n. We evaluated our model on four widely used benchmark datasets, namely FB15k, FB15k-237, WN18, and WN18RR. We compare against the best-performing models on those benchmarks, namely TransE [3], RotatE [18], TuckER [2], ComplEx [19], QuatE [23], MuRP [1], ConvE [4], and SimplE [9]. We developed our model on top of a standard framework [11] and applied a 1-N scoring loss with N3 regularization, and added the reverse counterpart of each triple to the training set. All details on the metrics, training, datasets, and hyperparameters are in the supplementary material.
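For concreteness, the two training details mentioned above can be sketched as follows (our illustration; the actual implementation follows the framework of [11], and the function names here are placeholders):

```python
import numpy as np

def add_reverse_triples(triples, num_relations):
    """Augment the training set with a reverse counterpart (t, r + R, h) per triple (h, r, t)."""
    return triples + [(t, r + num_relations, h) for (h, r, t) in triples]

def n3_regularizer(factors, weight=1e-2):
    """N3 (nuclear 3-norm) penalty of [11]: weighted sum of |x|^3 over the embedding factors."""
    return weight * sum(np.sum(np.abs(f) ** 3) for f in factors)

train = add_reverse_triples([(0, 0, 1), (1, 1, 2)], num_relations=2)
# -> [(0, 0, 1), (1, 1, 2), (1, 2, 0), (2, 3, 1)]
```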
Results and Discussion

The evaluation results are shown in Table 3, which includes results for 5*E with embedding dimensions of 100 and 500. Results for the other models are taken from [23], except for TuckER and MuRP, which are taken from [2] and [1], respectively. We first look at the WN18 and WN18RR benchmarks. Our model outperforms all state-of-the-art models across all metrics on WN18RR. This is visible in comparisons of the results, for example on Hits@10, where 5*E reaches around 0.590, whereas TransE as a translation-based model achieves 0.501, RotatE as a rotation-based model achieves 0.571, and TuckER achieves 0.526. On WN18, our model outperforms the other models on Hits@3 and Hits@10 while being close to the best on MRR and Hits@1. Here, it should be considered that the only model performing better, QuatE, used an embedding dimension of 1000. Generally, we observe that 5*E obtains strong results on WN18 with a low embedding dimension of 100, the lowest among all compared settings.

On the FB15k datasets, we observe that 5*E outperforms TransE, RotatE, ComplEx, SimplE, and MuRP on FB15k-237. Our model performs close to TuckER. QuatE outperforms our model, which may be due to its higher embedding dimension (1000). The same pattern can be seen on FB15k, except for TuckER, which 5*E outperforms by a considerable margin on MRR, Hits@1, and Hits@3.

Table 3: Link prediction results on WN18 and WN18RR as well as FB15k and FB15k-237.

            |           WN18                |          WN18RR
Model       | MRR   Hits@1  Hits@3  Hits@10 | MRR   Hits@1  Hits@3  Hits@10
TransE      | 0.495 0.113   0.888   0.943   | 0.226 -       -       0.501
RotatE      | 0.949 0.944   0.952   0.959   | 0.476 0.428   0.492   0.571
TuckER      | 0.953 0.949   0.955   0.958   | 0.470 0.443   0.482   0.526
ComplEx     | 0.941 0.936   0.945   0.940   | 0.440 0.410   0.460   0.510
QuatE       | 0.950 0.944   0.954   0.960   | 0.482 0.436   0.499   0.572
SimplE      | 0.942 0.939   0.944   0.947   | -     -       -       -
ConvE       | 0.943 0.935   0.946   0.956   | 0.430 0.400   0.440   0.520
MuRP        | -     -       -       -       | 0.481 0.440   0.495   0.566
5*E (d=500) |
5*E (d=100) |
Figure 2: Learned 5*E embeddings for selected relations in WN18RR: (a) original grid, (b) hasPart relation, (c) partOf relation, (d) hypernym, (e) hyponym.
Learned Transformation Types

Each relation in the KG is represented as d projective transformations in 5*E (one projective transformation per dimension). Figure 2 shows the transformation types learned by 5*E for WN18RR relations, in a grid view. The original, plain view of the grid is given in sub-figure (a) for comparison of the changes after the transformations, and (b) to (e) show specific relations in WN18RR. Here we highlight the analysis of the results on some example relations:
Inversion: In sub-figure (b), the lines (same-color points) of the original grid are mapped to circles or curves (see Section 4.2) after the relation-specific transformation by the hasPart relation. The same is visible in sub-figures (d) and (e) for the hypernym and hyponym relations.
Rotation and Reflection: By comparing the direction of the same-colored lines (e.g., red) in the original grid and in all examples of the transformed grids, we conclude that the learned transformations cover rotation, for example for hypernym and hyponym. We can also interpret the result for the hasPart relation as a counter-clockwise rotation followed by a reflection w.r.t. the real axis.
Translation: In sub-figure (b), there is a movement of the grid along the real and imaginary axes, down and slightly to the right, for the hasPart relation, which represents translation. However, this is not the case for the hypernym relation.
Homothety: Semantically, the pairs (hypernym, hyponym) and (hasPart, partOf) form inverse patterns (see Proposition 3). We see that the transformed grids of hypernym and hyponym differ only w.r.t. rotation. The scale is not changed, so the determinants of the two projective matrices are 1 (no homothety) (see Section 4.2). Comparing the hasPart and partOf grids, the scale is changed, so the determinants of those two projective matrices should not be equal to one. This shows that both of those transformations cover homothety.

Learned Transformation Functions
Figure 3 illustrates the learned transformation functions for various relations in WN18RR. Sub-figures (a) and (b) both refer to the hyponym relation; however, the depicted shapes of the transformation function differ, being hyperbolic and elliptic, respectively. This confirms the flexibility of the model in embedding various graph structures as well as the diversity in the density/sparsity of the flow (e.g., for the hyponym relation). We also observed that when two relations form an inverse pattern (in the same dimension), the model mainly learns the same transformation function but with different directions.
Figure 3: Learned 5*E transformation functions for relations in WN18RR: (a) hyperbolic (hyponym), (b) elliptic (hyponym), (c) loxodromic (hypernym), (d) circular (memberOf).
6 Conclusion

In this paper, we introduced a new knowledge graph embedding model which operates on the complete set of projective transformations. We built the model on well-researched, generic mathematical foundations and showed that it subsumes other state-of-the-art embedding models. Furthermore, we proved that the model is fully expressive. By supporting a wider range of transformations than previous models, it can embed KGs with more complex structures, supports a wider range of relational patterns, and can suitably handle areas of the KG with varying density. Our experimental evaluation on four well-established datasets shows that the model outperforms multiple recent strong baselines.
References

[1] I. Balazevic, C. Allen, and T. Hospedales. Multi-relational Poincaré graph embeddings. In Advances in Neural Information Processing Systems, pages 4465–4475, 2019.
[2] I. Balažević, C. Allen, and T. M. Hospedales. TuckER: Tensor factorization for knowledge graph completion. arXiv preprint arXiv:1901.09590, 2019.
[3] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pages 2787–2795, 2013.
[4] T. Dettmers, P. Minervini, P. Stenetorp, and S. Riedel. Convolutional 2D knowledge graph embeddings. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[5] E. Faerman, F. Borutta, K. Fountoulakis, and M. W. Mahoney. LASAGNE: Locality and structure aware graph node embedding. Pages 246–253. IEEE, 2018.
[6] W. L. Hamilton, R. Ying, and J. Leskovec. Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584, 2017.
[7] G. Ji, S. He, L. Xu, K. Liu, and J. Zhao. Knowledge graph embedding via dynamic mapping matrix. 1:687–696, 2015.
[8] G. Ji, K. Liu, S. He, and J. Zhao. Knowledge graph completion with adaptive sparse transfer matrix. Pages 985–991, 2016.
[9] S. M. Kazemi and D. Poole. SimplE embedding for link prediction in knowledge graphs. In Advances in Neural Information Processing Systems, pages 4284–4295, 2018.
[10] V. V. Kisil. Geometry of Möbius Transformations: Elliptic, Parabolic and Hyperbolic Actions of SL2(R). World Scientific, 2012.
[11] T. Lacroix, N. Usunier, and G. Obozinski. Canonical tensor decomposition for knowledge base completion. In ICML, 2018.
[12] Y. Lin, Z. Liu, M. Sun, Y. Liu, and X. Zhu. Learning entity and relation embeddings for knowledge graph completion. 15:2181–2187, 2015.
[13] M. Nickel and D. Kiela. Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. arXiv preprint arXiv:1806.03417, 2018.
[14] M. Nickel, V. Tresp, and H.-P. Kriegel. A three-way model for collective learning on multi-relational data. 11:809–816, 2011.
[15] J. Richter-Gebert. Perspectives on Projective Geometry: A Guided Tour Through Real and Complex Geometry. Springer Science & Business Media, 2011.
[16] D. Salomon. Transformations and Projections in Computer Graphics. Springer Science & Business Media, 2007.
[17] R. Socher, D. Chen, C. D. Manning, and A. Ng. Reasoning with neural tensor networks for knowledge base completion. Pages 926–934, 2013.
[18] Z. Sun, Z.-H. Deng, J.-Y. Nie, and J. Tang. RotatE: Knowledge graph embedding by relational rotation in complex space. arXiv preprint arXiv:1902.10197, 2019.
[19] T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, and G. Bouchard. Complex embeddings for simple link prediction. In International Conference on Machine Learning, pages 2071–2080, 2016.
[20] Y. Wang, R. Gemulla, and H. Li. On multi-relational link prediction with bilinear models. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[21] Z. Wang, J. Zhang, J. Feng, and Z. Chen. Knowledge graph and text jointly embedding. Pages 1591–1601, 2014.
[22] B. Yang, W.-t. Yih, X. He, J. Gao, and L. Deng. Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575, 2014.
[23] S. Zhang, Y. Tay, L. Yao, and Q. Liu. Quaternion knowledge graph embedding. arXiv preprint arXiv:1904.10281, 2019.