Approximate Knowledge Graph Query Answering: From Ranking to Binary Classification
Ruud van Bakel, Teodor Aleksiev, Daniel Daza, Dimitrios Alivanistos, Michael Cochez
Ruud van Bakel, Teodor Aleksiev, Daniel Daza, Dimitrios Alivanistos, and Michael Cochez

Computer Science, Vrije Universiteit Amsterdam, The Netherlands {d.dazacruz, d.alivanistos, m.cochez}@vu.nl
University of Amsterdam, The Netherlands [email protected]
Leiden University, The Netherlands [email protected]
Discovery Lab, Elsevier, The Netherlands https://discoverylab.ai
Abstract.
Large, heterogeneous datasets are characterized by missing or even erroneous information. This is more evident when they are the product of community effort or of automatic fact extraction methods from external sources, such as text. A special case of the aforementioned phenomenon can be seen in knowledge graphs, where this mostly appears in the form of missing or incorrect edges and nodes. Structured querying on such incomplete graphs will result in incomplete sets of answers, even if the correct entities exist in the graph, since one or more edges needed to match the pattern are missing. To overcome this problem, several algorithms for approximate structured query answering have been proposed. Inspired by modern Information Retrieval metrics, these algorithms produce a ranking of all entities in the graph, and their performance is further evaluated based on how high in this ranking the correct answers appear. In this work we take a critical look at this way of evaluating. We argue that a ranking-based evaluation is not sufficient to assess methods for complex query answering. To address this, we introduce Message Passing Query Boxes (MPQB), which brings binary classification metrics back into use, and we show the effect this has on the recently proposed query embedding method MPQE.
Keywords:
Query answering · geometric representation · box embeddings · approximation.

In many organizations, a vast amount of complex information is used in operations daily. This data is often stored in various databases or file systems, while information can be retrieved using query languages and information retrieval techniques. During the past decade, several companies have started taking up knowledge graphs (KGs) [10] as a way to represent heterogeneous data and make it useful for a large variety of applications [14]. To make said data accessible, various query languages like SPARQL and Cypher have been developed. Such query languages allow for accessing nodes in the graph, traversing them via specific relations, or retrieving nodes that match a specific pattern. At the core of these languages lie graph patterns. These patterns can be thought of as graph-shaped structures where some nodes and edges correspond to nodes existing in the graph, while others correspond to variables (with specific variable names). When a match for this pattern is found in the graph, the variables are bound and the appropriate values are returned as the result. However, the performance of the previously described process is heavily dependent on the level of completeness of the graph. In detail, completeness refers to whether the graph contains all the nodes and edges in the graph pattern, such that there is a binding for all variables. Having a single node or edge missing from the graph, which represents a comparatively small bit of information, results in missing answers. This phenomenon could be good, in case of an erroneous piece of information, or bad, in case of information missing from the graph. In this paper, we focus on this issue, specifically the case of missing edges in the graph. Ideally, we would like a query system that can still give answers when the phenomenon described before applies.
We would like to have approximate query answering. One way to approach this is by performing link prediction: one tries to predict missing links in the graph by training a machine learning model on its known parts. While not trivial, it is possible to use such a single-link prediction mechanism to answer queries with missing links. Another way to approach this problem is by using so-called query encoders. These encoders take a query as input and produce an embedding (a high-dimensional vector representation) for it. This query embedding is later compared to learned embeddings for the entities in the graph. This machine learning system is optimised such that entities close to the query embedding in vector space are also its probable answers. In this paper we focus on the analysis and evaluation of these systems. Typically, such systems return a series of candidate answers to the query, accompanied by a likelihood or a distance from the query embedding in vector space. In the evaluation phase, this ranking is compared, not to a ground-truth ranking, but rather to the set of correct answers to the query. To do this, typical measures like hits@n (the fraction of correct answers that appear in the top n) and mean reciprocal rank (MRR: the average reciprocal of the rank of correct answers) are used. While these measures are appropriate for information retrieval systems, they fall short when it comes to query systems. In the latter, the results are not ranked; an entity is either a correct answer or it is not. This is also reflected in how these measures are usually adapted by modifying them to filtered versions.
In this case, measures like hits@n and MRR are computed such that true answers higher in the returned ranking are ignored when computing, for example, the rank of lower-ranked entities. We argue that we need to look into metrics that are not based on a specific ranking of the results, but rather on a crisp set of results retrieved from these systems. A main argument for why this is necessary is that many downstream tasks using the aforementioned results need to get a finite set of answers from the knowledge graph, not just a ranked list of all possible entities. That is, we need a query engine that does not just act as a ranking system, but as a binary classifier: it must provide a set of entities that are answers to the query, while all other entities are not. In this scenario, the evaluation would be the same as what has traditionally been used for classification problems, with measures such as precision and recall. This paper is structured as follows: in section 2, we provide an example of several algorithms used for approximate query answering. Then, in section 3 we discuss how metrics for binary classification can provide additional insight on top of the metrics used for ranking. We end that section with a general direction on how this could be achieved in the existing systems using volumetric query embeddings. Section 4 details a first approach for solving this problem using axis-aligned hyper-rectangles for these queries. We describe the MPQB model, a proof of concept, in the section after that. Finally, we provide a conclusion and future outlook. This work is largely based on the Bachelor theses of Ruud van Bakel [3] and Teodor Aleksiev [1], who both worked under the supervision of Michael Cochez at the Vrije Universiteit Amsterdam.
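The ranking metrics mentioned above can be computed directly from the (1-indexed) ranks of the correct answers; a minimal sketch with our own function names:

```python
import numpy as np

def hits_at_n(ranks, n):
    """Fraction of correct answers that appear in the top n of the ranking."""
    ranks = np.asarray(ranks)
    return float(np.mean(ranks <= n))

def mean_reciprocal_rank(ranks):
    """Average of the reciprocal ranks of the correct answers."""
    ranks = np.asarray(ranks, dtype=float)
    return float(np.mean(1.0 / ranks))

# ranks of the correct answers in a returned ranking (1-indexed, made-up numbers)
ranks = [1, 3, 10]
print(hits_at_n(ranks, 3))          # 2 of the 3 correct answers are in the top 3
print(mean_reciprocal_rank(ranks))  # (1/1 + 1/3 + 1/10) / 3
```

Note that neither metric says anything about a decision boundary: a perfect MRR is compatible with having no principled way to decide where the answer set ends.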
We define a knowledge graph as a tuple G = (V, R, E), where V is a set of entities, R a set of relation types, and E a set of binary predicates of the form r(h, t), where r ∈ R and h, t ∈ V. Each binary predicate represents an edge of type r between the entities h and t, and thus we call E the set of edges in the knowledge graph. A query on a KG looks for the set of entities that meet a particular condition, specified in terms of binary predicates whose arguments can be constants (i.e. entities in V) or variables. As an example, consider the following query (adapted from [4]): "Select all projects P, such that topic T is related to P, and both Alice and Bob work on T". In this query, the constant entities are Alice and Bob, and the variables are denoted as P and T. We can define such a query formally in terms of a conjunction of binary predicates, as follows:

q = P . ∃T : related(T, P) ∧ works_on(Alice, T) ∧ works_on(Bob, T).   (1)

More formally, we are interested in answering conjunctive queries, which have the following general form:

q = V_t . ∃V_1, …, V_m : r_1(a_1, b_1) ∧ … ∧ r_m(a_m, b_m).   (2)

In this notation, r_i ∈ R, and a_i and b_i are constant entities in the KG, or variables from the set {V_t, V_1, …, V_m}. Recent works have proposed to use machine learning methods to answer such queries. These methods operate by learning a vector representation in a space R^d for each entity and relation type. These representations are also known as embeddings, and we denote them as e_v for v ∈ V and e_r for r ∈ R. Similarly, these methods define a query embedding function φ (usually defined with some free parameters) that maps a query q to an embedding φ(q) = q ∈ R^d. Given a query embedding q, a score for every entity in the graph can be obtained via cosine similarity:

score(q, e_v) = (q^⊤ e_v) / (‖q‖ ‖e_v‖).

The entity and relation type embeddings, as well as any free parameters in the embedding function φ, are optimized via stochastic gradient descent on a specific loss function. Usually the loss is defined so that, for the embedding of a given query, the cosine similarity is maximized with embeddings of entities that answer the query, and minimized for embeddings of entities sampled at random. The dataset used for training consists of query-answer pairs mined from the graph. Once the procedure terminates, the function φ can be used to embed a query.
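The cosine-similarity scoring above can be sketched as follows; a minimal NumPy illustration with our own function names and toy embeddings:

```python
import numpy as np

def cosine_scores(q, entity_embs):
    """Score every entity against the query embedding q via cosine similarity."""
    q = q / np.linalg.norm(q)
    e = entity_embs / np.linalg.norm(entity_embs, axis=1, keepdims=True)
    return e @ q

# toy example: 3 entities in a 4-dimensional space (made-up numbers)
rng = np.random.default_rng(0)
entity_embs = rng.normal(size=(3, 4))
q = entity_embs[1] * 2.0           # a query embedding collinear with entity 1
scores = cosine_scores(q, entity_embs)
ranking = np.argsort(-scores)      # entities ranked by decreasing score
print(ranking[0])                  # entity 1 ranks first (cosine similarity 1.0)
```

Because cosine similarity ignores vector magnitude, the scaled query still matches entity 1 perfectly; the other entities follow in order of angular closeness.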
The entities in the graph can then be ranked as potential answers by computing the cosine similarity between all the entity embeddings and the embedding of the query. Note that, in contrast with classical approaches to query answering, such as the use of SPARQL in a graph database, this approach can return answers even if no entity in the graph matches every condition in the query exactly. In the next sections we review the specifics of recently proposed methods, which consider particular geometries for embedding entities, relation types, and queries, as well as scoring functions. Conjunctive queries can be represented as a directed acyclic graph, where the leaf nodes are constant entities, any intermediate nodes are variables, and the root node is the target variable of the query. In this graph, the edges have labels that correspond to the relation type involved in a predicate. We illustrate this in Fig. 1 for the example query introduced previously. In Graph Query Embedding (GQE) [9], the authors note that this graph can be employed to define a computation graph that starts with the embeddings of the entities at the leaves, and follows the structure of the query graph until the target node is reached. GQE was one of the first models that defined a query embedding function to answer queries over KGs. The function relies on two different mechanisms, which handle paths and intersections, respectively. This requires generating a large dataset of queries with diverse shapes that incorporate paths and intersections.
Fig. 1.
The query q = P . ∃T : related(T, P) ∧ works_on(Alice, T) ∧ works_on(Bob, T) can be represented as a directed acyclic graph, where the leaves are constant entities, the intermediate node T is a variable, and P is the target entity. (adapted from a figure in [4])

Graph Convolutional Networks (GCNs) [5,11,8] are an extension of neural networks to graph-structured data, which allow defining flexible operators for a variety of machine learning tasks on graphs. Relational Graph Convolutional Networks (R-GCNs) [17] are a special case that introduces a mechanism to deal with different relation types as they occur in KGs, and they have been shown to be effective for tasks like link prediction and entity classification. In MPQE [4], the authors note that a more general query embedding function than GQE's can be defined if an R-GCN is employed to map the query graph to an embedding. The generality stems from the fact that the R-GCN uses a general message-passing mechanism to embed the query, instead of relying on specific operators for paths and intersections.
Both GQE and MPQE embed a query as a single vector (i.e., a point in space). Query2Box [15] deviates from this idea and uses a box shape to represent a query. The method restricts the allowed embedding shape to axis-aligned hyper-rectangles; we discuss in section 4 why that is beneficial. This method has several benefits, especially for conjunctive queries: for these queries, the answer set can be seen as the intersection of the answers to the conjuncts. Such an operation can be realised with an embedded volume, but not with a vector embedding. While this method would have made it possible to create a binary classifier, the model is neither specifically trained nor evaluated for multiple answers.
Complex Query Decomposition (CQD) [2] is a recently proposed method for query answering based on using simple methods for 1-hop link prediction to answer more complex queries. In CQD, the link predictors used are DistMult [21] and ComplEx [20]. Such link predictors are more data-efficient than the previous methods, since they only need to be trained with the set of observed triples. In contrast, to be effective, the previous methods require mining millions of queries covering a wide range of structures. In CQD, a complex query is decomposed in terms of its binary predicates. The link predictor is used to compute scores for each of them, and the scores are then aggregated with t-norms, which have been employed in the literature as continuous relaxations of the conjunction and disjunction operators [18,13,12]. CQD provides an answer to the query by producing a ranking of entities based on the maximization of the aggregated scores. Therefore, the evaluation procedure for CQD is the same as for the previous methods.
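The t-norm aggregation idea can be illustrated on toy per-predicate scores; this is a sketch of the general principle, not CQD's actual implementation:

```python
import numpy as np

def product_tnorm(scores):
    """Product t-norm: a continuous relaxation of conjunction."""
    return np.prod(scores, axis=0)

def godel_tnorm(scores):
    """Goedel (minimum) t-norm: another common relaxation of conjunction."""
    return np.min(scores, axis=0)

# per-predicate scores in [0, 1] for 4 candidate entities (made-up numbers)
s1 = np.array([0.9, 0.2, 0.8, 0.5])   # scores for the first binary predicate
s2 = np.array([0.7, 0.9, 0.6, 0.1])   # scores for the second binary predicate
agg = product_tnorm([s1, s2])
print(np.argsort(-agg))               # entities ranked by aggregated score
```

An entity only scores highly under either t-norm when it scores highly on every conjunct, which mirrors how a conjunction behaves on crisp truth values.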
As discussed above, there are merits to returning a hard answer set as opposed to returning a ranking. One way to obtain such binary classifications is to define a threshold within a ranking. As we will further describe in section 4, one can create such a threshold by using shapes (e.g. axis-aligned hyper-rectangles) for query embeddings.
Binary classification does introduce new challenges. One such challenge is the definition of a loss function that can act differently for entities within the answer set and entities outside it. Since the knowledge graph may contain missing edges, the retrieved target set may be a subset of the ground truth. This in turn could result in entities being incorrectly used within the loss function (i.e. an incorrect closed-world assumption). However, this is not necessarily problematic. We define T to be the ground-truth target set of a query and T′ to be the retrieved target set (i.e. the result of directly querying the KG). Assuming the number of entities missing from T′ is considerably smaller than |V − T|, most entities that do not belong to T′ are also not answers to the query (i.e. not in T). This means that if we sample a relatively small subset of the complement of the retrieved target set (V − T′), it will likely not contain entities that are also in T. In the case where we need to be certain that our sample from V − T′ does not contain entities in T, we could restrict our sampling process to entities which could never appear in T. This is possible, for example, by sampling entities which are incompatible with the domain and range of specific relations in the query (e.g. house entities will never appear in a has_sibling(a, b) relation). Potential downsides of such methods include a potential slowdown during learning, or a limit on the model's overall performance, as having very different entities in T and in our sample from V − T′ could prevent our model from learning the differences between the two sets. On the other hand, if these two sets are very similar, the model would be forced to uncover differences even when they are not
very apparent. In fact, it is often good practice to use so-called "hard" negative samples, which are similar to entities in T′. A better alternative for finding entities not in T would be to use more advanced techniques, as proposed in [16].

Another focal point where binary classification differs from ranking is in the way performance is measured (e.g. F-score versus Mean Reciprocal Rank). In binary classification, a common performance measure is the F-score, the harmonic mean of precision and recall, while in a ranking setting we encounter the Mean Reciprocal Rank. While these metrics differ significantly, there are ways in which they relate. This insight is evident when one considers that rankings can be turned into binary classifications using a threshold. In particular, we notice that ranking metrics typically focus on having entities in T′ higher in the rank. As a result, having many high-ranking entities that are not in T′ is also penalised. Effectively, these measures then provide some notion of how well T′ and V − T′ can be separated. This means that in the case of a low ranking measure, the binary classification can also under-perform. Moreover, it could result in low precision, low recall, or both, depending on where the threshold is placed in the ranking. Geometrically, there is also a correspondence between a ranking with a cutoff point and a system where all answer embeddings within a given distance are included as answers. One could view a classifier with high precision and low recall as having an embedding with a relatively small volume, while a classifier with high recall and low precision has an embedding with a relatively large volume instead. In this setting, the interpretation of a ranking measure would be whether entities in T′ are closer to our geometric query embedding than entities not in T′.
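The correspondence between a distance cutoff and precision/recall can be sketched on toy distances; function names and numbers below are our own:

```python
import numpy as np

def threshold_classifier(distances, tau):
    """Turn a ranking by distance into a binary classifier: answer iff distance <= tau."""
    return np.asarray(distances) <= tau

def precision_recall_f1(predicted, actual):
    tp = np.sum(predicted & actual)
    precision = tp / max(np.sum(predicted), 1)
    recall = tp / max(np.sum(actual), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f1

# distances of 6 entities to the query embedding; the first 3 are the true answers
distances = np.array([0.1, 0.3, 0.9, 0.7, 1.2, 1.5])
actual = np.array([True, True, True, False, False, False])
pred = threshold_classifier(distances, tau=0.5)  # small "volume": high precision, low recall
print(precision_recall_f1(pred, actual))
```

Raising tau (a larger volume) would eventually recover the third true answer, at the cost of also admitting the nearest non-answer: the precision/recall trade-off described above.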
This measure of closeness is defined via a distance metric (e.g. the L1 norm) and can be used in the loss function [15].

As discussed in section 2, an entity is a valid answer to a specific structured query if it satisfies the query. The ultimate aim is to find the set of all valid answers, as entities in the knowledge graph, that satisfy the given query, even when a missing edge in the KG is required for the binary predicates. As discussed, we could either attempt to use a cut-off point in the ranking to obtain a binary classifier, or we could train the embedding model such that it indicates a volume in the embedded space that contains the answers. In this section we present a first possible design of such a system, to show its feasibility. We alter the earlier work done on the query2box [15] method in two ways. First, we interpret the boundaries of the hyper-rectangle used for the embedding as a bounding box. All entities within the box are predicted to be answers to the query, while entities outside it are predicted not to be answers. Second, we do not use the embedding procedure proposed in query2box, but rather perform the embedding using the technique devised in MPQE. Now, we could choose to embed entities using points, as is done in other query embedding methods. Then, entities that get embedded inside the box would be seen as answers to the query, while points outside of it would be seen as non-answers. This is illustrated in figure 2.
Fig. 2.
A small 2D query box embedding: there are three queries A, B and C, and two entities v and w. In this case v is an answer to A and C, whilst w is only an answer to A. (source [3])

But, as we will discuss in more detail in the following subsection, we can also use hyper-rectangles for the entities. The choice we make in the experiments in this paper is to consider an entity, embedded as a box, to be a valid answer to the query if there is an intersection between the two boxes. This is also illustrated in figure 3 for the two-dimensional case. An alternative choice could be to consider an entity an answer only when the entity box lies completely inside the query box. To formalize this, we operate on the embedding space R^d. What we want is to describe an axis-aligned hyper-rectangle in this space. We do this by keeping two vectors, one to indicate the center of the box and one to indicate the offset of the sides of the box. So, in the described model, every entity v ∈ V has an embedding e_v, and additionally an embedding q is defined for the query. The box in R^d corresponding to a 2d-dimensional vector p = (Cen(p), Off(p)), with Cen(p), Off(p) ∈ R^d, is defined as

Box_p = { v ∈ R^d : Cen(p) − Off(p) ⪯ v ⪯ Cen(p) + Off(p) },   (3)

where ⪯ denotes element-wise inequality. Note that a completely analogous definition could be made by keeping the two extreme counterpoints of the box rather than a center and an offset.
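Equation (3) translates directly into a point-in-box test; a minimal sketch, with our own variable names:

```python
import numpy as np

def in_box(v, cen, off):
    """Point-in-box test from equation (3): element-wise Cen - Off <= v <= Cen + Off."""
    v, cen, off = map(np.asarray, (v, cen, off))
    return bool(np.all(cen - off <= v) and np.all(v <= cen + off))

# a 2D query box centered at (1, 1) with per-axis offsets (0.5, 0.5)
cen, off = np.array([1.0, 1.0]), np.array([0.5, 0.5])
print(in_box([1.2, 0.8], cen, off))   # True: inside the box
print(in_box([2.0, 1.0], cen, off))   # False: outside along the first axis
```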
Fig. 3.
A small 2D query and entity box embedding: there are three queries A, B and C, and one entity v. In this case v is an answer to A and B, but not to C. (source [3])

It was already mentioned in the previous section that we represent our entity embeddings with boxes as well. This idea comes from the fact that entities can play different roles in different contexts. For example, we could have a person who works at a university, but is also a member of a political party. Having a single point to represent that person forces a query asking for members of that political party and a query asking for people working at that university to overlap. If we instead use a box for the entity, the query embeddings do not have that additional problem. The issue is also illustrated in figures 4 and 5. The nodes representing Alice and Bob are close to each other in one context, but far away in the other. The embedding of the entities in fig. 5 shows that with boxes it is possible to have the entities close to each other and far away from each other at the same time. With an entity as a box, we can have it as an answer to two disjoint queries, as illustrated in fig. 3.
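The answer criterion used in our experiments, a non-empty intersection between the entity box and the query box, can be sketched as follows; a hedged illustration, not the exact implementation:

```python
import numpy as np

def boxes_intersect(cen_a, off_a, cen_b, off_b):
    """Two axis-aligned boxes intersect iff, on every axis, the distance between
    their centers is at most the sum of their offsets on that axis."""
    gap = np.abs(np.asarray(cen_a) - np.asarray(cen_b))
    return bool(np.all(gap <= np.asarray(off_a) + np.asarray(off_b)))

# a query box and two entity boxes in 2D (made-up numbers)
query_cen, query_off = [0.0, 0.0], [1.0, 1.0]
print(boxes_intersect(query_cen, query_off, [1.5, 0.0], [0.6, 0.6]))  # True: they overlap
print(boxes_intersect(query_cen, query_off, [3.0, 0.0], [0.5, 0.5]))  # False: too far apart
```

The stricter containment criterion mentioned above would instead require `gap + off_b <= off_a` on every axis.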
In this section, we perform an evaluation of the system discussed above. Note that our goal is not to provide state-of-the-art results. Firstly, this is because what we propose is just a proof of concept for an approximate embedding system which can find a set of answers for a query. But the main reason we cannot really compare with other systems is that they are evaluated with ranking metrics, as discussed in section 3.
Figure 6 shows seven distinct query graph structures. We only consider these structures when training and testing our model for the query answering task.
Fig. 4.
Here Alice and Bob are closely related in the context of one specific relation (1 relation minimum), but they are not very closely related in another context (5 hops minimum). (source [3])
Fig. 5.
Here Alice and Bob have relatively close points (seen near the origin), but also very distant points. (source [3])
Fig. 6.
Used query structures for evaluation on query answering. Black nodes correspond to anchor entities, hollow nodes are the variables in the query, and the gray nodes represent the targets (answers) of the query. (source [4])
These structures were originally proposed in GQE [9]. Each of these structures starts with actual entities from the graph (i.e. anchor entities) and ends with a set of target entities. Some of these structures are chains without any intersections (e.g. B . ∃A : knows(Alice, A) ∧ is_related_to(A, B)), whilst others only have intersections (e.g. B . knows(Alice, B) ∧ is_related_to(Bob, B)), or even combinations of both. Our goal is to train a model that finds the answer set of a given query, using a query embedding. This is in contrast to other related work [15,9,4], as we want to be able to find multiple answers. As mentioned before, we can create such a set by embedding the query as a box, thus getting a hard boundary for separating entities in and not in the target set.
While previous work [4,9] incorporated multiple datasets, our implementation has so far only been tested on the AIFB dataset. This dataset is a knowledge graph of an academic institution, in which persons, organizations, projects, publications, and topics are the entities. Table 1 gives some statistics of this dataset, as well as of two more datasets often used for the evaluation of approximate query answering.
Table 1.
Statistics of the knowledge graphs that were used for training and evaluation.
                  AIFB   MUTAG        AM
Entities         2,601  22,372   372,584
Entity types         6       4         5
Relations       39,436  81,332 1,193,402
Relation types      49       8        19
Query Generation
To train our model we have to sample query graphs from our dataset. This is done by initially sampling anchor nodes and relations, which are later used to form graphs based on specific query patterns (fig. 6). After acquiring the anchor nodes and the relations connecting them, we can obtain the target set. Although this may appear straightforward, there are some caveats. The biggest one is that some queries have considerable sets of potential target entities (over 100,000 answers). Because we sample edges first, these particular graphs actually appear often. Luckily, for most query structures this was not the case, but specifically the 2-chain and 3-chain query structures occasionally suffer from it. This is likely explained by the fact that knowledge graphs contain "hub nodes": nodes with a very high degree, to which a plethora of other nodes connect via a certain relation. Table 2 shows the average size of the target sets of sampled queries for the aforementioned datasets. One interesting thing to note is that for the AM dataset the 3-chain-inter structure actually had the largest average target set. This could indicate that this problem is indeed very graph-dependent. Since this is a problem with the AIFB dataset, we limit the query target sets to a maximum of 100 answers. We also sample entities not in the target set, to be used as negative samples during training. For the query structures that contain an intersection, we incorporate hard negative samples by finding entities that would have been in the target set if the conjunctive intersections were relaxed to disjunctions.
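The hard-negative idea for intersection queries can be sketched on toy answer sets (entity names below are made up):

```python
# Hard negatives for an intersection query: entities that satisfy at least one
# conjunct (the disjunctive relaxation) but not all of them (the conjunction).
answers_conjunct_1 = {"paper_a", "paper_b", "paper_c"}   # e.g. answers to works_on(Alice, T)
answers_conjunct_2 = {"paper_b", "paper_c", "paper_d"}   # e.g. answers to works_on(Bob, T)

targets = answers_conjunct_1 & answers_conjunct_2        # retrieved target set T'
relaxed = answers_conjunct_1 | answers_conjunct_2        # disjunctive relaxation
hard_negatives = relaxed - targets                       # near-misses used as negatives

print(sorted(targets))         # ['paper_b', 'paper_c']
print(sorted(hard_negatives))  # ['paper_a', 'paper_d']
```

Such negatives are "hard" precisely because they satisfy part of the query, so the model cannot separate them from the targets by superficial features alone.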
Table 2.
Average number of multiple answers to different queries structures, acrossthe used datasets. (results were earlier reported in [1])
             AIFB        MUTAG          AM
Structure Train Test  Train Test  Train Test
1-chain     3.4  1.2    1.9  1.1    1.2  1.0
2-chain    34.5  6.4   13.4  4.7   10.2  3.5
3-chain
Evaluation
In order to test whether the model is actually able to find answers to queries that involve edges which are not in the graph, careful preparation of our data splits was necessary. We started from our original graph and marked 10% of the edges for removal (at this stage, they are still present). Then, we sample the graph for the query patterns. If a sample makes use of any edge marked as removed, it is added to either the validation set or the test set (10/90 split). If the sample contains no such marked edge, then we put it in the training set. This way, we end up with validation and test queries that make use of at least one edge that is not in the graph seen during training. After sampling, we end up with around 2 million targets and the corresponding query graphs to be used in the training set. For the validation set we used about 30,000 targets worth of queries, and for the test set approximately 300,000 targets worth of query graphs. The validation set is also used to perform early stopping in case specific conditions are not met. Since our method uses boxes, which allow for binary classification, we report our model's performance in the form of a confusion matrix (see figure 7). Given the fact that our entities are also boxes, we have more freedom in choosing when an entity is considered an answer. This is because entities now inhabit more space than a single point, which allows for partial overlap with query boxes. In order to allow flexibility, we have decided that an entity is considered an answer to a query if its box representation overlaps with the box representation of the respective query. Naturally, other, stricter conditions could be applied, such as requiring full containment or defining a fraction-based threshold (e.g. requiring at least 50% overlap). We expect these conditions to change based on the potential downstream task.
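The fraction-based criterion mentioned above could be sketched as follows; a hypothetical variant with our own function name, not the criterion we actually evaluate:

```python
import numpy as np

def overlap_fraction(cen_e, off_e, cen_q, off_q):
    """Fraction of the entity box's volume that lies inside the query box."""
    lo = np.maximum(cen_e - off_e, cen_q - off_q)   # lower corner of the intersection
    hi = np.minimum(cen_e + off_e, cen_q + off_q)   # upper corner of the intersection
    inter = np.prod(np.clip(hi - lo, 0.0, None))    # zero if the boxes are disjoint
    entity_volume = np.prod(2.0 * off_e)
    return float(inter / entity_volume)

# an entity box half-covered by the query box along the first axis (made-up numbers)
cen_e, off_e = np.array([2.0, 0.0]), np.array([1.0, 1.0])
cen_q, off_q = np.array([0.0, 0.0]), np.array([2.0, 2.0])
frac = overlap_fraction(cen_e, off_e, cen_q, off_q)
print(frac >= 0.5)   # True: at least 50% of the entity box overlaps the query box
```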
Fig. 7.
Model of the confusion matrix used for evaluation of the results. The empty box is the representation of a query; the black and the gray boxes are, respectively, a valid and an invalid answer to the query. (source [1])
Fig. 8.
The MPQB model used in this proof of concept. (adapted from a figure in [3])
Model
Our model has the same basic functionality as the MPQE [4] model. MPQE is used as an embedding component, but the input and output are interpreted as boxes. MPQE first performs several steps of message passing using an R-GCN architecture, after which the node states are aggregated to form the query embedding. With this query embedding, a loss function is evaluated, which is used as a signal (using SGD) to update the embeddings and weights in the network. For the aggregation operation we have several options (SUM, MAX, TM, MLP) at the end of our model. We test our model with several of these aggregation functions. Since we train an embedding matrix (as opposed to having a latent embedding to start with), we need to initialize it. We do this by sampling the 32-dimensional center vectors from a uniform distribution between 0 and 10, whilst sampling the 32-dimensional offset vectors from a unit Gaussian with mean 3. For TM aggregation, the MPQE model uses 3 layers; the TM aggregation function requires a number of message passing steps equal to the query diameter, in our case 3. For the MLP aggregation function we applied a two-layer fully-connected MLP. As for the non-linearities in our model, we used the ReLU function. To update the parameters of the model we used the Adam optimizer with a learning rate of 0.01. Our code base is based on PyTorch. In particular, we made use of the library PyTorch Geometric [7], which is a PyTorch extension specialised for graph-based models. While there are potential baselines to consider [9,4], they are not suitable for our work. This is because we perform binary classification, as opposed to ranking-based methods. To our knowledge, there has not been any related work that performed binary classification in the context of approximate graph querying. In the area of link prediction we do find some work, like the early work on Neural Tensor Networks [19] and a more recent one which looks at triple classification [6]. This did not prove to be a major concern, as our main goal was not to achieve state-of-the-art results, but rather to explore whether this direction of research may prove worthwhile.
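The initialization scheme described above can be sketched as follows; this is our own NumPy sketch of the described scheme, not the actual code base, and the clipping of offsets to non-negative values is our assumption:

```python
import numpy as np

def init_box_embeddings(num_entities, dim=32, seed=0):
    """Initialize box embeddings: centers uniform in [0, 10]; offsets from a
    unit-variance Gaussian with mean 3 (clipped non-negative: our assumption)."""
    rng = np.random.default_rng(seed)
    centers = rng.uniform(0.0, 10.0, size=(num_entities, dim))
    offsets = np.clip(rng.normal(3.0, 1.0, size=(num_entities, dim)), 0.0, None)
    return centers, offsets

centers, offsets = init_box_embeddings(num_entities=2601)  # AIFB has 2,601 entities
print(centers.shape, offsets.shape)   # (2601, 32) (2601, 32)
```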
After having trained the MPQB model for over 200,000 iterations it appeared tostill not have converged. After this amount of iterations the query boxes seemedto not overlap with any target boxes (i.e. no entities in T (cid:48) were returned). Apartfrom training the model for longer and on multiple epochs, there are some othersettings that could still be experimented with. For example, how many samplesare in each epoch (less samples allow for training on more epochs), whether weuse T (cid:48) fully during train or use a subset, and how many entities should be in oursample from V − T (cid:48) . The latter two settings also influence how many distinctqueries we could train on within a given time span. In may be worth noting thatprevious works [9,4,15] train using single positive samples. While we want tofocus on answering queries with multiple answers, we do not necessarily need totrain on multiple answers. In theory, if a method can produce a good ranking,it should also be able to produce a good classification, given that the optimalthresholds for these rankings could be found.Since we do not have direct result in a manner we would have liked, we will
instead analyse the trained models to see if there are relevant insights to be found. For this we looked at models using different aggregation functions, trained on the AIFB dataset.

While we found no intersections between query boxes and target boxes, we can still check whether the target boxes (from T′) appear relatively close to the query boxes, compared to the box representations of entities in V − T′. This effectively provides some measure of whether the produced rankings are good. Table 3 shows these results. While these scores do not indicate state-of-the-art performance, they do suggest that the model produced decent, non-trivial rankings with the SUM and TM aggregators. This could suggest that further research is indeed in order. The fact that TM outperformed SUM is not surprising, considering that it is a more involved method that also takes the query diameter into account. This result is also in line with the findings in [4]. A more surprising result is that the MLP method did not seem to perform well at all. This could be due to a faulty implementation, or an implementation that simply does not work for boxes as is. Overall, the results seem promising.
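One plausible reading of the measure reported in Table 3 is the fraction of (answer, non-answer) pairs in which the answer's box lies closer to the query box. A sketch of that pairwise computation, assuming distances from each entity box to the query box are already available (the function name is our own):

```python
import numpy as np

def pct_answers_closer(answer_dists, non_answer_dists):
    # Percentage of (answer, non-answer) pairs in which the answer's box
    # lies closer to the query box than the non-answer's box.
    a = np.asarray(answer_dists, dtype=float)[:, None]   # shape (|T'|, 1)
    n = np.asarray(non_answer_dists, dtype=float)[None, :]  # shape (1, |V - T'|)
    return 100.0 * float((a < n).mean())

# Toy example: 5 of the 6 pairs have the answer closer to the query box.
print(pct_answers_closer([0.2, 0.5], [0.4, 1.0, 2.0]))
```

A score of 50% would correspond to a trivial (random) ranking, which is why values well above 50 are read as "decent non-trivial rankings" in the discussion above.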
Table 3.
Percentage (%) of answers embedded closer to the query box than a non-answer, per query structure, using different aggregation functions. Tested on the AIFB dataset. (Results were earlier reported in [1].)
AIFB
Structure  SUM    TM  MLP
1-chain    67.48
Conclusion

In this work, we looked critically at the currently prevailing evaluation strategy for approximate complex structured query algorithms for knowledge graphs. Typically, these systems take a query as input and produce a ranking of all entities in the KG as output. The performance of these systems is then determined using metrics typically used in information retrieval.

What we propose is to augment the current evaluations by also requiring these systems to produce a binary classification of the nodes into a class of answers and one of non-answers. This is needed because many applications simply cannot work with a ranking and need a fixed set of answers. As a first proof of concept, we adapted ideas from MPQE and Query2Box, and created an embedding algorithm that represents queries and entities as axis-aligned hyper-rectangles. We noticed that the performance of this system is rather low, and expect that future work can improve heavily upon this first attempt.

As future research directions, we see a need to expand our experiments to include other query types (disjunctions, negations, filters, etc.), in order to show the generalizability of our approach. This will, however, require new representations for the volumes, as these operations are not possible if we stay with just boxes. For example, the negation of a box would no longer be a box. Moreover, it needs to be investigated how our method can be applied to different kinds of graphs. This will give us insights into what changes need to be made in terms of training data (via query generation) as well as the effects on model performance. It also seems worth experimenting with different geometric representations for the parts of the query (anchors, variables, and targets). Finally, since our experiments were relatively small-scale, further research could also start by simply experimenting with different settings for our current architecture.
References
1. Aleksiev, T.: Answering approximated graph queries, embedding the queries and entities as boxes (2020), BSc. thesis, Computer Science, Vrije Universiteit Amsterdam. Supervised by Cochez, M.
2. Arakelyan, E., Daza, D., Minervini, P., Cochez, M.: Complex query answering with neural link predictors. In: International Conference on Learning Representations (2021), https://openreview.net/forum?id=Mos9F9kDwkz
3. van Bakel, R.: Box R-GCN: Structured query answering using box embeddings for entities and queries (2020), BSc. thesis, Computer Science, Vrije Universiteit Amsterdam. Supervised by Cochez, M.
4. Daza, D., Cochez, M.: Message passing query embedding. In: ICML Workshop on Graph Representation Learning and Beyond (2020), https://arxiv.org/abs/2002.02406
5. Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in Neural Information Processing Systems. pp. 3844–3852 (2016)
6. Dong, T., Wang, Z., Li, J., Bauckhage, C., Cremers, A.B.: Triple classification using regions and fine-grained entity typing. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 33, pp. 77–85 (2019)
7. Fey, M., Lenssen, J.E.: Fast graph representation learning with PyTorch Geometric. arXiv preprint arXiv:1903.02428 (2019)
8. Gilmer, J., Schoenholz, S.S., Riley, P.F., Vinyals, O., Dahl, G.E.: Neural message passing for quantum chemistry. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. pp. 1263–1272 (2017)
9. Hamilton, W., Bajaj, P., Zitnik, M., Jurafsky, D., Leskovec, J.: Embedding logical queries on knowledge graphs. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems. vol. 31, pp. 2026–2037. Curran Associates, Inc. (2018), https://proceedings.neurips.cc/paper/2018/file/ef50c335cca9f340bde656363ebd02fd-Paper.pdf
10. Hogan, A., Blomqvist, E., Cochez, M., d'Amato, C., de Melo, G., Gutierrez, C., Gayo, J.E.L., Kirrane, S., Neumaier, S., Polleres, A., et al.: Knowledge Graphs. arXiv preprint arXiv:2003.02320 (2020)
11. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)
12. van Krieken, E., Acar, E., van Harmelen, F.: Analyzing Differentiable Fuzzy Implications. In: Proceedings of the 17th International Conference on Principles of Knowledge Representation and Reasoning. pp. 893–903 (2020). https://doi.org/10.24963/kr.2020/92
13. Minervini, P., Demeester, T., Rocktäschel, T., Riedel, S.: Adversarial sets for regularising neural link predictors. In: UAI. AUAI Press (2017)
14. Noy, N., Gao, Y., Jain, A., Narayanan, A., Patterson, A., Taylor, J.: Industry-scale knowledge graphs: lessons and challenges. Communications of the ACM (8), 36–43 (2019)
15. Ren, H., Hu, W., Leskovec, J.: Query2box: Reasoning over knowledge graphs in vector space using box embeddings (2020)
16. Safavi, T., Koutra, D., Meij, E.: Evaluating the calibration of knowledge graph embeddings for trustworthy link prediction (2020)
17. Schlichtkrull, M., Kipf, T.N., Bloem, P., Van Den Berg, R., Titov, I., Welling, M.: Modeling relational data with graph convolutional networks. In: European Semantic Web Conference. pp. 593–607. Springer (2018)
18. Serafini, L., d'Avila Garcez, A.S.: Logic tensor networks: Deep learning and logical reasoning from data and knowledge. CoRR abs/1606.04422 (2016), http://arxiv.org/abs/1606.04422
19. Socher, R., Chen, D., Manning, C.D., Ng, A.: Reasoning with neural tensor networks for knowledge base completion. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems. vol. 26, pp. 926–934. Curran Associates, Inc. (2013), https://proceedings.neurips.cc/paper/2013/file/b337e84de8752b27eda3a12363109e80-Paper.pdf