A Comparative Analysis of Knowledge Graph Query Performance
Masoud Salehpour and Joseph G. Davis
University of Sydney
Abstract.
As Knowledge Graphs (KGs) continue to gain widespread momentum for use in different domains, storing the relevant KG content and efficiently executing queries over it are becoming increasingly important. A range of Data Management Systems (DMSs) have been employed to process KGs. This paper aims to provide an in-depth analysis of query performance across diverse DMSs and KG query types. Our aim is to provide a fine-grained, comparative analysis of four major DMS types, namely, row-, column-, graph-, and document-stores, against major query types, namely, subject-subject, subject-object, tree-like, and optional joins. In particular, we analyzed the performance of row-store Virtuoso, column-store Virtuoso, Blazegraph (i.e., graph-store), and MongoDB (i.e., document-store) using five well-known benchmarks, namely, BSBM, WatDiv, FishMark, BowlognaBench, and BioBench-Allie. Our results show that no single DMS displays superior query performance across the four query types. In particular, row- and column-store Virtuoso are a factor of 3-8 faster for tree-like joins, Blazegraph performs around one order of magnitude faster for subject-object joins, and MongoDB performs over one order of magnitude faster for high-selective queries.
Keywords:
Knowledge Graph · query performance · SPARQL queries.
1 Introduction

The term Knowledge Graph (KG) was used by Google in 2012, referring to collecting information about real-world entities and their inter-relationships to facilitate the exploitation of semantics for searching the Web. From a broader perspective, any labeled directed graph-based representation of knowledge in a particular domain can be called a KG [14]. For example, the term KG has been used to refer to Semantic Web Linked Datasets such as DBpedia or YAGO. In recent years, many organizations such as Amazon, Facebook, Microsoft, and Alibaba have created large KGs for different purposes ranging from semantic search and recommendations to reasoning and data integration. However, unlocking KGs' full potential in response to the growing deployment requires data frameworks to represent KG content and data platforms that can efficiently store the content and execute queries over it.

For the data framework, the World Wide Web Consortium (W3C) has recommended the Resource Description Framework (RDF) as a directed and labeled graph-like structure for representation, integration, and exchange of the content of a KG using a large set of triples of the form <subject predicate object>. RDF offers a simple representation in which subjects and objects of triples are vertices of a graph that are connected by predicates as labeled edges. This simplicity can help provide an intuitive conceptualization of real-world entities and their inter-relationships. It can also represent diverse KG content ranging from structured to unstructured.
However, this flexibility as well as the absence of an explicit schema and the heterogeneity of KG content pose a challenge to Data Management Systems (DMSs) for querying KGs efficiently, since DMSs typically cannot make any a priori assumptions about the structure of the KG content [8].

For the data platforms, DMS designers have employed a variety of design choices and architectures to tackle the above-mentioned challenges for querying KGs. For example, a variety of exhaustive indexing strategies, compression techniques, and dictionary encoding (i.e., to keep space requirements reasonable despite extensive indexing) have been implemented by major native RDF-stores, such as the multiple bitmap indexes of Virtuoso or the dictionary-based lexical value encoding of Blazegraph. A number of research prototypes have also been presented. For instance, [1] proposed a workload-adaptive and self-tuning RDF-store using physical clustering of the underlying data, and [12] followed a RISC-style (reduced instruction set) architecture to leverage multiple query processing algorithms and optimizations. However, the problem of storing and querying KGs efficiently continues to challenge DMS designers.

In addition to the design choices and architectures of DMSs, KG query performance is also affected by the diversity of SPARQL query types [2]. While the importance of these factors has been recognized, our understanding of the comparative performance of different types of queries across the major DMS types is somewhat limited. In this paper, we explore this problem, including the interactions between a DMS and query types. We provide a fine-grained, comparative analysis of four major DMS types, namely, row-, column-, graph-, and document-stores, against major types of KG queries, namely, subject-subject (aka, star-shape), subject-object (aka, chain-like or path), tree-like (aka, combined), and optional (aka, left-outer-join or OPT clauses) join queries.
The performance of row-, column-, and graph-stores for executing queries has been studied in [2] based on their widespread use for processing RDF data. A widely accepted typology of KG queries is yet to emerge. At this stage, query types such as subject-subject, subject-object, tree-like, and optional queries have been analyzed in previous research. Query types such as subject-subject, subject-object, and tree-like have been the focus of experiments in [19]. [3] has highlighted the importance of optional queries.

For our experiment, we selected row-store Virtuoso, column-store Virtuoso, Blazegraph, and MongoDB as representative DMSs for row-, column-, graph-, and document-stores, respectively. We loaded five well-known benchmark datasets, namely, BSBM, WatDiv, FishMark, BioBench-Allie, and BowlognaBench into the DMSs separately. The benchmark queries were executed over each of the DMSs separately and query execution times computed to analyze the effects
Fig. 1: An example of a simple Knowledge Graph describing the "OperaHouse", a heritage site located in Sydney.

of query types on the performance of different DMS types. Our contributions include:

– Comparative performance analysis and experimental evaluation of row-, column-, graph-, and document-stores in supporting the different SPARQL query types
– Providing explanations for the observed strengths and limitations of the different DMSs depending on the types of queries
– Communicating clear scientific and practical guidelines to researchers and practitioners through summarizing the lessons learned from our journey

The remainder of this paper is organized as follows. In Section 2, we provide some preliminary information about KG query types. Section 3 presents our experimental setup including the KG benchmark characteristics, computational environment, DMSs' configuration, indexing, and data loading process. In Section 4, results of the query processing and related analyses are presented. We summarize the lessons learned from our research and discuss some of the limitations in Section 5. Section 6 highlights related work. We present our conclusions and future work in Section 7.
2 Preliminaries

In this section, we present some preliminary information about different query types using the "OperaHouse" KG example depicted in Fig. 1. The content of this KG can be represented by the following triples (we use human-readable names in our examples in this paper):

OperaHouse located_in "Sydney"
OperaHouse instance_of "landmark"
OperaHouse instance_of "heritage site"
OperaHouse instance_of "tourist attraction"
OperaHouse style "expressionist"
OperaHouse opening_date "20 Oct. 1973"
Sydney located_in "Australia"
Sydney instance_of "city"
Sydney instance_of "capital"
Sydney instance_of "metropolis"
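To make the triple model concrete, the KG above can be held in memory as plain (subject, predicate, object) tuples. The following is an illustrative Python sketch; the `match` helper is our own shorthand for triple-pattern evaluation, not part of any DMS:

```python
# The "OperaHouse" KG from Fig. 1 as (subject, predicate, object) tuples.
KG = [
    ("OperaHouse", "located_in", "Sydney"),
    ("OperaHouse", "instance_of", "landmark"),
    ("OperaHouse", "instance_of", "heritage site"),
    ("OperaHouse", "instance_of", "tourist attraction"),
    ("OperaHouse", "style", "expressionist"),
    ("OperaHouse", "opening_date", "20 Oct. 1973"),
    ("Sydney", "located_in", "Australia"),
    ("Sydney", "instance_of", "city"),
    ("Sydney", "instance_of", "capital"),
    ("Sydney", "instance_of", "metropolis"),
]

def match(kg, s=None, p=None, o=None):
    """Evaluate one triple pattern; a None component acts as a variable."""
    return [t for t in kg
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# The pattern "OperaHouse instance_of ?type" binds ?type three times:
print([o for _, _, o in match(KG, s="OperaHouse", p="instance_of")])
# → ['landmark', 'heritage site', 'tourist attraction']
```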
An example of a query is given below. It asks for the subject "OperaHouse's" architectural style name from the KG in Fig. 1. "?styleName" is a variable to return the associated value (i.e., "expressionist") as the result. Queries may contain a set of triple patterns such as "OperaHouse style ?styleName" in which the subject, predicate, and/or object can be a variable.
SELECT ?styleName
WHERE {
  OperaHouse style ?styleName .
}
Each triple pattern typically returns a subgraph. This resultant subgraph can be further joined with the results of other triple patterns to return the final resultset. In practice, there are four major types of join queries: (i) subject-subject joins (aka, star-like), (ii) subject-object joins (aka, chain-like or path), (iii) tree-like joins (i.e., a combination of subject-subject and subject-object joins), and (iv) optional joins (aka, left-outer-join or OPT clauses).
Subject-subject joins.
A subject-subject join is performed by a DMS when a KG query has at least two triple patterns such that the predicate and object of each triple pattern is a given value (or a variable), but the subjects of both triple patterns are replaced by the same variable. For example, the following query looks for all subjects of the KG in Fig. 1 that are located in "Sydney" and whose style is "expressionist" (the result will be "OperaHouse").
SELECT ?x
WHERE {
  ?x style "expressionist" .
  ?x located_in "Sydney" .
}
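In the simplest terms, a subject-subject join can be evaluated by scanning each triple pattern separately and intersecting the subject bindings. The following Python sketch is purely illustrative (it is not how any of the tested DMSs is implemented internally):

```python
# Fig. 1 KG as (subject, predicate, object) tuples, abridged to the
# triples this query touches plus a couple of distractors.
KG = [
    ("OperaHouse", "style", "expressionist"),
    ("OperaHouse", "located_in", "Sydney"),
    ("Sydney", "located_in", "Australia"),
    ("Sydney", "instance_of", "city"),
]

# Scan each triple pattern, then join on the shared subject variable ?x
# by intersecting the two sets of subject bindings.
xs_style = {s for s, p, o in KG if p == "style" and o == "expressionist"}
xs_place = {s for s, p, o in KG if p == "located_in" and o == "Sydney"}
print(xs_style & xs_place)  # → {'OperaHouse'}
```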
Subject-object joins.
A subject-object join is performed by a DMS when a KG query has at least two triple patterns such that the subject of one of the triple patterns and the object of the other triple pattern are replaced by the same variable. For example, the following query looks for all subjects that are located within Australian cities (the result will be "OperaHouse").
SELECT ?y
WHERE {
  ?x located_in "Australia" .
  ?y located_in ?x .
}
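The chain in this query can be evaluated by feeding the bindings of ?x from the first pattern into the second pattern; a minimal, illustrative Python sketch:

```python
# A slice of the Fig. 1 KG as (subject, predicate, object) tuples.
KG = [
    ("OperaHouse", "located_in", "Sydney"),
    ("Sydney", "located_in", "Australia"),
    ("Sydney", "instance_of", "city"),
]

# Pattern 1: ?x located_in "Australia"  →  bindings for ?x
xs = {s for s, p, o in KG if p == "located_in" and o == "Australia"}
# Pattern 2: ?y located_in ?x  →  keep subjects whose object is a bound ?x
ys = {s for s, p, o in KG if p == "located_in" and o in xs}
print(ys)  # → {'OperaHouse'}
```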
Tree-like joins.
A tree-like join consists of a combination of subject-subject and subject-object joins. For example, the following query requires a tree-like join to look for the opening date of "OperaHouse" (the result will be "20 Oct. 1973").
SELECT ?y
WHERE {
  ?x opening_date ?y .
  ?x located_in ?z .
  ?z instance_of "capital" .
  ?z instance_of "metropolis" .
}
Optional Joins.
Queries return resultsets only when the entire query pattern matches the content of the KG. However, optional joins allow KG queries to return a resultset even if the optional part of the query is not matched, since completeness and adherence of KGs' content to their formal ontology specification is not always enforced. This makes the optional join a suitable tool for querying KGs. For example, the following query uses an optional join (in addition to a subject-subject join) to return "OperaHouse" as one of Sydney's tourist attractions.
SELECT ?x
WHERE {
  ?x instance_of "tourist attraction" .
  ?x located_in "Sydney" .
  OPTIONAL { ?x instance_of "zoo" . }
}
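The left-outer-join semantics of OPTIONAL can be sketched as follows: the required patterns must match, while a failed optional pattern leaves the solution intact. This is an illustrative Python sketch, and "TarongaZoo" is a made-up extra entity (not in the Fig. 1 KG) added so that both branches are exercised:

```python
# Required triples from Fig. 1 plus a hypothetical zoo entity.
KG = [
    ("OperaHouse", "instance_of", "tourist attraction"),
    ("OperaHouse", "located_in", "Sydney"),
    ("TarongaZoo", "instance_of", "tourist attraction"),
    ("TarongaZoo", "located_in", "Sydney"),
    ("TarongaZoo", "instance_of", "zoo"),
]

# Required part: a subject-subject join on ?x.
required = ({s for s, p, o in KG if p == "instance_of" and o == "tourist attraction"}
            & {s for s, p, o in KG if p == "located_in" and o == "Sydney"})

# OPTIONAL part: annotate, but never discard, the required solutions.
for x in sorted(required):
    is_zoo = any(t == (x, "instance_of", "zoo") for t in KG)
    print(x, "matches OPTIONAL" if is_zoo else "kept despite unmatched OPTIONAL")
```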
Selectivity of Queries.
As mentioned above, each KG query contains a set of triple patterns, where a triple pattern is a structure of three components which may be concrete (i.e., bound) or variable (i.e., unbound). Sets of triple patterns specify the complexity of the access to the underlying data. When the number of stored triples satisfying a set of triple patterns' conditions is large (i.e., as compared to the total number of stored triples), the corresponding query is considered low-selective [16]. In this regard, each query type can also be either high-selective or low-selective depending on the number of stored triples satisfying its triple patterns' conditions.
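Selectivity in this sense is just the fraction of stored triples that satisfy a pattern; a toy illustration over the Fig. 1 triples (the `selectivity` helper is ours, shown for a single-pattern case only):

```python
# The "OperaHouse" KG from Fig. 1 as (subject, predicate, object) tuples.
KG = [
    ("OperaHouse", "located_in", "Sydney"),
    ("OperaHouse", "instance_of", "landmark"),
    ("OperaHouse", "instance_of", "heritage site"),
    ("OperaHouse", "instance_of", "tourist attraction"),
    ("OperaHouse", "style", "expressionist"),
    ("OperaHouse", "opening_date", "20 Oct. 1973"),
    ("Sydney", "located_in", "Australia"),
    ("Sydney", "instance_of", "city"),
    ("Sydney", "instance_of", "capital"),
    ("Sydney", "instance_of", "metropolis"),
]

def selectivity(kg, predicate):
    """Fraction of stored triples matched by the pattern ?s <predicate> ?o.
    The larger the fraction, the more low-selective the pattern."""
    return sum(1 for _, p, _ in kg if p == predicate) / len(kg)

print(selectivity(KG, "instance_of"))  # 6 of 10 triples: low-selective
print(selectivity(KG, "style"))        # 1 of 10 triples: high-selective
```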
3 Experimental Setup

In this section, we present our experimental setup including the KG benchmark characteristics, computational environment, DMSs' configuration, indexing, and data loading process.
Table 1: Statistics of the Benchmark datasets
We used five well-known benchmarks in this research. These are publicly available datasets along with a collection of queries. These benchmarks are: Berlin SPARQL Benchmark (BSBM) [5] (http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/), Waterloo SPARQL Diversity Test Suite (WatDiv) [2] (https://dsg.uwaterloo.ca/watdiv/), FishMark [4], BowlognaBench [7], and BioBench-Allie [18] (http://allie.dbcls.jp/). WatDiv and BSBM follow specific rules that allow us to scale the datasets to arbitrary sizes using their scale factors, while the other datasets are of fixed size. Table 1 shows the statistical information related to the above benchmarks. The RDF representations of these benchmarks are available in different formats such as N-Triples, Turtle, and XML. We used the RDF/N-Triples format. However, to load them into a document-store like MongoDB, we had to convert them to the JSON-LD format. We performed the conversion using a parser designed and developed as part of this project; the source code is available through https://github.com/oursubmission/ESWC.

We ran the benchmark queries against the corresponding datasets using the four DMSs. We selected twelve queries across the benchmarks that were representative of the four major query types; these queries are available through https://github.com/oursubmission/ESWC. They provide varying degrees of selectivity and complexity. The selected subject-subject join queries are: Query 5 from FishMark (FishMark-Q5), Query 7 from BowlognaBench (BowlognaBench-Q7), and Query 7 from WatDiv (WatDiv-Q7). The selected subject-object join queries are: Query 2 from BioBench-Allie (BioBench-Allie-Q2), Query 21 from WatDiv (WatDiv-Q21), and Query 22 from WatDiv (WatDiv-Q22), and the selected tree-like join queries are: Query 1 from BioBench-Allie (BioBench-Allie-Q1), Query 14 from BowlognaBench (BowlognaBench-Q14), and Query 19 from FishMark (FishMark-Q19). Finally, the selected optional join queries are: Query 2 from BSBM (BSBM-Q2), Query 4 from BSBM (BSBM-Q4), and Query 2 from FishMark (FishMark-Q2). For the WatDiv and BSBM benchmarks, the corresponding query execution times over KGs with 100M triples are reported in this paper, while the results for 10M and 1000M triples are only available online due to space constraints.

Our benchmark system is a Virtual Machine (VM) instance with a 2.3GHz AMD processor, running Ubuntu Linux (kernel version: 4.4.0-170-generic), with 48GB of main memory, 16 vcores, 512K L2 cache, and 5TB instance storage capacity. The VM cache read is roughly 2799.45 MB/sec and the buffer read is roughly 35.85 MB/sec (i.e., the output of the "hdparm -Tt" Linux command). The operating system is set with almost no "soft/hard" limit on the file size, CPU time, virtual memory, locked-in-memory size, open files, processes/threads, and memory size using Linux "ulimit" settings.
Data Management Systems (DMSs).
We chose four different DMSs: (1) row-store Virtuoso (Open Source Edition, version 06.01.3127), (2) column-store Virtuoso (Open Source Edition, version 07.20.3230, commit 4a668a5), (3) Blazegraph (Open Source Edition, version 2.1.5, commit 3122706; previously known as Bigdata DB), and (4) MongoDB (Community Edition, version 4.0.9).

Configuration of row- and column-store Virtuoso.
We configured both of them based on the vendor's official recommendations (http://vos.openlinksw.com/owiki/wiki/VOS/VirtRDFPerformanceTuning). For example, we configured the Virtuoso process to use the main memory and the storage disk effectively by setting "NumberOfBuffers" to "4,000,000", "MaxDirtyBuffers" to "3,000,000", and "MaxCheckpointRemap" to a quarter of the database size, as recommended. We also used the latest versions of the GNU packages that are necessary to build column-store Virtuoso (e.g., GNU gperf 3.0.4, libtool 2.4.6, flex 2.6.0, Bison 3.0.4, and Awk 4.1.3).
Configuration of Blazegraph.
We configured it based on the vendor's official recommendations as well (https://wiki.blazegraph.com/wiki/index.php/PerformanceOptimization). For example, we turned off all inference, truth maintenance, statement identifiers, and the free-text index in our experiment since reasoning efficiency was not part of our research focus in this paper.

Configuration of MongoDB.
We used its default settings. We set its level of profiling to "2" to log the data for all query-related operations for precise and detailed query execution time extraction.
Indexing of Virtuoso.
We did not change the default indexing scheme of Virtuoso (both row- and column-store). As highlighted in the official website (http://docs.openlinksw.com/virtuoso/rdfperfrdfscheme), "alternate indexing schemes are possible but will not be generally needed". More specifically, Virtuoso's data modeling is based on a relational table with three columns for S, P, and O (i.e., S: Subject, P: Predicate, and O: Object); in the case of loading named graphs, it adds another column for the context, called C. Virtuoso carries multiple indexes over that table to provide a number of different access paths. Most recently, column-store Virtuoso added columnar projections to minimize the on-disk footprint associated with RDF data storage. Virtuoso (both row- and column-store) creates the following compound indexes by default for the loaded KG: PSO, PO, SP, and OP.
Indexing of Blazegraph.
As recommended in its official website (https://wiki.blazegraph.com/wiki/index.php/PerformanceOptimization), we did not change its default indexing schema. In Blazegraph, indexes are based on the "B+Tree" data structure. Blazegraph typically uses the following three indexes in triples mode: SPO, POS, and OSP. For normal use cases, these indexes are laid out on variable-sized pages. These index pages are read from the backing store and loaded into main memory on demand (i.e., into the Java heap). However, Blazegraph takes advantage of a variety of data structures to execute queries when the stored KG content is loaded in main memory. For example, the underlying data structure is retained by a mixture of ring buffers on the stack alongside a native memory cache for buffering writes to reduce write amplification effects.

MongoDB Storage Layouts.
We did not change its default storage engine, namely WiredTiger, a key/value store, to store JSON documents. MongoDB uses the binary equivalent of each JSON document (i.e., BSON) for storage, in which the structure of each document remains unchanged. MongoDB usually assigns an arbitrary (and unique) identifier to each JSON document as a key and considers the document as a value. It uses B-Trees to create indexes on the contents of each JSON document.
Indexing of MongoDB.
We created indexes on those name/value pairs of the JSON-LD documents that represent subjects and predicates.
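As a rough illustration of the document layout we indexed (hypothetical field names; the actual JSON-LD documents produced by our parser differ in detail), all triples sharing a subject collapse into one document keyed by that subject:

```python
import json
from collections import defaultdict

# A few triples to group; illustrative only.
triples = [
    ("OperaHouse", "located_in", "Sydney"),
    ("OperaHouse", "style", "expressionist"),
    ("Sydney", "located_in", "Australia"),
]

# Group triples by subject into one JSON-like document each.
docs = defaultdict(dict)
for s, p, o in triples:
    docs[s].setdefault(p, []).append(o)

# A dict keyed on the subject stands in for MongoDB's index on "@id":
index = {s: {"@id": s, **fields} for s, fields in docs.items()}
print(json.dumps(index["OperaHouse"], indent=2))
```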
Loading the benchmark KG.
We loaded the RDF/N-Triples format of the benchmarks into Virtuoso (row- and column-store) by using its native bulk loader function (i.e., "ld_dir"). To load the KGs into Blazegraph, we used its native "DataLoader" utility (https://wiki.blazegraph.com/wiki/index.php/Bulk_Data_Load). We loaded KGs into MongoDB using its native tool, called "mongoimport".

Shutdown store, clear caches, restart store.
We measured the query execution times in our evaluation. This is an end-to-end time computed from the time of query submission to the time when the result is outputted. After the execution of each query, we carefully checked to ensure that the output results are correct and exactly the same across different DMSs. The query times for both cold- and warm-run (aka, cold and warm cache) are reported. For the cold-run, we dropped the file system caches using the /bin/sync, echo 3 > /proc/sys/vm/drop_caches, and swapoff -a && swapon -a commands. For fairness, the warm-run query times reported for each DMS are averaged (geometric mean) over 5 successive runs (with almost no delay in between).
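The warm-run averaging uses the geometric mean, which damps a single outlier run more than the arithmetic mean would; a small sketch with made-up timings:

```python
import math

def geometric_mean(times_ms):
    """Geometric mean of positive query times (milliseconds)."""
    return math.exp(sum(math.log(t) for t in times_ms) / len(times_ms))

# Five successive warm runs (hypothetical numbers); 9.0 is an outlier.
runs = [2.1, 2.3, 2.2, 2.4, 9.0]
print(round(geometric_mean(runs), 2), "vs arithmetic", sum(runs) / len(runs))
```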
4 Results

Subject-subject Joins.
Fig. 2: Impacts of subject-subject join queries on the DMSs: (a) warm-run, (b) cold-run. The X axis shows the DMSs and the Y axis shows the execution time of each query in milliseconds (using a log scale).

The query execution times are shown in Fig. 2, in which the X axis represents the DMSs and the Y axis shows the execution times of the queries, namely, FishMark-Q5, BowlognaBench-Q7, and WatDiv-Q7, in milliseconds (using a log scale). Fig. 2 shows that MongoDB runs this type of query over one order of magnitude faster than the other DMSs when queries are high-selective. For example, MongoDB executed FishMark-Q5 in 2.19 milliseconds (warm-run) while Blazegraph, column-store Virtuoso, and row-store Virtuoso executed the same query in 394.24, 89543.81, and 29045.86 milliseconds, respectively. However, our results show that Blazegraph performs at least 2x faster than the other DMSs when subject-subject join queries are low-selective (e.g., WatDiv-Q7). In Fig. 2, the differences between cold- and warm-run show that Virtuoso (row- and column-store) can take advantage of caching techniques more than the other DMSs. For example, Virtuoso (row and column) executes BowlognaBench-Q7 in over 1500 milliseconds (cold-run) while its execution time is around 150 milliseconds in a warm-run.
Subject-object Joins.
The query execution times for the selected subject-object join queries are shown in Fig. 3, in which the X axis shows the DMSs and the Y axis shows the execution time of the queries, namely, BioBench-Allie-Q2, WatDiv-Q21, and WatDiv-Q22, in milliseconds (using a log scale). Although MongoDB executed BioBench-Allie-Q2, which is a high-selective query, over 2 orders of magnitude faster than the other DMSs, it could not finish the execution of WatDiv-Q21 and WatDiv-Q22 within the given time-out period of 50,000 milliseconds. The complexity and non-selectivity of these two queries may have contributed to the unsuccessful execution over MongoDB. However, Fig. 3 shows that the other DMSs performed comparably. For instance, WatDiv-Q21 executed over Blazegraph in around 570 milliseconds (warm-run), while this execution time is 118.38 and 374.6 milliseconds for column-store Virtuoso and row-store Virtuoso, respectively.

Fig. 3: Impacts of subject-object join queries on the DMSs (cold- and warm-run).
Tree-like Joins.
The query execution times for the selected tree-like join queries are shown in Fig. 4. This figure shows that row- and column-store Virtuoso performed similarly for warm-run execution of BioBench-Allie-Q1 and FishMark-Q19, while Blazegraph is around 5x slower. MongoDB appeared to be the slowest for warm-run execution of BioBench-Allie-Q1, while its performance is comparable with Blazegraph for FishMark-Q19. BowlognaBench-Q14 executed around 2 orders of magnitude faster using MongoDB, probably because it is a high-selective tree-like join query. The comparison between cold- and warm-run execution of FishMark-Q19 also points to the importance of the role that caching techniques play in query performance: MongoDB is the fastest in cold-run, but in warm-run it is almost the slowest (after Blazegraph).
Optional Joins.
The query execution times for the selected optional join queries are shown in Fig. 5, in which MongoDB executed them faster than the other DMSs. The high selectivity of the selected queries may have been an important factor in MongoDB's performance advantage. Row-store Virtuoso was the slowest of all, while column-store Virtuoso performed over 3x faster than Blazegraph in running these queries (warm-run). However, in the cold-run, aside from the performance advantage of MongoDB, Blazegraph performed slightly better than the others, especially for executing BSBM-Q2.

Fig. 4: Impacts of tree-like join queries on the DMSs: (a) warm-run, (b) cold-run.

Subject-subject Joins.
Our results showed that MongoDB can execute this query type over one order of magnitude faster, especially for queries with high selectivity. Virtuoso (both row- and column-store) and Blazegraph typically execute subject-subject joins by scanning indexes for each triple pattern separately. The retrieved result of each triple pattern is kept in main memory as an intermediary result. These DMSs join the different intermediary results to return the final result. Virtuoso (both row- and column-store) and Blazegraph typically use a hash join algorithm for executing subject-subject joins over the intermediary results. However, in MongoDB, all triples with the same subject appear in a single JSON document, and the joining of triples with the same subject is equivalent to an index-based look-up of a given subject. Therefore, we observed better performance from MongoDB for high-selective subject-subject join queries.
Subject-object Joins.
Fig. 5: Impacts of optional join queries on the DMSs: (a) warm-run, (b) cold-run.

We observed that Blazegraph offers a significant performance improvement on the cold-run execution of subject-object join queries. Merge join is known to be an efficient algorithm for DMSs to implement for running subject-object joins over intermediary results, after scanning indexes, to return the final result [10] (note that [10] did not use the exact term "subject-object join", instead referring to this query type by its definition). To the best of our knowledge, none of the DMSs have implemented the merge join as a part of their query processing engines. As a result, these DMSs use the index nested loop join algorithm to support subject-object queries. In our experiments, the faster query execution time of Blazegraph (cold-run) for this query type may stem from the use of B+-tree-based index nested loop joins, which are more read-optimal as compared to the bitmap index-based joins of both row- and column-store Virtuoso. In addition, Blazegraph uses cardinality estimation to predict the size of the intermediary results of queries to find a good join ordering. This estimation requires dynamic programming techniques and the building of statistical summaries such as histograms. Such cardinality estimation has a significant performance effect on the execution time of low-selective subject-object join queries. Therefore, we observed better performance from Blazegraph as compared to row- and column-store Virtuoso for cold-run execution of subject-object join queries.
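For reference, a merge join over two intermediary results that are already sorted on the join key needs only a single forward pass over each input. The following is a hedged sketch of the textbook algorithm (as noted above, none of the tested DMSs implements it):

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Join two lists of binding rows, each pre-sorted on the join key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Emit the cross product of the equal-key runs on both sides.
            i_end = i
            while i_end < len(left) and key(left[i_end]) == kl:
                i_end += 1
            j_end = j
            while j_end < len(right) and key(right[j_end]) == kl:
                j_end += 1
            out.extend((a, b) for a in left[i:i_end] for b in right[j:j_end])
            i, j = i_end, j_end
    return out

# Two sorted intermediary results joined on their first column:
left = [("a", 1), ("b", 2), ("b", 3)]
right = [("b", "x"), ("c", "y")]
print(merge_join(left, right))  # → [(('b', 2), ('b', 'x')), (('b', 3), ('b', 'x'))]
```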
Tree-like Joins.
Our results showed that Virtuoso (especially column-store) executed tree-like join queries faster. Technically, a tree-like join can be considered as a combination of subject-subject and subject-object joins. The performance of tree-like query types may vary depending on the complexity of such a combination and the efficiency of the DMSs' query optimizers. In our experiments, Virtuoso (row- and column-store) showed better performance for executing this query type with lower selectivity. We speculate that Virtuoso's vectorized query execution model and its secondary indexing strategies (aka, compound indexes), along with its well-engineered query optimization engine, may explain the better performance as compared to the others. In addition, column-store Virtuoso usually stores indexes more compactly. Therefore, it can store and index short, fixed-length identifiers rather than string literals of subject, predicate, and object values. This compactness typically contributes to faster index selection in query planning and has a positive performance impact on tree-like join queries.
Optional Joins.
There is no evidence that any of the four DMSs has implemented specialized optimizations for optional joins. As a result, although MongoDB showed better performance for running high-selective queries, we did not observe significant differences in query performance among the DMSs for low-selective queries. We note that both row- and column-store Virtuoso come with a compression strategy for storing KG datasets. Furthermore, bitmap indexing provides row- and column-store Virtuoso with better space utilization as compared to the B+-trees of Blazegraph (or MongoDB). In this regard, we speculate that Virtuoso is likely to aggressively prune intermediate results and perform faster than the others for optional join query processing, especially for low-selective optional join queries.

Scale effects.
FishMark, BioBench-Allie, and BowlognaBench are fixed-size datasets that cannot be scaled. In contrast, WatDiv and BSBM are scalable. In this paper, we reported the corresponding query execution times of these two datasets with 100M triples. However, the corresponding results for datasets with 10M and 1000M triples have been computed and are available online (https://github.com/oursubmission/ESWC). Our results showed that in most cases, selectivity and query type, along with query optimization and caching techniques, are probably more significant contributory factors to the performance differences across the employed DMSs than the size of the underlying dataset (i.e., the scale factor).
5 Lessons Learned and Limitations

Our results indicate that no single DMS displays superior query performance across different query types. These results are likely to be generalizable. However, more experimentation is warranted before we can arrive at any firm conclusions. In our experiments, we had four archetypal query types. However, there may be other query types that we need to consider in the future.

Currently, the maximum size of each JSON document in MongoDB is 16MB. It rejects a JSON document when its size exceeds this value. Technically, the maximum document size in document-stores helps ensure that a single document cannot use an excessive amount of memory, but the JSON-LD representation of KGs might be affected negatively by this. In our experiments, there were no cases in which the document size exceeded the maximum value. However, in principle, the size of JSON documents may exceed the maximum document size depending on the KG content.

Another issue that remains to be addressed is the automatic conversion of SPARQL to JavaScript-like (i.e., MongoDB) queries. In our experiments, we converted the benchmark queries manually and, after the execution of each query, carefully checked to ensure that the output results were correct and exactly the same across the different DMSs and representations.

We note that the performance of different query types tends to be negatively affected by the sizes of the query's output and, more often, of its intermediary results. When a query contains more than one triple pattern, DMSs usually have to scan large parts of indexes for each triple pattern and then join the results of these scans. These index scans can produce large intermediary results. We observed that even when the query itself is very selective with a small output, the size of the intermediary results can still be very large. The size of the intermediary results challenges DMSs.
Currently, DMSs usually use one of two techniques, data compression or Sideways Information Passing (SIP), to decrease the size of intermediary results. It appears that employing these techniques to decrease intermediary results may increase the computational cost of the query evaluation process, due to the decompression (for data compression) or additional filtering (for SIP) required.
Locality.
Column-store Virtuoso and MongoDB are designed to increase data locality while storing KGs' content more than the others. In the column-store Virtuoso storage model, each column of a table or index is stored contiguously to provide physical adjacency. Therefore, when queries (e.g., tree-like joins) need to access a subset of columns from one table, only those columns actually being accessed need to be read from disk, which culminates in better use of I/O throughput and memory. This locality has the potential to reduce the traffic between the CPU cache and main memory and provide better CPU utilization. MongoDB similarly takes advantage of data locality since all the triples related to one resource (i.e., a subject in the JSON-LD) are physically located together. We speculate that such locality leads to a denser data layout, more CPU cache (i.e., L2 cache) locality, and more RAM locality, and therefore increased overall performance on high-selective KG queries.
Cache Efficiency.
DMSs usually utilize both their internal cache and the underlying filesystem cache. When enough free memory is available and allocated to a DMS, efficient utilization of this memory for caching can typically contribute to faster warm-run query execution. Comparing the results of different queries across the DMSs in cold- and warm-run execution suggested that column-store Virtuoso provides better cache management. In applications with ad-hoc queries, cache management may not impact performance significantly, but in cases where a number of queries are repeated periodically, employing suitable caching techniques can contribute positively to query performance.
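As a toy analogy for the warm-run speedups discussed above (illustrative only; real DMS caches operate on index pages and buffers rather than whole query results), a memoized query function performs its expensive work only on the first, cold execution:

```python
from functools import lru_cache

evaluations = {"count": 0}

@lru_cache(maxsize=None)
def run_query(query):
    evaluations["count"] += 1   # stands in for expensive index scans and joins
    return "answer to " + query

cold = run_query("Q1")  # cold run: the query is actually evaluated
warm = run_query("Q1")  # warm run: the cached result is returned
print(evaluations["count"])  # 1 -- the expensive work happened only once
```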
Early approaches employed relational database systems to store Semantic Web datasets. In addition, several approaches have exploited NoSQL databases for building DMSs, as discussed in [19]. We can classify these studies into three categories:
Triple-based Indexing.
HexaStore [17] is a well-known DMS based on indexing; it creates indexes on all six permutations of the triple pattern. The effectiveness of triple-based indexing solutions can be limited, since querying KGs typically requires touching large amounts of data and performing complex filtering.
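The sextuple-indexing idea of HexaStore [17] can be sketched in a few lines of Python (an in-memory toy, not the actual HexaStore implementation): one nested index per ordering of subject, predicate, and object, so any triple pattern can be answered by a direct lookup in the matching index.

```python
from collections import defaultdict
from itertools import permutations

# One index per ordering of s, p, o: spo, sop, pso, pos, osp, ops.
ORDERINGS = ["".join(p) for p in permutations("spo")]

def build_indexes(triples):
    """Build all six nested indexes; each maps key1 -> key2 -> set of values."""
    indexes = {order: defaultdict(lambda: defaultdict(set)) for order in ORDERINGS}
    for s, p, o in triples:
        parts = {"s": s, "p": p, "o": o}
        for order in ORDERINGS:
            a, b, c = (parts[k] for k in order)
            indexes[order][a][b].add(c)
    return indexes

triples = [("alice", "knows", "bob"), ("alice", "knows", "carol")]
idx = build_indexes(triples)
# The pattern (alice, knows, ?o) is answered by a lookup in the "spo" index:
print(sorted(idx["spo"]["alice"]["knows"]))  # ['bob', 'carol']
```

The downside this section notes is visible even in the toy: every inserted triple is written six times, and a query touching many triples still has to enumerate and post-filter large index ranges.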
Infrastructure Configuring.
JenaHBase [11] and H2RDF [13] are well-known DMSs that focus mainly on the configuration of the underlying infrastructure, such as cluster segmentation, communication overhead, and distributed storage layouts.
Graph Processing.
Graph-based stores usually model Semantic Web data as a labeled and directed multi-edge graph, using a disk-based adjacency-list table, and execute queries by mapping them to a sub-graph matching task over the graph.

In addition to the design of DMSs, the analysis of available DMSs using benchmark datasets has been a core topic of Semantic Web data management research. For example, some studies, such as [5,2,9], presented new benchmark datasets. Other studies, such as [6], did not propose any new dataset but used available benchmarks and DMSs for the same purpose, e.g., reporting the key advantages and drawbacks of each DMS. There are also studies, such as [15], which comprehensively surveyed and analyzed the available benchmarks in terms of different metrics, such as the number of projection variables, the number of BGPs, etc. However, to the best of our knowledge, our paper is one of the first to investigate comparative KG query performance by mapping archetypal SPARQL query types to different DMS types.
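As a minimal illustration of the graph-processing category above (toy data and an in-memory adjacency list, not the storage layout of any particular graph-store), a basic graph pattern can be evaluated by matching its triple patterns as edges over the adjacency list:

```python
from collections import defaultdict

# KG as a labeled, directed multi-edge graph kept as an adjacency list.
adj = defaultdict(list)  # subject -> [(predicate, object), ...]
for s, p, o in [("a", "knows", "b"), ("b", "knows", "c"), ("a", "likes", "c")]:
    adj[s].append((p, o))

def match_edge(s, p):
    """Match the pattern (s, p, ?o): one edge-matching step in the graph."""
    return [o for (pred, o) in adj[s] if pred == p]

# The subject-object join { a knows ?x . ?x knows ?y } as sub-graph matching:
results = [(x, y) for x in match_edge("a", "knows") for y in match_edge(x, "knows")]
print(results)  # [('b', 'c')]
```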
We have focused on the mapping of different types of KG queries onto different types of DMS. We analyzed the performance of row-store Virtuoso, column-store Virtuoso, Blazegraph (i.e., graph-store), and MongoDB (i.e., document-store) using five well-known benchmarks, namely, BSBM, WatDiv, FishMark, BowlognaBench, and BioBench-Allie. A summary of our findings is as follows:

– There are significant interaction effects between different types of DMSs and query types.
– Our results showed that the simplicity of the underlying storage layout, increased data locality, and suitable caching techniques in Virtuoso (especially column-store) lead to a performance advantage for tree-like join queries by generating smaller intermediary results.
– We also found that suitable cardinality estimation as well as efficient query optimization in Blazegraph offers a significant performance improvement on the cold-run execution of subject-object join queries.
– Taking advantage of data locality and employing efficient data structures such as B-trees for implementing indexes in MongoDB can contribute to over one order of magnitude better performance for executing subject-subject join queries, especially for queries with high selectivity.

The results presented in this paper can assist in the benchmarking of emerging types of DMSs. However, more experimentation is warranted before we can arrive at any firm conclusions. In addition, our experience while performing a comparative analysis of KG query performance raised several new and interesting questions and research directions that need to be addressed in the future. These include the replication of this research using more datasets and DMSs and the automatic rewriting of SPARQL queries to other declarative query languages such as MongoDB's query language.
References
1. G. Aluç, M. T. Özsu, and K. Daudjee. Building self-clustering RDF databases using tunable-LSH. The VLDB Journal, 28(2):173–195, 2019.
2. G. Aluç et al. Diversified stress testing of RDF data management systems. In Proc. of the Int. Semantic Web Conf. (ISWC), pages 197–212, 2014.
3. M. Atre. Left bit right: For SPARQL join queries with OPTIONAL patterns (left-outer-joins). In Proc. of the ACM Int. Conf. on Management of Data (SIGMOD), pages 1793–1808, 2015.
4. S. Bail et al. FishMark: A linked data application benchmark. In Proc. of the Int. Workshop on Scalable and High-Performance Semantic Web Systems (SSWS), pages 1–15, 2012.
5. C. Bizer et al. The Berlin SPARQL benchmark. Int. J. Semantic Web Inf. Syst., 5:1–24, 2009.
6. P. Cudré-Mauroux et al. NoSQL databases for RDF: An empirical evaluation. In Proc. of the Int. Semantic Web Conf. (ISWC), pages 310–325, 2013.
7. G. Demartini et al. BowlognaBench: Benchmarking RDF analytics. In Proc. of the Int. Symp. on Data-Driven Process Discovery and Analysis (SIMPDA), pages 82–102, 2012.
8. S. Duan et al. Apples and oranges: A comparison of RDF benchmarks and real RDF datasets. In Proc. of the ACM Int. Conf. on Management of Data (SIGMOD), pages 145–156, 2011.
9. O. Erling et al. The LDBC social network benchmark: Interactive workload. In Proc. of the ACM Int. Conf. on Management of Data (SIGMOD), pages 619–630, 2015.
10. S. Groppe. Data Management and Query Processing in Semantic Web Databases. Springer Science & Business Media, 2011.
11. V. Khadilkar et al. Jena-HBase: A distributed, scalable and efficient RDF triple store. In Proc. of the Int. Semantic Web Conf. (ISWC), pages 85–88, 2012.
12. T. Neumann and G. Weikum. The RDF-3X engine for scalable management of RDF data. The VLDB Journal, 19(1):91–113, 2010.
13. N. Papailiou, I. Konstantinou, D. Tsoumakos, and N. Koziris. H2RDF: Adaptive query processing on RDF data in the cloud. In Proc. of the Int. Conf. on World Wide Web (WWW), pages 397–400, 2012.
14. H. Paulheim. Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web, 8(3):489–508, 2017.
15. M. Saleem et al. How representative is a SPARQL benchmark? An analysis of RDF triplestore benchmarks. In Proc. of the Int. Conf. on World Wide Web (WWW), pages 1623–1633, 2019.
16. M. Stocker et al. SPARQL basic graph pattern optimization using selectivity estimation. In Proc. of the Int. Conf. on World Wide Web (WWW), pages 595–604, 2008.
17. C. Weiss et al. Hexastore: Sextuple indexing for Semantic Web data management. Proc. VLDB Endow., 1(1):1008–1019, 2008.
18. H. Wu et al. BioBenchmark Toyama 2012: An evaluation of the performance of triple stores on biological data. Journal of Biomedical Semantics, 5(1):32–43, 2014.
19. M. Wylot et al. RDF data storage and query processing schemes: A survey.