An Algorithm for Context-Free Path Queries over Graph Databases

Abstract

RDF (Resource Description Framework) is a standard language to represent graph databases. Query languages for RDF databases usually include primitives to support path queries, linking pairs of vertices of the graph that are connected by a path of labels belonging to a given language. Languages such as SPARQL include support for paths defined by regular languages (by means of Regular Expressions). A context-free path query is a path query whose language can be defined by a context-free grammar. Context-free path queries can be used to implement queries such as the "same generation queries", that are not expressible by Regular Expressions. In this paper, we present a novel algorithm for context-free path query processing. We prove the correctness of our approach and show its run-time and memory complexity. We show the viability of our approach by means of a prototype implemented in Go. We run our prototype using the same cases of study as proposed in recent works, comparing our results with another, recently published algorithm. The experiments include both synthetic and real RDF databases. Our algorithm can be seen as a step forward, towards the implementation of more expressive query languages.


Ciro M. Medeiros, Martin A. Musicante, Umberto S. Costa
[email protected], {mam,umberto}@dimap.ufrn.br
Federal University of Rio Grande do Norte, Campus Universitário Lagoa Nova, Natal, RN, Brazil 1524


CCS CONCEPTS • Information systems → Database query processing; Resource Description Framework (RDF); • Theory of computation → Grammars and context-free languages; Design and analysis of algorithms;

KEYWORDS graph path queries, context-free grammars, RDF

ACM Reference format:

Ciro M. Medeiros, Martin A. Musicante, and Umberto S. Costa. 2018. An Algorithm for Context-Free Path Queries over Graph Databases. In Proceedings of SIGMOD ’20: ACM SIGMOD/PODS International Conference on Management of Data, Portland, OR, June 14–19, 2020 (SIGMOD ’20), 13 pages. DOI: 10.1145/1122445.1122456

Processing a path query over a graph database consists of looking for pairs of vertices that are connected by a specified path inside the graph. The labels of the edges in a path form a string and, as such, they can be specified by using grammars or other formal tools. Regular expressions have been widely used to define path queries. As regular languages belong to the most restricted class of formal languages, the expressivity of such queries is somewhat limited. Recent studies have developed algorithms for supporting the use of context-free grammars in path queries in order to improve their expressiveness.

RDF (Resource Description Framework) is the Linked Data standard for representing data. An RDF database consists of a set of triples that can be viewed as a graph. The standard query language for RDF databases is SPARQL. The language supports the definition of paths using regular expressions over labels of edges in the graph. However, some applications require more sophisticated queries, which cannot be defined using regular expressions, but may be described by context-free grammars.

In the last few years, a number of initiatives have been developed to improve the expressiveness of SPARQL and of path query languages in general. Most of these initiatives include the definition of algorithms for the evaluation of context-free path queries. Such algorithms are, in general, based on parsing techniques. In this paper we present a new approach that is not based on a specific parsing technique; instead, it uses annotations over grammar items to parse several paths at the same time, keeping track of shared prefixes over these paths.

Our main contributions are:
• an algorithm for the evaluation of context-free path queries;
• an analysis of correctness, as well as of time and space complexity, for the algorithm;
• experimental results that demonstrate its applicability in different scenarios.

This section briefly presents some basic background that is used in the paper.

Definition 2.1 (Grammar). A context-free grammar is a quadruple $G = (N, \Sigma, P, S)$ where $N$ is the set of non-terminal symbols, $\Sigma$ is the set of terminal symbols (alphabet), $P$ is the set of production rules of the form $A \to \alpha$, for $A \in N$ and $\alpha \in (N \cup \Sigma)^*$, and $S \in N$ is the start symbol.

We are interested in querying graph databases represented using RDF. An RDF graph is made of resources and the relationships between them. A resource belongs to one of the following pairwise disjoint sets:
• Internationalized Resource Identifiers (IRIs), which are an extension of Uniform Resource Identifiers (URIs) with support for a wider range of Unicode characters. IRIs uniquely identify resources such as documents, movies or users’ profiles in social networks;
• literals, which specify a literal value such as a text, number or date; or
• blank nodes, which are equivalent to labeled null values.

The relationships between resources are expressed in the form of triples. A triple is denoted by $(s, p, o)$, where $s$ is the subject, $p$ is the predicate and $o$ is the object. The subject of a triple is either an IRI or a blank node; the predicate (also known as the property) is an IRI; and the object is either an IRI, a literal or a blank node. A finite set of triples forms an RDF database, which corresponds to a graph.

Definition 2.2 (Graph). A graph is a set of triples in $V \times E \times V$, where $V$ is a set of vertices and $E$ is a set of edge labels. In RDF, it is possible that $V \cap E \neq \emptyset$.

We can specify paths inside a graph by adequately choosing a sequence of triples.

Definition 2.3 (Path and Trace). A path is a sequence of triples $(t_1, t_2, \ldots, t_k)$ from a given graph, where $t_i = (s_i, p_i, o_i)$, such that $o_i = s_{i+1}$. The trace of a path is the string formed by the concatenation of the edge labels $p_i$ from its triples. The set of paths between two vertices $x$ and $y$ is denoted by $\mathit{paths}(x, y)$. Notice that this includes the empty path between a node and itself. Given a set of paths $\Pi \subseteq \mathit{paths}(x, y)$, the set of traces defined by these paths is denoted as $\mathit{traces}(\Pi)$.

Definition 2.4 (Context-Free Path Query). Given a data graph $D$ and a context-free grammar $G$, a context-free path query $Q$ is a set of query pairs $(x, A)$ where $x$ is a vertex of the graph and $A$ is a non-terminal symbol of the grammar. The evaluation of a context-free path query $Q$ produces the set of all vertices $y$ such that there exists a path from $x$ to $y$ whose trace $s$ is derivable from $A$:

$$\mathit{Eval}(Q) = \{\, y \mid \exists s.\ A \Rightarrow^* s \wedge s \in \mathit{traces}(\mathit{paths}(x, y)) \,\}$$

The next definition establishes the set of vertices that are reachable from a given vertex, by following a path represented by a string of (terminal and non-terminal) symbols of a grammar.

Definition 3.1 ($G$-Reachable vertices). Let $G = (N, \Sigma, P, S)$ be a grammar, and $D \subseteq V \times E \times V$ be a data graph. Given a vertex $x \in V$ and a string $\alpha \in (\Sigma \cup N)^*$, the function $\mathcal{R}_{G,D}(x, \alpha)$ defines the set of vertices reachable from $x$ by following an $\alpha$-derivable path in $D$:

$$\mathcal{R}_{G,D} : V \times (\Sigma \cup N \cup \{\varepsilon\})^* \to \mathcal{P}(V).$$

This function is recursively defined on $\alpha$, as follows:
(1) For $\alpha = \varepsilon$ (the empty string), each vertex is reachable from itself: $\mathcal{R}_{G,D}(x, \varepsilon) = \{x\}$.
(2) For $\alpha = p \in \Sigma$, the set of vertices reachable from $x$ via a $p$-labeled edge is $\mathcal{R}_{G,D}(x, p) = \{\, y \mid (x, p, y) \in D \,\}$.
(3) If $\alpha = A \in N$, the set of vertices reachable from $x$ is defined by using the right-hand sides of the productions of $A$ in $G$: $\mathcal{R}_{G,D}(x, A) = \bigcup_{A \to \alpha \in P} \mathcal{R}_{G,D}(x, \alpha)$.
(4) If $\alpha = \alpha_1 \alpha_2$, the set of vertices reachable from $x$ is defined as: $\mathcal{R}_{G,D}(x, \alpha_1 \alpha_2) = \bigcup_{w \in \mathcal{R}_{G,D}(x, \alpha_1)} \mathcal{R}_{G,D}(w, \alpha_2)$.

It is easy to verify that this function is associative, since string concatenation and set union are both associative operations.

The following property establishes that for any vertex $y$ that is $G$-reachable from $x$, there exists a path in the graph whose labels form a string generated by the grammar $G$.

Proposition 3.2 (Derivation of traces for paths in the graph). Given a grammar $G = (N, \Sigma, P, S)$, a data graph $D \subseteq V \times E \times V$, two nodes $x, y \in V$ and a string $\alpha \in (\Sigma \cup N)^*$, we have that $y$ is in $\mathcal{R}_{G,D}(x, \alpha)$ if and only if there is an $\alpha$-derivable path in $D$ from $x$ to $y$:

$$\forall x, y, \alpha.\ \big( y \in \mathcal{R}_{G,D}(x, \alpha) \iff \exists s.\ \alpha \Rightarrow^* s \wedge s \in \mathit{traces}(\mathit{paths}(x, y)) \big)$$

Proof. Assuming $s = p_1 \ldots p_m$, we proceed by induction on two variables, $m$ and $n$, representing respectively the length of the string $s$ and the number of steps in the derivation $\alpha \Rightarrow^n p_1 \ldots p_m$.

• Base case (with $n = 0$, $m = 0$): In this case $\alpha \Rightarrow^0 \varepsilon$ and $s = \alpha = \varepsilon$. By Definition 3.1.1, we also know that $y = x$, since $y \in \mathcal{R}_{G,D}(x, \varepsilon) = \{x\}$. We need to show that
$$\forall x.\ \big( x \in \mathcal{R}_{G,D}(x, \varepsilon) \iff \varepsilon \in \mathit{traces}(\mathit{paths}(x, x)) \big)$$
This is straightforward since $\varepsilon \in \mathit{traces}(\mathit{paths}(x, x))$. Notice that when $m = 0$, we have to build derivations from the empty string. So, $m = 0 \implies n = 0$.

• Inductive step on $m$ (with $n = 0$): In this case $s = \alpha$, so we must prove that
$$\forall x, y, s.\ \big( y \in \mathcal{R}_{G,D}(x, s) \iff s \in \mathit{traces}(\mathit{paths}(x, y)) \big)$$
This follows by mathematical induction on $m$.

• Inductive step on $n$ (with $m > 0$): Here the property reads
$$\forall x, y, \alpha.\ \big( y \in \mathcal{R}_{G,D}(x, \alpha) \iff \exists p_1 \ldots p_m.\ \alpha \Rightarrow^n p_1 \ldots p_m \wedge p_1 \ldots p_m \in \mathit{traces}(\mathit{paths}(x, y)) \big)$$
for an arbitrary $n$. Since $n > 0$, we have that $\alpha = \alpha_1 A \alpha_2$, where $A \in N$ and $\alpha_1, \alpha_2 \in (N \cup \Sigma)^*$. By the Induction Hypothesis, there exist vertices $v, w \in V$ and indexes $k, j$ with $0 \le k \le j \le m$ such that:
$$v \in \mathcal{R}_{G,D}(x, \alpha_1) \iff \alpha_1 \Rightarrow^* p_1 \ldots p_k \wedge p_1 \ldots p_k \in \mathit{traces}(\mathit{paths}(x, v))$$
$$w \in \mathcal{R}_{G,D}(v, A) \iff A \Rightarrow^* p_{k+1} \ldots p_j \wedge p_{k+1} \ldots p_j \in \mathit{traces}(\mathit{paths}(v, w))$$
$$y \in \mathcal{R}_{G,D}(w, \alpha_2) \iff \alpha_2 \Rightarrow^* p_{j+1} \ldots p_m \wedge p_{j+1} \ldots p_m \in \mathit{traces}(\mathit{paths}(w, y))$$
These hypotheses, together with Definition 3.1.4, allow us to conclude the proof. □

Figure 1: Example Graph.

In this section we present our proposal for the evaluation ofCFPQs. Our algorithm receives a grammar, a data graph anda query, and follows context-free paths inside the data graph.The goal of the algorithm is to identify pairs of vertices linkedby paths whose traces are strings generated by the grammar.The following example illustrates the problem:

Example 3.3. Let us consider a grammar $G$ with the following production rules:

$S \to a\,S\,b$
$S \to \varepsilon$

and the data graph given in Figure 1. Given the query $Q = \{(1, S), (3, S)\}$, our algorithm goes through paths starting at vertices 1 and 3 whose trace is generated by $S$. In this way, all the production rules of $S$ will be investigated for paths starting at each of these vertices. For the query $Q$, our algorithm will compute a set of three vertices reachable from node 1 and a set of two vertices reachable from node 3.

Our method relies on two assumptions: (i) there may be several paths starting at a given node of the data graph; and (ii) for each of these paths, its trace may be derivable from a non-terminal of the grammar.

Our algorithm exploits these two properties to parse all the paths from a given vertex, in order to discover which of them have traces derivable from a given non-terminal. The parsing of all these traces is performed in an incremental way. In our setting, a query $Q$ is represented by a set of pairs $(v, A)$, where $v$ is a vertex of the data graph and $A$ is a non-terminal symbol of the grammar. For each pair $(v, A)$ of the query, our algorithm identifies all the paths from $v$ whose traces are strings derivable from $A$.

In a traditional parsing setting, we may use the notion of grammar item to guide the parsing process. Grammar items use a dot on the right-hand side of a production rule to mark the progress of the parsing. Traditional parsing techniques are tailored to process one input string at a time, and the information carried by the dot is related just to the progress of the parsing. In our case, we also need to identify the strings that form paths of the graph being parsed. Thus, we associate vertices of the graph to the positions of the parsing process: we use sets of vertices of the graph within the items, in the places where the dot may appear. The next definition captures this idea:

Definition 3.4 (Trace Item).

Given a context-free grammar $G = (N, \Sigma, P, S)$ and a data graph $D \subseteq V \times E \times V$, a trace item is a pair formed by a production rule and a function associating a set of graph nodes to each position of the right-hand side of the rule. Formally, a trace item is defined as the pair $(A \to \alpha, f)$, where $A \to \alpha \in P$ and $f : \{0, \ldots, |\alpha|\} \to \mathcal{P}(V)$. The trace item $(A \to \alpha_1 \ldots \alpha_n, f)$, where $f = \{0 \mapsto C_0, \ldots, n \mapsto C_n\}$, will be noted as $[A \to C_0\,\alpha_1\,C_1 \ldots \alpha_n\,C_n]$. The sets $C_0, \ldots, C_n$ will be called position sets.

In general, given position sets $C_1$, $C_2$ and a grammar symbol $\alpha$, a sequence $C_1\,\alpha\,C_2$ in the right-hand side of an item indicates that each vertex in $C_2$ is reached by an $\alpha$-derivable path beginning at a vertex in $C_1$. For instance, the trace item [S → {1} a {2, 3} S {} b {}] in Example 3.3 indicates that the parsing process is in a stage where $a$-derivable paths linking vertex 1 to vertices 2 and 3 in the data graph have been identified.

Next, we present the intuitive idea of our algorithm. In order to solve a query $Q$, our algorithm starts by processing trace items obtained from the query pairs and the rules of the grammar: for each query pair $(v, A) \in Q$, we create one trace item for each production rule of $A$ with $v$ in its first position set. We use the special marks ◦ and • for unprocessed and processed vertices inside position sets, respectively, in order to keep track of which vertices have already been processed. We omit the • and ◦ marks from vertices in position sets when such distinction is unnecessary. Our algorithm processes trace items until there are no unprocessed vertices belonging to any position set.

The next example shows how to compute the answers for the given query, graph and grammar.

Example 3.5.

Given the query $Q = \{(1, S), (3, S)\}$ and the data graph $D$ and grammar $G$ from Example 3.3, we start the parsing process by creating trace items. For each query pair $(v, A) \in Q$, we create one trace item for each production rule of $A$ with $v$ in its first position set. For the query $Q$ we build the trace items:

[S → {1◦} a {} S {} b {}] (1)
[S → {1◦}] (2)
[S → {3◦} a {} S {} b {}] (3)
[S → {3◦}] (4)

Our algorithm picks the unprocessed vertices in an arbitrary order. Let us start with vertex 1 from trace item (1). This vertex appears in a position set before the terminal symbol a. We must walk from vertex 1 to all its neighbors linked by an a-labeled edge in $D$. The neighboring vertices 2 and 3 must then be added to the next position set in the trace item. Doing so, our item becomes [S → {1•} a {2◦, 3◦} S {} b {}]. Notice that vertex 1◦ has changed to 1• to signal that this vertex has been processed. New vertices are added as unprocessed by using the mark ◦.

Now we may pick vertex 2 for the next step. This vertex is in a position set before the non-terminal symbol S. That indicates that we have to look for S-derivable paths starting at vertex 2. We build the following new items:

[S → {2◦} a {} S {} b {}] (5)
[S → {2◦}] (6)

Now item (1) becomes [S → {1•} a {2•, 3◦} S {} b {}] and we have to pick another vertex to process. Picking vertex 2 from item (5), we verify that there is no a-labeled edge going from vertex 2 to any other vertex in the graph. That means that there is no a-derivable path from this vertex. Item (5) then becomes [S → {2•} a {} S {} b {}].

Let us now pick vertex 2 from item (6). This item was built from an ε-rule. As vertex 2 belongs to both the first and the last position set of this item, there is an S-derivable path from vertex 2 to itself (the empty path). So, we augment the data graph with an S-labelled edge from vertex 2 to itself. Now, item (6) becomes [S → {2•}]. The addition of the new S-labelled edge to the data graph triggers a modification to the existing items: we add the unprocessed vertex 2 to any position set $C$ appearing in a trace item matching the pattern [... {2, ...} S C ...]. In our case, item (1) becomes [S → {1•} a {2•, 3◦} S {2◦} b {}].

We may now pick the newly added vertex 2◦ in item (1). Now we have a vertex in a position set before the terminal b. As we did before, we look for b-labeled edges going out from 2 in the data graph. There is only one such edge, which arrives at vertex 3. Item (1) then becomes [S → {1•} a {2•, 3◦} S {2•} b {3◦}].

Now we pick the newly added vertex 3 in the last position set of item (1). As this vertex is in the last position set of the item, we infer that there is an S-derivable path from vertex 1 to vertex 3. As $(1, S) \in Q$, we have found one answer for our query. Item (1) then becomes [S → {1•} a {2•, 3◦} S {2•} b {3•}]. Then, the data graph is augmented with a new S-labelled edge from 1 to 3.

This process is repeated until there are no more unprocessed vertices. The complete step-by-step process is presented in Table 1. It results in the following set of items:

[S → {•} a {•, •} S {•, •, •} b {•, •}], [S → {•}],
[S → {•} a {} S {} b {}], [S → {•}],
[S → {•} a {•} S {•, •, •} b {•}], [S → {•}]

The solutions computed by our algorithm are shown as bold arrows, labeled by non-terminals, in Figure 2.

Figure 2: Result graph for the query of Example 3.3.
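The trace items manipulated in Example 3.5 map naturally onto a small data structure. The following Go sketch shows one possible encoding, which is not necessarily the one used in our prototype: a position set is a map from vertex to a boolean mark (false for ◦, true for •). The names `PositionSet`, `TraceItem`, `NewTraceItem` and `Step` are hypothetical, and the successors passed to `Step` are simply the a-neighbors of vertex 1 stated in the example.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// PositionSet maps a vertex to its mark: false is unprocessed (◦), true is processed (•).
type PositionSet map[int]bool

// TraceItem annotates the rule Head -> Body with len(Body)+1 position sets.
type TraceItem struct {
	Head string
	Body []string
	Sets []PositionSet
}

// NewTraceItem builds [Head -> {v◦} Body[0] {} ... Body[n-1] {}].
func NewTraceItem(head string, body []string, v int) *TraceItem {
	sets := make([]PositionSet, len(body)+1)
	for i := range sets {
		sets[i] = PositionSet{}
	}
	sets[0][v] = false
	return &TraceItem{Head: head, Body: body, Sets: sets}
}

// Step marks x as processed in position set k and adds the given successors,
// as unprocessed, to position set k+1. Vertices already present keep their mark.
func (it *TraceItem) Step(k, x int, successors []int) {
	for _, y := range successors {
		if _, seen := it.Sets[k+1][y]; !seen {
			it.Sets[k+1][y] = false
		}
	}
	it.Sets[k][x] = true
}

// String renders the item in the bracket notation of Definition 3.4.
func (it *TraceItem) String() string {
	var b strings.Builder
	b.WriteString("[" + it.Head + " ->")
	for i, set := range it.Sets {
		if i > 0 {
			b.WriteString(" " + it.Body[i-1])
		}
		vs := make([]int, 0, len(set))
		for v := range set {
			vs = append(vs, v)
		}
		sort.Ints(vs)
		parts := make([]string, len(vs))
		for j, v := range vs {
			mark := "◦"
			if set[v] {
				mark = "•"
			}
			parts[j] = strconv.Itoa(v) + mark
		}
		b.WriteString(" {" + strings.Join(parts, ", ") + "}")
	}
	b.WriteString("]")
	return b.String()
}

func main() {
	item := NewTraceItem("S", []string{"a", "S", "b"}, 1)
	fmt.Println(item) // [S -> {1◦} a {} S {} b {}]
	item.Step(0, 1, []int{2, 3})
	fmt.Println(item) // [S -> {1•} a {2◦, 3◦} S {} b {}]
}
```

Keeping the mark as the map value means that re-adding a processed vertex cannot silently reset it to unprocessed, which is the property the example relies on when the same vertex is reached along several paths.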

Let us now present our algorithm for processing context-free path queries (Algorithm 1). Our technique is based on the idea of building and updating a set of trace items. The input parameters of the algorithm are:
(1) A context-free grammar $G = (N, \Sigma, P, S)$, defined by the user.
(2) An RDF graph $D \subseteq V \times \Sigma \times V$ with edges restricted to the grammar alphabet.
(3) A set of query pairs $Q \subseteq V \times N$. Each pair of the query set indicates a start vertex and the non-terminal symbol used for recognizing paths.

Our algorithm uses the $\uplus$ operator to perform unions between sets of marked and unmarked vertices. This operator is defined as follows: given the position sets $C$ and $\{x^{\circ}\}$, the union between them is defined as:

$$C \uplus \{x^{\circ}\} = \begin{cases} C, & \text{if } x^{\bullet} \in C \\ C \cup \{x^{\circ}\}, & \text{otherwise} \end{cases}$$

That is, if the vertex $x$ has already been processed, it is kept as processed in the position set. Otherwise, it is added as unprocessed.

The following data structures are manipulated during the algorithm’s execution:
I: A set of trace items, iteratively computed by the algorithm.
D′: A data graph containing the original data graph D, incrementally augmented with new, non-terminal-labeled edges.

ALGORITHM 1: The Trace Item-based Algorithm

Input: G = (N, Σ, P, S), Q ⊆ V × N, D ⊆ V × Σ × V
Output: D′ ⊆ V × (N ∪ Σ) × V
 1 function eval
 2   I := {[A → {w◦} α_1 {} ... α_n {}] | A → α_1 ... α_n ∈ P ∧ (w, A) ∈ Q}
 3   D′ := D
 4   while ∃ i, x s.t. i = [A → ... {x◦, ...} ...] ∈ I do
 5     switch i
 6       case i = [A → ... {x◦, ...} α_k C_k ...] do
 7         if α_k ∈ Σ ∨ [α_k → {x} ...] ∈ I then
 8           C_k := C_k ⊎ {y◦ | (x, α_k, y) ∈ D′}
 9         else
10           I := I ∪ {[α_k → {x◦} β_1 {} ... β_n {}] | α_k → β_1 ... β_n ∈ P}
11       case i = [A → {w} ... {x◦, ...}] do
12         D′ := D′ ∪ {(w, A, x)}
13         foreach [B → ... {w•, ...} A C ...] ∈ I do
14           C := C ⊎ {x◦}
15     mark(x, i)
16   return D′

Lines 2-3 initialize I and D′. For each pair $(w, A) \in Q$ and rule $A \to \alpha_1 \ldots \alpha_n \in P$, the set I is initialized with the items [A → {w◦} α_1 {} ... α_n {}]. The graph D′ is initialized as a copy of the input graph D. These steps prepare the algorithm to enter the main loop, which processes unmarked vertices in items of I. The main loop concludes when there are no such unmarked vertices.

The processing of unmarked vertices is divided into two cases:
(1) In the first case (lines 6-10), given the trace item $i = [A \to C_0\,\alpha_1\,C_1 \ldots \alpha_n\,C_n]$, x◦ belongs to a position set $C_{k-1}$ that is not the last position set of the item.
  (a) If $\alpha_k \in \Sigma$, we add to $C_k$ all the vertices y◦ such that there exists an edge $(x, \alpha_k, y) \in D'$ (line 8).
  (b) If $\alpha_k \in N$ and $[\alpha_k \to \{x\} \ldots] \in I$, we add to $C_k$ all y◦ such that there is an edge $(x, \alpha_k, y) \in D'$ (this case is also treated by line 8).
  (c) If $\alpha_k \in N$ and there is no trace item $[\alpha_k \to \{x \ldots\} \ldots]$, our algorithm initiates the search for $\alpha_k$-derivations beginning at $x$. This is done by creating new trace items $[\alpha_k \to \{x^{\circ}\} \ldots]$ and adding them to I (line 10).
(2) In the second case of the main loop (lines 11-14), we identify that the vertex $x$ belongs to the last position set of a trace item. The item $i = [A \to \{w\} \ldots \{x^{\circ}, \ldots\}]$ states that we have walked a path from the vertex $w$ to $x$ in the data graph D′. So, our algorithm generates a new $A$-labeled edge connecting these two vertices (line 12). After this operation, we must update with x◦ all position sets $C$ such that $[B \to \ldots \{w, \ldots\}\,A\,C \ldots] \in I$ (line 14).

The vertex x◦ from the generalized item $i$ is marked as visited at the end of the loop body (line 15). When there are no more unmarked vertices, the main loop stops and the decorated graph D′ is returned (line 16).

Table 1: Step-by-step behavior of Algorithm 1.
 1. [S → {◦} a {} S {} b {}], [S → {◦}], [S → {◦} a {} S {} b {}], [S → {◦}]
 2. [S → {•} a {◦, ◦} S {} b {}]
 3. [S → {•} a {•, ◦} S {} b {}], [S → {◦} a {} S {} b {}], [S → {◦}]
 4. [S → {•} a {} S {} b {}]
 5. [S → {•}], [S → {•} a {•, ◦} S {◦} b {}]
 6. [S → {•} a {•, ◦} S {•} b {◦}]
 7. [S → {•} a {•, ◦} S {•} b {•}]
 8. [S → {•}]
 9. [S → {•}], [S → {•} a {•, ◦} S {•, ◦} b {•}]
10. (line 8) [S → {•} a {•, ◦} S {•, •} b {•, ◦}]
11. (lines 12, 14) [S → {•} a {•, ◦} S {•, •} b {•, •}]
12. (line 8) [S → {•} a {◦} S {} b {}]
13. (line 8) [S → {•} a {•} S {◦, ◦, ◦} b {}]
14. (line 8) [S → {•} a {•} S {◦, ◦, •} b {}]
15. (line 8) [S → {•} a {•} S {◦, •, •} b {◦}]
16. (lines 12, 14) [S → {•} a {•} S {◦, •, •} b {•}]
17. (line 8) [S → {•} a {•} S {•, •, •} b {•}]
18. (line 10) [S → {•} a {•, •} S {•, •, ◦} b {•, •}]
19. (line 8) [S → {•} a {•, •} S {•, •, •} b {•, •}]

In the next sections we analyze the behaviour of our algorithm in terms of correctness and of runtime and memory complexity.

In this section, we show the correctness of our algorithm.

Proposition 3.6.

Let $G = (N, \Sigma, P, S)$ be a grammar, $D \subseteq V \times E \times V$ a data graph and $(w, A) \in Q$ a query pair. Given $[A \to \{w\}\,\alpha_1\,C_1 \ldots \alpha_j\,C_j \ldots] \in I$ computed by Algorithm 1, then for any vertex $x \in V$ we have

$$x \in C_j \iff x \in \mathcal{R}_{G,D}(w, \alpha_1 \ldots \alpha_j).$$

Sketch. We analyze the behaviour of the algorithm at the lines that change the set I of trace items:

(line 2) The set I is initialized to contain the item [A → {w◦} α_1 {} ... α_n {}] for each rule $A \to \alpha_1 \ldots \alpha_n \in P$. From this construction we can see that for $j = 0$ we have $w = x$, $C_0 = \{x\} = \{w\}$ and $\alpha_1 \ldots \alpha_j = \varepsilon$. In this case, it is evident that $w \in C_0 \iff w \in \mathcal{R}_{G,D}(w, \varepsilon)$.

(line 10) At this line, new trace items are added to the set I for each rule $\alpha_k \to \beta_1 \ldots \beta_n$. The creation of new items occurs under the same conditions as at line 2. Again $j = 0$, so we have $w = x$, $C_0 = \{x\} = \{w\}$ and $\beta_1 \ldots \beta_j = \varepsilon$. In this case, we have $w \in C_0 \iff w \in \mathcal{R}_{G,D}(w, \varepsilon)$.

(line 8) A position set $C_k$ in I is incremented with new vertices $y$ such that $(x, \alpha_k, y) \in D'$. We can distinguish two cases:
- If $\alpha_k$ is a terminal symbol, we add to $C_k$ all vertices $y$ such that there exists an $\alpha_k$-labeled edge from $x$ to $y$ in D′: $y \in C_k \iff y \in \mathcal{R}_{G,D}(x, \alpha_k)$. This condition holds by Definition 3.1.2.
- If $\alpha_k \in N$, we need to add to $C_k$ all the vertices $y$ such that there is an edge $(x, \alpha_k, y)$ in D′. Notice that this edge was the result of previous processing, meaning that the algorithm has already discovered a path from $x$ to $y$ whose trace corresponds to the right-hand side of a production rule of $\alpha_k$. Thus, $y \in C_k \iff y \in \mathcal{R}_{G,D}(x, \alpha_k)$. This condition holds by Definition 3.1.3.

(line 14) We deal with those vertices $x$ appearing in the last position set of a trace item [A → {w•} ... {x◦, ...}] built from a production rule $A \to \gamma$. Items with this configuration indicate the existence of a path from $w$ to $x$ in D′ whose trace is derived from the string $\gamma$. Our algorithm adds a new $A$-labeled edge from $w$ to $x$ (line 12), thus using the production rule. Thus, for every item $i = [B \to \ldots \{w^{\bullet}, \ldots\}\,A\,C_j \ldots]$ built from a production rule $B \to \gamma_1 A \gamma_2$, we can verify that $x \in C_j \iff x \in \mathcal{R}_{G,D}(w, A)$. This condition holds by Definitions 3.1.3 and 3.1.4. □

We start by presenting evidence that the proposed algorithm is correct. The result graph $D'$ is only updated at line 3, where it just copies the input graph $D$, and at line 12, where it is increased with a new edge $(w, A, x)$ where $w$, $A$ and $x$ come from the generalized item $i = [A \to \{w^{\bullet}\} \ldots \{x^{\bullet}, \ldots\}]$. By Definition 3.1.2 we can conclude that line 3 is a valid step; the validity of line 12, however, depends on whether the generalized items $i \in I$ were constructed correctly.

Proposition 3.7. Algorithm 1 computes $D'$ such that for all $(x, A) \in Q$:

$$\forall y.\ \big( y \in \mathcal{R}_{G,D}(x, A) \iff (x, A, y) \in D' \big)$$

Proof. This follows from Propositions 3.2 and 3.6. □

In this section, we show the time and space complexity of our algorithm. Our proof is based on the finite number of elements in the sets it manipulates.

Proposition 3.8 (Worst-case Space Complexity). The worst-case space complexity of Algorithm 1 is $O(|V|^2 \cdot |P| \cdot k)$.

Proof. The maximum sizes that D′ and I may reach are:

D′: The algorithm increments the graph D′ with non-terminal-labeled edges, so it uses at most
$$|D'| = |V| \cdot |N \cup \Sigma| \cdot |V| \qquad (7)$$
which is $O(|V|^2 \cdot |N \cup \Sigma|)$.

I: The set I contains generalized items, which are annotated production rules with a single vertex at the start of the right-hand side. So we have at most
$$|I| = |V| \cdot |P| \qquad (8)$$
For each trace item, the number of position sets depends on the size of the right-hand side of a production rule. Assuming that $k$ denotes the greatest size of the right-hand side of the rules in $P$, each trace item may have $k$ position sets of size at most $|V|$ (notice that the first position set of each trace item is always a singleton). In this context, the worst case in space complexity for I is $|V| \cdot |P| \cdot k \cdot |V|$, which is $O(|V|^2 \cdot |P| \cdot k)$.

We can now estimate the worst-case space complexity as:
$$O\big(|V|^2 \cdot (|N \cup \Sigma| + |P| \cdot k)\big) \qquad (9)$$
□

Proposition 3.9 (Worst-case Runtime Complexity). The worst-case runtime complexity of Algorithm 1 is $O(|V|^3 \cdot |P|^2 \cdot k^2)$.

Proof Sketch. The main loop iterates until there are no more unmarked vertices x◦. The maximum number of unmarked vertices is given by $|I| \cdot k \cdot |V|$, where $k$ is the maximum number of possible position sets for rules of the grammar (the greatest size of a right-hand side of the rules in $P$, plus one). So, as $|I| = |V| \cdot |P|$, we have at most $|V|^2 \cdot |P| \cdot k$ possible vertices x◦.

For each iteration, the form of the trace item $i$ guides the operation to be performed. The tests at lines 6 and 11 have constant cost. There are two cases to be considered inside the switch command:

• The evaluation of the condition at line 7 requires searching over the set of trace items I. The cost of this operation is constant (supposing that we use a matrix representation). Line 8 is the case where the algorithm advances one step on a path by looking for edges $(x, \alpha, y) \in D'$. As there are at most $|V|$ possible destination vertices, the algorithm performs at most $|V|$ operations in this case. At line 10, the algorithm adds new trace items to I in order to start a new derivation. This line ensures that the algorithm creates at most one trace item for each production rule in $P$ for a fixed vertex $x$. So, in this case, the algorithm performs at most $|P|$ constant-time operations. In this way, the overall cost of the case spanning lines 6 to 10 is bounded by $\max(|V|, |P|)$.

• The second case of the switch command adds non-terminal-labelled edges to the graph. The creation of such edges is performed at line 12, in constant time. The appearance of a new edge triggers the update of position sets by the iteration at line 13. We have at most $|V| \cdot |P| \cdot k$ position sets. Assuming, again, a matrix representation, locating each set $C$ in a trace item requires constant time. Thus, line 14 will be executed $|V| \cdot |P| \cdot k$ times in the worst case. In this way, the overall cost of the case spanning lines 11 to 14 is bounded by $|V| \cdot |P| \cdot k$.

With at most $|V|^2 \cdot |P| \cdot k$ iterations, each costing at most $|V| \cdot |P| \cdot k$, this shows that the worst-case time complexity of our algorithm is $O(|V|^3 \cdot |P|^2 \cdot k^2)$. □

Graph databases have become popular in the last few years. Queries over such databases normally include property paths, which define paths on the data graph by means of regular expressions [12, 19]. In [1, 5, 7, 20], it is noted that there exist useful queries that cannot be expressed by regular expressions, since they require some kind of bracket matching.

Same Generation Queries [1] are an example of queries that cannot be expressed by regular expressions, requiring the identification of context-free paths. Answering context-free path queries is NP-Complete [13]. However, specifying the starting node of the path makes the cost of processing those queries manageable.

In [7], the author proposes an algorithm to evaluate Context-Free Path Queries based on Earley's and CYK parsing techniques [6]. This algorithm receives a grammar (in Chomsky Normal Form) and a data graph. The algorithm is based on the idea of adding a non-terminal-labelled edge to link nodes that are connected by a path generated by the grammar. Regardless of the query, the algorithm in [7] processes the whole graph. For any vertices x and y and non-terminal symbol S, an S-labelled edge linking x to y is created if there exists an S-derivable path in the graph linking x to y. After that, atomic queries can be executed in constant time. The algorithm is O(|N||E| + (|N||V|)), where N is the set of non-terminal symbols of the grammar, V is the set of nodes of the graph and E is the set of edges.

In [20], the query language cfSPARQL is proposed. The language includes queries defined by context-free grammars, as well as by nested regular expressions [14]. The evaluation mechanism of cfSPARQL is an adaptation of the algorithm in [7] and has the same time complexity.

An LL-based approach to recognize context-free paths in RDF graphs is proposed in [5]. The proposal uses the GLL [16] parsing technique to define an algorithm for querying data graphs with time complexity of O(|V| max_{v ∈ V}(deg⁺(v))), where V is the set of vertices and deg⁺(v) is the outdegree of vertex v. Notice that for complete graphs this runtime complexity is O(|V|).

Valiant's parsing algorithm [18] is the basis for the query algorithm presented in [4].
The algorithm uses a matrix representation of the graph where each cell contains the edge between two vertices, represented by row and column. The proposal uses an efficient, GPU-based calculation of the transitive closure of that matrix to answer queries. Similarly to [7], the algorithm in [4] calculates all possible non-terminal-labelled edges between nodes of the graph. The time complexity of this algorithm is O(|V| · |N|), where V is the set of vertices of the graph and N is the set of non-terminal symbols of the query's grammar.

In [15], the authors present a Context-Free Path Query processing algorithm based on the well-known bottom-up LR parsing technique [2]. The algorithm uses the LALR parsing table for the grammar. The proposal extends Tomita's algorithm and GSS data structure [17] to simultaneously discover context-free paths on a data graph. The proposed algorithm does not need to pre-process the whole graph in order to answer the query. The time complexity of this algorithm is given by O(|V| + k · |I| + k · |Σ| · |N|), where k is the maximum size of the right-hand side of the production rules in the grammar and I is the number of lines of the LALR(1) parsing table.

In [10], the authors propose a query processing algorithm based on the LL parsing technique [2]. For queries of the form (x, S), where x is a vertex of the graph and S is a non-terminal symbol, the algorithm proceeds in a top-down manner, trying to discover S-generated paths from x. The worst-case runtime complexity of their algorithm is O(|V| · |P|), where P is the set of production rules of the grammar.

The authors in [9] evaluate the Context-Free Path Query evaluation methods in [4, 8, 15]. They perform experiments with several data sets, including real and synthetic ones. The paper focuses on the scalability of the three approaches and concludes that these methods are not yet adequate for big data processing. We expect to contribute towards that goal.
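The edge-saturation idea described above for [7] and [4], where a non-terminal-labelled edge is added whenever a derivable path exists, can be illustrated with a small worklist sketch. This is not the algorithm of [7], nor the one proposed in this paper; the function name `saturate`, its parameters and the encoding of a Chomsky Normal Form grammar as Python collections are assumptions made only for illustration.

```python
from collections import defaultdict, deque

def saturate(edges, terminal_rules, binary_rules, nullable=()):
    """Return all facts (A, x, y): some path from x to y is derivable from A.

    edges          -- iterable of (source, label, target) triples
    terminal_rules -- dict: terminal label -> set of non-terminals A with A -> label
    binary_rules   -- iterable of (A, B, C) triples standing for A -> B C
    nullable       -- non-terminals A with a rule A -> epsilon
    (All four parameter names are hypothetical, chosen for this sketch.)
    """
    edges = list(edges)
    binary_rules = list(binary_rules)
    nodes = {x for x, _, _ in edges} | {y for _, _, y in edges}
    facts = set()
    out_of = defaultdict(set)  # x -> {(A, y)} for every fact (A, x, y)
    into = defaultdict(set)    # y -> {(A, x)} for every fact (A, x, y)
    work = deque()

    def add(a, x, y):
        if (a, x, y) not in facts:
            facts.add((a, x, y))
            out_of[x].add((a, y))
            into[y].add((a, x))
            work.append((a, x, y))

    # Seed with terminal rules and epsilon rules.
    for x, label, y in edges:
        for a in terminal_rules.get(label, ()):
            add(a, x, y)
    for a in nullable:
        for x in nodes:
            add(a, x, x)

    # Combine each new fact with the known facts to its right and to its left.
    while work:
        b, x, y = work.popleft()
        for head, left, right in binary_rules:
            if left == b:
                for c, z in list(out_of[y]):
                    if c == right:
                        add(head, x, z)
            if right == b:
                for c, w in list(into[x]):
                    if c == left:
                        add(head, w, y)
    return facts

# Toy input: the path a a b b and a CNF grammar for a^n b^n (n >= 1).
chain = [(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4)]
result = saturate(
    chain,
    {"a": {"A"}, "b": {"B"}},
    [("S", "A", "B"), ("S", "A", "S1"), ("S1", "S", "B")],
)
print(sorted((x, y) for n, x, y in result if n == "S"))  # [(0, 4), (1, 3)]
```

Because every fact is indexed as soon as it is created, each pair of adjacent facts is examined when the later of the two leaves the worklist. Like the approaches discussed above, the sketch processes the whole graph regardless of the query.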
In this section we present some performance experiments to investigate the viability of our algorithm. We implemented a prototype using the Go programming language.¹ The experiments were performed on a machine running Debian 8.11, with 64 GB of RAM and a 64-bit Intel Xeon E312xx (Sandy Bridge) processor at 2.195 GHz. The results presented here are the average time and memory of 10 runs.

¹ The source code and data for our prototype are available on GitHub; the link is not shown due to the double-blind review process of the conference.

We compared our algorithm to the one in [11]. Their algorithm is implemented in Python and was run on the same computer as the algorithm we propose here. For both algorithms, we performed the same experiments as in [4, 5, 8, 9, 11, 20]. The databases used in the experiments include both synthetic graphs and publicly available ontologies. The synthetic graphs and the grammars used to query them were designed to explore specific characteristics of the evaluation mechanisms, such as memory and runtime performance in worst-case or random scenarios and the influence of grammar ambiguity or density/sparsity, as well as to observe the scalability of our approach. The dataset of ontologies consists of a number of popular, publicly available ontologies and is the same used in previous works [4, 5, 20].

The non-random synthetic graphs used in the experiments are described as follows. A complete graph corresponds to the product V × Σ × V, and it represents the worst-case scenario for the database, where each vertex is linked to all the vertices of the graph, including itself. We also considered two kinds of linear graphs, i.e., graphs that have the form of a single straight path: the first kind, referred to as ab-list graphs, is formed by graphs whose labels form a path a^n b^n; the second kind, called σ-string graphs, is formed by straight-line graphs where all the edges are labeled with σ. Cycle graphs have all edges labeled with σ.

Let us present some experiments to test the behaviour of our algorithm in specific cases.

Dealing with Ambiguous Grammars.

The data presented in Figure 3 correspond to the execution over ab-list graphs. We used Grammars 1 and 2, which recognize the language of balanced a's and b's. These grammars are defined as follows:

Grammar 1. (Ambiguous) Generates strings containing balanced pairs of a's and b's [4, 5, 11, 20]:
    S → S S | a S b | ϵ

Grammar 2.

Unambiguous grammar generating the same language as Grammar 1 [4, 5, 11, 20]:
    S → a S b S | ϵ

The query was defined as Q = {(x, S) | x ∈ V}, i.e., we look for all vertices that are linked by an S-derived path from each vertex of the ab-list graph.

We observe that, compared to [11], our algorithm exhibits efficient runtime behaviour as the graph grows in size. We also observe that the behaviour of our algorithm is not heavily affected by the grammar's ambiguity. In terms of memory consumption, both algorithms behave in a similar way, with a small advantage to our algorithm.

Dense and Sparse Grammars.

Figures 4 and 5 compare the execution of our prototype and the LL algorithm [11] over cycle and path graphs, respectively, using Grammars 3 and 4 and the same query set as before.

Grammar 3.

Dense grammar recognizing the language σ⁺ [8]:
    A → A A
    A → σ

A dense grammar is one that generates strings without empty transitions, in contrast to a sparse grammar.

Grammar 4.

Sparse grammar recognizing the language σ* [8]:
    B → B A | A B | ϵ
    A → σ

As in the previous case, we observe that our algorithm behaves better in terms of time and memory consumption than the algorithm in [11].

For all graphs used in this experiment, our prototype presented a time performance that seems to be better than the one given by Proposition 3.9.

Notice that the form of the grammar's production rules has an important influence on the time performance of the algorithms. For σ-string and cycle graphs, sparse grammars seem to have an advantage over dense grammars.

Regarding memory consumption, we observe the same situation as in the previous case, with our algorithm performing slightly better than the one in [11].

Experiment with ontologies.

For the next experiment we used a set of popular ontologies publicly available on the internet. This dataset and the grammars described below are the same used in previous works [4, 5, 11, 20]. The “geospecies” database and Grammar 7 were used in [9].

Grammar 5 retrieves concepts in the same level of the RDFS subClassOf/type hierarchy. The experiment consists of performing a “same generation query” [1]. For each vertex of the graph, the query looks for all vertices that are at the same level in the graph of the subclass/type hierarchy.

Grammar 5. Retrieves concepts in the same level of hierarchy [4, 5, 11, 20]:
    S → subClassOf S subClassOf⁻
    S → type S type⁻
    S → subClassOf subClassOf⁻
    S → type type⁻

Grammar 6 retrieves concepts in adjacent levels of the RDFS subClassOf hierarchy.

Grammar 6.

Retrieves concepts on adjacent levels of the hierarchy of classes in RDF [4, 5, 11, 20]:
    S → B subClassOf⁻
    B → subClassOf B subClassOf⁻ | ϵ

[Figure 3: ab-list graphs, Grammars G1 and G2. Plots of time (ms) and memory (Mb) for TI and LL under each grammar.]

[Figure 4: Cycle graphs, Grammars G3 and G4. Plots of time (ms) and memory (Mb) for TI and LL under each grammar.]

[Figure 5: σ-string graphs, Grammars G3 and G4. Plots of time (ms) and memory (Mb) for TI and LL under each grammar.]

Grammar 7 retrieves concepts in the same level of the broaderTransitive hierarchy. These edges are directed from child to parent, relating categories of species, families, orders, etc. This is a real application example, where a Context-Free Path Query is used to identify the pairs of vertices that are in the same category inside the biological taxonomy.

Grammar 7.

Retrieves concepts on the same level of the hierarchy [9]:
    S → broaderTransitive S broaderTransitive⁻
    S → broaderTransitive broaderTransitive⁻

The results of running our algorithm (as well as LL [11]) are shown in Table 2. The query used in this case was the same as in the previous cases: we look for paths departing from each vertex of the graph. The first three columns of the table show the grammar and ontology used, the size of the graph and the number of results obtained by the query.

In the data presented in Table 2, we can observe that both algorithms behave in the same way as observed for the synthetic examples given previously. In general, our algorithm performs better than the one in [11], with a large difference in time in favor of our algorithm. The last row of Table 2 does not contain data for the LL algorithm, since our computational resources were not sufficient for the normal execution of that algorithm.
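To make the meaning of the same-generation query expressed by Grammar 7 concrete, the following sketch computes its answer on a toy hierarchy by a naive fixpoint. It is a quadratic brute-force illustration, not the algorithm evaluated in Table 2; the function name `same_generation` and the toy taxonomy (`s1`, `g1`, `f`, and so on) are hypothetical.

```python
def same_generation(parent_edges):
    """Pairs (x, y) linked by a path derived from
    S -> e S e^- | e e^-, where e is a child-to-parent edge label.

    parent_edges -- iterable of (child, parent) pairs (hypothetical encoding).
    """
    parents = {}
    for child, parent in parent_edges:
        parents.setdefault(child, set()).add(parent)
    children = list(parents)

    # Base rule S -> e e^-: x and y have a common direct parent.
    pairs = {(x, y) for x in children for y in children
             if parents[x] & parents[y]}

    # Recursive rule S -> e S e^-: some parent of x and some parent of y
    # are already known to be in the same generation.
    changed = True
    while changed:
        changed = False
        for x in children:
            for y in children:
                if (x, y) in pairs:
                    continue
                if any((px, py) in pairs
                       for px in parents[x] for py in parents[y]):
                    pairs.add((x, y))
                    changed = True
    return pairs

# Toy taxonomy: three species, two genera, one family.
taxonomy = [("s1", "g1"), ("s2", "g1"), ("s3", "g2"),
            ("g1", "f"), ("g2", "f")]
pairs = same_generation(taxonomy)
print(("s1", "s3") in pairs)  # True: both are two levels below f
print(("s1", "g1") in pairs)  # False: different levels
print(len(pairs))             # 13
```

Note that a vertex with at least one parent is paired with itself, since the path that goes up one edge and comes back down the inverse edge is derivable from S.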

Querying Random graphs.

The next experiments were proposed by [9] and use random, synthetic graphs. We used a graph generator function based on the definition given by [3]. Given the size of the graph in number of vertices n and a constant k ≤ n, the generator function, denoted by G(n, k), starts with a clique of k vertices. For each v in the n − k remaining vertices, the generator adds k edges from v to any vertices already in the graph. The edge labels are randomly chosen, being either a, b, c or d. The probability for a vertex to be chosen is directly proportional to its degree at that moment, such that the higher the degree of the vertex, the higher its probability of receiving the new edges.

Grammar 8. Defines the language a^n b^m c^m d^n [9]:
    S → a S d | a X d
    X → b X c | ϵ

The runtimes and memory usage observed in this experiment follow the pattern of the previous ones: our algorithm runs faster than the LL-based algorithm while using less memory.
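The generator G(n, k) described above can be sketched as follows. The text leaves some details open, so this sketch makes assumptions that are marked in the comments: the initial clique has one randomly labelled edge per ordered pair of distinct vertices, targets may repeat, and a uniform choice is used while no edge exists yet (as happens for k = 1). The function name `generate_graph` and its parameters are hypothetical and do not come from the prototype.

```python
import random

def generate_graph(n, k, labels=("a", "b", "c", "d"), seed=None):
    """Return a list of edges (source, label, target) for a graph G(n, k)."""
    if not 1 <= k <= n:
        raise ValueError("expected 1 <= k <= n")
    rng = random.Random(seed)
    edges = []
    degree = [0] * n  # in-degree plus out-degree of each vertex

    def add_edge(u, v):
        edges.append((u, rng.choice(labels), v))
        degree[u] += 1
        degree[v] += 1

    # Assumption: the clique has one edge per ordered pair of distinct vertices.
    for u in range(k):
        for v in range(k):
            if u != v:
                add_edge(u, v)

    # Attach each remaining vertex with k edges to earlier vertices.
    for v in range(k, n):
        for _ in range(k):
            weights = degree[:v]  # degrees "at that moment"
            if sum(weights) == 0:
                # Assumption: fall back to a uniform choice when every
                # candidate still has degree zero.
                weights = [1] * v
            # Assumption: targets are drawn with replacement.
            target = rng.choices(range(v), weights=weights)[0]
            add_edge(v, target)
    return edges

graph = generate_graph(100, 3, seed=1)
print(len(graph))  # 297 = 6 clique edges + 97 * 3 attached edges
```

Passing a fixed `seed` makes the generated graph reproducible, which helps when the same random graph has to be given to two different query engines.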

We presented an algorithm for the evaluation of Context-Free Path Queries for RDF databases. Our algorithm combines characteristics of previously proposed techniques, in order to obtain better scalability.

We presented an analysis of the correctness of our algorithm, as well as an estimation of its worst-case time and space complexity.

We validated our work by using both synthetic and real-life examples, showing that our prototype outperforms another, recently published algorithm. The query processed by our algorithm may be defined by any context-free grammar. Our results show that there is no significant difference in the performance of the algorithm in relation to properties of the grammars, like ambiguity or sparseness.

Our algorithm may be put to practical use by including it as part of a query language engine, as mentioned in [11].

As future work, we will investigate the construction of a parallel version of our algorithm. This may improve its performance, since the treatment of unvisited vertices in position sets may be done in parallel.

We are also working on benchmarking protocols for algorithms for the evaluation of Context-Free Path Queries. This would make it possible to obtain more accurate data for comparing the different algorithms that are being proposed to implement this kind of query.

                                                 This work                  LL [11]
Grammar & Graph        |V|      Results      Time         Memory        Time         Memory
G5, skos                43          810      4 ms         2.5 Mb        115 ms       6.7 Mb
G5, generations         82         2164      8 ms         2.9 Mb        411 ms       7.3 Mb
G5, travel              92         2499      10 ms        3.8 Mb        1139 ms      7.4 Mb
G5, univ bench          90         2540      9 ms         3.9 Mb        1226 ms      7.4 Mb
G5, foaf                93         4118      15 ms        4.6 Mb        1915 ms      7.4 Mb
G5, people pets        163         9472      48 ms        7.0 Mb        7614 ms      9.7 Mb
G5, funding            272        17634      151 ms       12.1 Mb       32059 ms     11.6 Mb
G5, atom primitive     142        15454      208 ms       14.6 Mb       48048 ms     11.0 Mb
G5, biomedical         134        15156      165 ms       12.7 Mb       43248 ms     11.4 Mb
G5, pizza              359        56195      407 ms       18.5 Mb       371402 ms    19.2 Mb
G5, wine               468        66572      425 ms       20.9 Mb       389951 ms    21.5 Mb
G6, skos                 2            1      0 ms         1.5 Mb        0 ms         6.6 Mb
G6, generations          0            0      0 ms         1.6 Mb        0 ms         6.6 Mb
G6, travel              32           63      3 ms         2.0 Mb        9 ms         6.6 Mb
G6, univ bench          42           81      4 ms         2.2 Mb        11 ms        6.7 Mb
G6, foaf                13           10      0 ms         2.4 Mb        0 ms         6.6 Mb
G6, people pets         44           37      4 ms         3.0 Mb        5 ms         6.6 Mb
G6, funding             93         1158      12 ms        3.5 Mb        932 ms       7.4 Mb
G6, atom primitive     124          122      86 ms        11.5 Mb       20 ms        6.7 Mb
G6, biomedical         123         2871      34 ms        6.3 Mb        3211 ms      8.0 Mb
G6, pizza              261         1262      34 ms        6.9 Mb        2019 ms      7.9 Mb
G6, wine               163          133      7 ms         3.5 Mb        58 ms        7.0 Mb
G7, geospecies       20882    226669749      624352 ms    36844.7 Mb    N/A          N/A

Table 2: Performance Evaluation on RDF Databases.

                                                 This work                  LL [11]
Grammar & Graph        |V|      Results      Time         Memory        Time         Memory
G8, G(100,1)           100            5      3 ms         2.5 Mb        4 ms         6.6 Mb
G8, G(500,1)           500           25      11 ms        4.8 Mb        71 ms        7.5 Mb
G8, G(2500,1)         2500          161      143 ms       18.0 Mb       1372 ms      11.7 Mb
G8, G(10000,1)       10000          706      708 ms       47.2 Mb       20834 ms     27.7 Mb
G8, G(100,3)           100           56      6 ms         2.3 Mb        46 ms        7.0 Mb
G8, G(500,3)           500          769      44 ms        6.9 Mb        1954 ms      8.6 Mb
G8, G(2500,3)         2500         3377      232 ms       22.0 Mb       46668 ms     17.7 Mb
G8, G(10000,3)       10000        14583      1181 ms      72.1 Mb       826796 ms    51.7 Mb
G8, G(100,5)           100          312      6 ms         2.9 Mb        465 ms       7.1 Mb
G8, G(500,5)           500         2207      63 ms        8.7 Mb        11413 ms     10.0 Mb
G8, G(2500,5)         2500        13823      456 ms       29.0 Mb       415582 ms    25.8 Mb
G8, G(10000,5)       10000        77423      1946 ms      102.9 Mb      N/A          N/A
G8, G(100,10)          100         1068      18 ms        3.6 Mb        6362 ms      7.8 Mb
G8, G(500,10)          500        10211      249 ms       15.4 Mb       217209 ms    14.8 Mb
G8, G(2500,10)        2500       102867      1736 ms      65.1 Mb       N/A          N/A
G8, G(10000,10)      10000       784055      10476 ms     350.7 Mb      N/A          N/A

Table 3: Experiment with grammar G8.

REFERENCES

[1] S. Abiteboul, R. Hull, and V. Vianu. 1995. Foundations of Databases. Addison-Wesley. https://books.google.com.br/books?id=HN9QAAAAMAAJ
[2] A.V. Aho, M.S. Lam, R. Sethi, and J.D. Ullman. 2007. Compilers: Principles, Techniques, and Tools. Addison Wesley Publishing Company Incorporated. https://books.google.com.br/books?id=WomBPgAACAAJ
[3] Réka Albert and Albert-László Barabási. 2002. Statistical mechanics of complex networks. Rev. Mod. Phys. 74 (Jan 2002), 47–97. Issue 1. https://doi.org/10.1103/RevModPhys.74.47
[4] Rustam Azimov and Semyon Grigorev. 2017. Graph Parsing by Matrix Multiplication. (2017). arXiv:1707.01007v1.
[5] Semyon Grigorev and Anastasiya Ragozina. 2016. Context-Free Path Querying with Structural Representation of Result. arXiv preprint arXiv:1612.08872 (2016).
[6] D. Grune and C.J.H. Jacobs. 2007. Parsing Techniques: A Practical Guide. Springer New York. https://books.google.com.br/books?id=05xA d5dSwAC
[7] Jelle Hellings. 2014. Conjunctive Context-Free Path Queries. In Proc. 17th International Conference on Database Theory (ICDT), Athens, Greece, March 24-28, 2014, Nicole Schweikardt, Vassilis Christophides, and Vincent Leroy (Eds.). OpenProceedings.org, 119–130. https://doi.org/10.5441/002/icdt.2014.15
[8] Jelle Hellings. 2015. Path Results for Context-free Grammar Queries on Graphs. CoRR abs/1502.02242 (2015).
[9] Jochem Kuijpers, George Fletcher, Nikolay Yakovets, and Tobias Lindaaker. 2019. An Experimental Study of Context-Free Path Query Evaluation Methods. In Proceedings of the 31st International Conference on Scientific and Statistical Database Management. ACM, 121–132.
[10] Ciro M. Medeiros, Martin A. Musicante, and Umberto S. Costa. 2019. LL-based query answering over RDF databases. Journal of Computer Languages 51 (2019), 75–87. https://doi.org/10.1016/j.cola.2019.02.002
[11] Ciro M. Medeiros, Martin A. Musicante, and Umberto S. Costa. 2019. LL-based query answering over RDF databases. Journal of Computer Languages 51 (2019), 75–87. https://doi.org/10.1016/j.cola.2019.02.002
[12] A. O. Mendelzon and P. T. Wood. 1989. Finding Regular Simple Paths in Graph Databases. In Proceedings of the 15th International Conference on Very Large Data Bases (VLDB ’89). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 185–193. http://dl.acm.org/citation.cfm?id=88830.88850
[13] Alberto O. Mendelzon and Peter T. Wood. 1995. Finding Regular Simple Paths in Graph Databases. SIAM J. Comput. 24, 6 (1995), 1235–1258. http://dblp.uni-trier.de/db/journals/siamcomp/siamcomp24.html
[14] ... RDF. Web Semantics: Science, Services and Agents on the World Wide Web 8, 4 (2010), 255–270. https://doi.org/10.1016/j.websem.2010.01.002 Semantic Web Challenge 2009; User Interaction in Semantic Web research.
[15] Fred C. Santos, Umberto S. Costa, and Martin A. Musicante. 2018. A Bottom-Up Algorithm for Answering Context-Free Path Queries in Graph Databases. In Web Engineering, Tommi Mikkonen, Ralf Klamma, and Juan Hernández (Eds.). Springer International Publishing, Cham, 225–233.
[16] Elizabeth Scott and Adrian Johnstone. 2010. GLL Parsing. Electronic Notes in Theoretical Computer Science ...
[17] ... Efficient Parsing for Natural Language: A Fast Algorithm for Practical Systems. Kluwer Academic Publishers, Norwell, MA, USA.
[18] Leslie G. Valiant. 1975. General Context-Free Recognition in Less than Cubic Time.

J. Comput. Syst. Sci.