Enumeration Algorithms for Conjunctive Queries with Projection

Abstract

We investigate the enumeration of query results for an important subset of CQs with projections, namely star and path queries. The task is to design data structures and algorithms that allow for efficient enumeration with delay guarantees after a preprocessing phase. Our main contribution is a series of results based on the idea of interleaving precomputed output with further join processing to maintain delay guarantees, which may be of independent interest. In particular, for star queries, we design combinatorial algorithms that provide instance-specific delay guarantees in linear preprocessing time. These algorithms improve upon the currently best known results. Further, we show how existing results can be improved upon by using fast matrix multiplication. We also present new results involving the tradeoff between preprocessing time and delay guarantees for enumeration of path queries that contain projections. Evaluating a CQ with projection where the join attribute is projected away is equivalent to boolean matrix multiplication. Our results can therefore also be interpreted as sparse, output-sensitive matrix multiplication with delay guarantees.


Shaleen Deep, Department of Computer Sciences, University of Wisconsin-Madison, Madison, Wisconsin, USA

Xiao Hu, Department of Computer Sciences, Duke University, Durham, North Carolina, USA

Paraschos Koutris, Department of Computer Sciences, University of Wisconsin-Madison, Madison, Wisconsin, USA


Theory of computation → Database theory

Keywords and phrases

Query result enumeration, joins

Digital Object Identifier

Funding

This research was supported in part by National Science Foundation grants CRII-1850348 and III-1910014.

Acknowledgements

We would like to thank the anonymous reviewers for their careful reading and valuable comments, which contributed greatly to improving the manuscript.

© S. Deep, X. Hu and P. Koutris; licensed under Creative Commons License CC-BY 4.0. 24th International Conference on Database Theory (ICDT 2021). Editors: Ke Yi and Zhewei Wei; Article No. 3; pp. 3:1–3:24. Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany.

The efficient evaluation of join queries over static databases is a fundamental problem in data management. There has been a long line of research on the design and analysis of algorithms that minimize the total runtime of query execution in terms of the input and output size [32, 20, 19]. However, in many data processing scenarios it is beneficial to split query execution into two phases: the preprocessing phase, which computes a space-efficient intermediate data structure, and the enumeration phase, which uses the data structure to enumerate the query results as fast as possible, with the goal of minimizing the delay between outputting two consecutive tuples in the result. This distinction is beneficial for several reasons. For instance, in many scenarios, the user wants to see one (or a few) results of the query as fast as possible: in this case, we want to minimize the time of the preprocessing phase, such that we can output the first results quickly. On the other hand, a data processing pipeline may require that the result of a query is accessed multiple times by a downstream task: in this case, it is better to spend more time during the preprocessing phase, to guarantee a faster enumeration with smaller delay.

Previous work in the database literature has focused on finding the class of queries that can be computed with $O(|D|)$ preprocessing time (where $D$ is the input database instance) and constant delay during the enumeration phase.
The main result in this line of work shows that full (i.e., without projections) acyclic Conjunctive Queries (CQs) admit linear preprocessing time and constant delay [3]. If the CQ is not full but its free variables satisfy the free-connex property, the same preprocessing time and delay guarantees can still be achieved. It is also known that for any (possibly non-full) acyclic CQ, it is possible to achieve linear delay after linear preprocessing time [3]. Prior work that uses structural decomposition methods [14] generalized these results to arbitrary CQs with free variables and showed that the projected solutions can be enumerated with $O(|D|^{\mathrm{fhw}})$ delay. Moreover, a dichotomy about the classes of conjunctive queries with fixed arities where such answers can be computed with polynomial delay (WPD) is also shown. When the CQ is full but not acyclic, factorized databases use $O(|D|^{\mathrm{fhw}})$ preprocessing time to achieve constant delay, where fhw is the fractional hypertree width [13] of the query. We should note here that we can always compute and materialize the result of the query during preprocessing to achieve constant delay enumeration, but at the cost of using an exponential amount of space in general.

The aforementioned prior work investigates specific points in the preprocessing time-delay tradeoff space. While the story for full acyclic CQs is relatively complete, the same is not true for general CQs, even for acyclic CQs with projections. For instance, consider the simplest such query: $Q_{\text{two-path}} = \pi_{x,z}(R(x,y) \bowtie S(y,z))$, which joins two binary relations and then projects out the join attribute. For this query, [3] ruled out a constant delay algorithm with linear time preprocessing unless the boolean matrix multiplication exponent is $\omega = 2$. However, we can obtain $O(|D|)$ delay with $O(|D|)$ preprocessing time. We can also obtain $O(1)$ delay with $O(|D|^2)$ preprocessing by computing and storing the full result.
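As a minimal sketch of the second baseline for $Q_{\text{two-path}}$ (materialize the projected result during preprocessing, then scan it), the following assumes relations are given as Python lists of pairs. The function names `two_path_materialize` and `enumerate_materialized` are hypothetical and chosen only for illustration. The nested loop visits every tuple of the full join once, which is where the quadratic worst-case preprocessing cost comes from.

```python
from collections import defaultdict

def two_path_materialize(R, S):
    """Preprocessing: compute pi_{x,z}(R(x,y) JOIN S(y,z)) as a deduplicated set."""
    s_by_y = defaultdict(set)          # index S on the join attribute y
    for y, z in S:
        s_by_y[y].add(z)
    out = set()
    for x, y in R:                     # touches every tuple of the full join
        for z in s_by_y.get(y, ()):
            out.add((x, z))            # the set removes duplicate (x, z) pairs
    return out

def enumerate_materialized(result):
    """Enumeration: a plain scan of the stored result, one tuple per step."""
    yield from result

# Toy instance (an assumption for illustration): the full join has 4 tuples,
# but only 3 distinct (x, z) pairs survive the projection.
R = [(1, "a"), (1, "b"), (2, "a")]
S = [("a", 10), ("b", 10), ("b", 20)]
result = two_path_materialize(R, S)
```

The pair `(1, 10)` is derived through both `y = "a"` and `y = "b"`, which is exactly the duplication that makes low-delay enumeration hard without materialization.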
It is worth asking whether there are other interesting points in this tradeoff between preprocessing time and delay. Towards this end, seminal work by Kara et al. [17] showed that for any hierarchical CQ (possibly with projections), there always exists a smooth tradeoff between preprocessing time and delay. (Hierarchical CQs are a strict subset of acyclic CQs.) This is the first improvement over the results of Bagan et al. [3] in over a decade for queries involving projections. Applied to the query $Q_{\text{two-path}}$, the main result of [17] shows that for any $\epsilon \in [0,1]$, we can obtain $O(|D|^{1-\epsilon})$ delay with $O(|D|^{1+\epsilon})$ preprocessing time.

In this paper, we continue the investigation of the tradeoff between preprocessing time and delay for CQs with projections. We focus on two classes of CQs: star queries, which are a popular subset of hierarchical queries, and a useful subset of non-hierarchical queries known as path queries. We focus narrowly on these two classes for two reasons. First, star queries are of immense practical interest given their connections to set intersection, set similarity joins and applications to entity matching (we refer the reader to [9] for an overview). The most common star query seen in practice is $Q_{\text{two-path}}$. The same holds true for path queries, which are fundamental in graph processing. Second, as we will see in this paper, even for the simple class of star queries, the tradeoff landscape is complex and requires the development of novel techniques. We also present a result on another subset of hierarchical CQs that we call left-deep.

Our key insight is to design enumeration algorithms that depend not only on the input size $|D|$, but are also aware of other data-specific parameters such as the output size. To give a flavor of our results, consider the query $Q_{\text{two-path}}$, and denote by $\mathrm{OUT}_{\bowtie}$ the output of the corresponding query without projections, $R(x,y) \bowtie S(y,z)$. We can show the following result.

Queries                                           Preprocessing                       Delay                                                     Source
Arbitrary acyclic CQ                              $O(|D|)$                            $O(|D|)$                                                  [3]
Free-connex CQ (projections)                      $O(|D|)$                            $O(1)$                                                    [3]
Full CQ                                           $O(|D|^{\mathrm{fhw}})$             $O(1)$                                                    [21]
Full CQ                                           $O(|D|^{\mathrm{subw}} \log |D|)$   $O(1)$                                                    [1]
Hierarchical CQ (with projections)                $O(|D|^{1+(w-1)\epsilon})$          $O(|D|^{1-\epsilon})$, $\epsilon \in [0,1]$               [17]
Star query with $k$ relations (with projections)  $O(|D|)$                            $O(|D|^{k/(k-1)} / |\mathrm{OUT}_{\bowtie}|^{1/(k-1)})$   this paper
Path query with $k$ relations (with projections)  $O(|D|^{2-\epsilon/(k-1)})$         $O(|D|^{\epsilon})$, $\epsilon \in [0,1)$                 this paper
Left-deep hierarchical CQ (with projections)      $O(|D|)$                            $O(|D|^{k} / |\mathrm{OUT}_{\bowtie}|)$                   this paper
Two path query (with projections)                 $O(|D|^{\omega \cdot \epsilon})$    $O(|D|^{1-\epsilon})$, $\epsilon \in [2/(\omega+1), 1]$   this paper

Figure 1

Preprocessing time and delay guarantees for different queries. $|\mathrm{OUT}_{\bowtie}|$ denotes the size of the join query under consideration but without any projections. subw denotes the submodular width of the query. For each class of query, the total running time is $O(\min\{|D| \cdot |\mathrm{OUT}_{\pi}|,\ |D|^{\mathrm{subw}} \log |D| + |\mathrm{OUT}_{\pi}|\})$, where $|\mathrm{OUT}_{\pi}|$ denotes the size of the query result. See the related work section for more discussion on best running times for two path and star queries.

▶ Theorem 1. Given a database instance $D$, we can enumerate the output of $Q_{\text{two-path}} = \pi_{x,z}(R(x,y) \bowtie S(y,z))$ with preprocessing time $O(|D|)$ and delay $O(|D|^2 / |\mathrm{OUT}_{\bowtie}|)$.

At this point, the reader may wonder about the improvement obtained from the above result. [17] implies that with preprocessing time $O(|D|)$, the delay guarantee in the worst case is $O(|D|)$. This raises the question whether the delay from Theorem 1 is truly an algorithmic improvement rather than an improved analysis of [17]. We answer the question positively. Specifically, we show that there exists a database instance where the delay obtained from Theorem 1 is a polynomial improvement over the actual guarantee of [17] and not just the worst case. When the preprocessing time is linear, the delay implied by our result is dependent on the size of the full join. In the worst case where $|\mathrm{OUT}_{\bowtie}| = \Theta(|D|^2)$, we actually obtain the best delay, which will be constant. Compare this to the result of [17], which would require nearly $O(|D|^2)$ preprocessing time to achieve the same guarantee. On the other hand, if $|\mathrm{OUT}_{\bowtie}| = \Theta(|D|)$, we obtain only a linear delay guarantee of $O(|D|)$. (We do not need to consider the case where $|\mathrm{OUT}_{\bowtie}| \le |D|$, since then we can simply materialize the full result during the preprocessing time using constant delay enumeration for queries without projections [22].) The reader may wonder how our result compares in general with the tradeoff in [17] in the worst case; we will show that we can always get at least as good a tradeoff point as the one in [17]. Figure 1 summarizes the prior work and the results presented in this paper.

Our Contribution. In this paper, we improve the state-of-the-art on the preprocessing time-delay tradeoff for a subset of CQs with projections. We summarize our main technical contributions below (highlighted in Figure 1):

1. Our main contribution consists of a novel algorithm (Theorem 7 in Section 4) that achieves output-dependent delay guarantees for star queries after linear preprocessing time. Specifically, we show that for the query $\pi_{x_1,\dots,x_k}(R_1(x_1,y) \bowtie \cdots \bowtie R_k(x_k,y))$ we can achieve delay $O(|D|^{k/(k-1)} / |\mathrm{OUT}_{\bowtie}|^{1/(k-1)})$ with linear preprocessing. Our key idea is to identify an appropriate degree threshold to split a relation into partitions of heavy and light, which allows us to perform efficient enumeration. For star queries, our result implies that there exists no smooth tradeoff between preprocessing time and delay guarantees as stated in [17] for the class of hierarchical queries.

2. We introduce the novel idea of interleaving join query computation in the context of enumeration algorithms, which forms the foundation for our algorithms and may be of independent interest. Specifically, we show that it is possible to union the output of two algorithms $A$ and $A'$ with $\delta$ delay guarantee where $A$ enumerates query results with $\delta$ delay guarantees but $A'$ does not. This technique allows us to compute a subset of a query on-the-fly when enumeration with good delay guarantees is impossible (Lemma 4 and Lemma 5 in Section 3).

3. We show how fast matrix multiplication can be used to obtain a tradeoff between preprocessing time and delay that further improves upon the tradeoff in [17]. We also present an algorithm for left-deep hierarchical queries with linear preprocessing time and output-dependent delay guarantees (Section 5).

4. Finally, we present new results on preprocessing time-delay tradeoffs for a non-hierarchical query with projections, for the class of path queries (Section 6). A path query has the form $\pi_{x_1,x_{k+1}}(R_1(x_1,x_2) \bowtie \cdots \bowtie R_k(x_k,x_{k+1}))$. Our results show that we can achieve delay $O(|D|^{\epsilon})$ with preprocessing time $O(|D|^{2-\epsilon/(k-1)})$ for any $\epsilon \in [0,1)$.

In this section, we present the basic notation and terminology.

In this paper, we will focus on the class of conjunctive queries (CQs), which we denote as

$$Q = \pi_{\mathbf{y}}(R_1(\mathbf{x}_1) \bowtie R_2(\mathbf{x}_2) \bowtie \dots \bowtie R_n(\mathbf{x}_n))$$

Here, the symbols $\mathbf{y}, \mathbf{x}_1, \dots, \mathbf{x}_n$ are vectors that contain variables or constants. We say that $Q$ is full if there is no projection. We will typically use the symbols $x, y, z, \dots$ to denote variables, and $a, b, c, \dots$ to denote constants. We use $Q(D)$ to denote the result of the query $Q$ over input database $D$.

In this paper, we will focus on CQs that have no constants and no repeated variables in the same atom (both cases can be handled within a linear time preprocessing step, so this assumption is without any loss of generality). Such a query can be represented equivalently as a hypergraph $\mathcal{H}_Q = (\mathcal{V}_Q, \mathcal{E}_Q)$, where $\mathcal{V}_Q$ is the set of variables, and for each hyperedge $F \in \mathcal{E}_Q$ there exists a relation $R_F$ with variables $F$.

We will be particularly interested in two families of CQs that are fundamental in query processing, star and path queries. The star query with $k$ relations is expressed as:

$$Q^*_k = R_1(\mathbf{x}_1, \mathbf{y}) \bowtie R_2(\mathbf{x}_2, \mathbf{y}) \bowtie \cdots \bowtie R_k(\mathbf{x}_k, \mathbf{y})$$

where $\mathbf{x}_1, \dots, \mathbf{x}_k$ have disjoint sets of variables. The path query with $k$ (binary) relations is expressed as:

$$P_k = R_1(x_1, x_2) \bowtie R_2(x_2, x_3) \bowtie \cdots \bowtie R_k(x_k, x_{k+1})$$

In $Q^*_k$, variables in each relation $R_i$ are partitioned into two sets: variables $\mathbf{x}_i$ that are present only in $R_i$ and a common set of join variables $\mathbf{y}$ present in every relation.

Hierarchical Queries. A CQ $Q$ is hierarchical if for any two of its variables, either the sets of atoms in which they occur are disjoint or one is contained in the other [28]. For example, $Q^*_k$ is hierarchical for any $k$, while $P_k$ is hierarchical only when $k \le 2$.

Join Size Bounds. Let $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ be a hypergraph, and $S \subseteq \mathcal{V}$. A weight assignment $\mathbf{u} = (u_F)_{F \in \mathcal{E}}$ is called a fractional edge cover of $S$ if (i) for every $F \in \mathcal{E}$, $u_F \ge 0$ and (ii) for every $x \in S$, $\sum_{F : x \in F} u_F \ge 1$. The fractional edge cover number of $S$, denoted by $\rho^*_{\mathcal{H}}(S)$, is the minimum of $\sum_{F \in \mathcal{E}} u_F$ over all fractional edge covers of $S$. We write $\rho^*(\mathcal{H}) = \rho^*_{\mathcal{H}}(\mathcal{V})$.

Tree Decompositions. Let $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ be a hypergraph of a CQ $Q$. A tree decomposition of $\mathcal{H}$ is a tuple $(\mathbb{T}, (B_t)_{t \in V(\mathbb{T})})$ where $\mathbb{T}$ is a tree, and every $B_t$ is a subset of $\mathcal{V}$, called the bag of $t$, such that each edge in $\mathcal{E}$ is contained in some bag; and for each variable $x \in \mathcal{V}$, the set of nodes $\{t \mid x \in B_t\}$ forms a connected subtree of $\mathbb{T}$.

The fractional hypertree width of a decomposition is defined as $\max_{t \in V(\mathbb{T})} \rho^*(B_t)$, where $\rho^*(B_t)$ is the minimum fractional edge cover of the vertices in $B_t$. The fractional hypertree width of a query $Q$, denoted $\mathrm{fhw}(Q)$, is the minimum fractional hypertree width among all tree decompositions of its hypergraph. We say that a query is acyclic if $\mathrm{fhw}(Q) = 1$.

Computational Model. To measure the running time of our algorithms, we will use the uniform-cost RAM model [15], where data values as well as pointers to databases are of constant size. Throughout the paper, all complexity results are with respect to data complexity, where the query is assumed fixed.
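The definition of hierarchical queries given earlier in this section can be checked mechanically from the atoms of a query. The sketch below is illustrative only: it assumes a query is encoded as a dictionary from atom names to tuples of variable names, and the helpers `is_hierarchical`, `star` and `path` are hypothetical names, not notation from the paper. It reproduces the observation that star queries are always hierarchical while path queries stop being hierarchical at three relations.

```python
from itertools import combinations

def atoms_of(query):
    """Map each variable to the set of atom names in which it occurs."""
    occ = {}
    for atom, variables in query.items():
        for v in variables:
            occ.setdefault(v, set()).add(atom)
    return occ

def is_hierarchical(query):
    """True if, for every pair of variables, their atom sets are disjoint or nested."""
    occ = atoms_of(query)
    for u, v in combinations(occ, 2):
        a, b = occ[u], occ[v]
        if not (a.isdisjoint(b) or a <= b or b <= a):
            return False
    return True

def star(k):
    """Star query with k binary relations R_i(x_i, y)."""
    return {f"R{i}": (f"x{i}", "y") for i in range(1, k + 1)}

def path(k):
    """Path query with k binary relations R_i(x_i, x_{i+1})."""
    return {f"R{i}": (f"x{i}", f"x{i + 1}") for i in range(1, k + 1)}
```

For `path(3)`, the variables `x2` and `x3` share the atom `R2`, yet neither atom set contains the other, so the check fails.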

Let $A$ be a $U \times U$ matrix and $C$ be a $U \times U$ matrix over any field $\mathbb{F}$. $A_{i,j}$ is the shorthand notation for the entry of $A$ located in row $i$ and column $j$. The matrix product is given by $(AC)_{i,j} = \sum_{k=1}^{U} A_{i,k} C_{k,j}$. Algorithms for fast matrix multiplication are of extreme theoretical interest given its fundamental importance. We will frequently use the following folklore lemma about rectangular matrix multiplication.

▶ Lemma 2. Let $\omega$ be the smallest constant such that an algorithm to multiply two $n \times n$ matrices that runs in time $O(n^{\omega})$ is known. Let $\beta = \min\{U, V, W\}$. Then fast matrix multiplication of matrices of size $U \times V$ and $V \times W$ can be done in time $O(UVW\beta^{\omega-3})$.

Observe that in Lemma 2, matrix multiplication cost dominates the time required to construct the input matrices (if they have not been constructed already) for all $\omega \ge 2$. Fixing $\omega = 2$, rectangular matrix multiplication can be done in time $O(UVW/\beta)$. A long line of research on fast square matrix multiplication has dropped the complexity to $O(n^{\omega})$, where $2 \le \omega < 3$. The current best known value is $\omega \approx 2.37$.

Given a Conjunctive Query $Q$ and an input database $D$, we want to enumerate the tuples in $Q(D)$ in any order. We will study this problem in the enumeration framework similar to that of [26], where an algorithm can be decomposed into two phases:

Preprocessing phase: it computes a data structure that takes space $S_p$ in preprocessing time $T_p$.

Enumeration phase: it outputs $Q(D)$ with no repetitions. This phase has access to any data structures constructed in the preprocessing phase and can also use additional space of size $S_e$. The delay $\delta$ is defined as the maximum time duration between outputting any pair of consecutive tuples (and also the time to output the first tuple, and the time to notify that the enumeration phase has completed).

In this work, our goal is to study the relationship between the preprocessing time $T_p$ and delay $\delta$ for a given CQ $Q$. Ideally, we would like to achieve the best possible delay in linear preprocessing time. As Figure 1 shows, when $Q$ is full, with $T_p = O(|D|^{\mathrm{fhw}})$, we can enumerate the results with constant delay $O(1)$ [21]. In the particular case where $Q$ is acyclic, i.e., $\mathrm{fhw} = 1$, we can achieve constant delay with only linear preprocessing time. On the other hand, [3] shows that for every acyclic CQ, we can achieve linear delay $O(|D|)$ with linear preprocessing time $O(|D|)$.

Recently, [17] showed that it is possible to get a tradeoff between the two extremes, for the class of hierarchical queries. Note that hierarchical queries are acyclic but not necessarily free-connex. This is the first non-trivial result that improves upon the linear delay guarantees given by [3] for queries with projections.

▶ Theorem 3 (due to [17]). Consider a hierarchical CQ $Q$ with factorization width $w$, and an input instance $D$. Then, for any $\epsilon \in [0,1]$, there exists an algorithm that can preprocess $D$ in time $T_p = O(|D|^{1+(w-1)\epsilon})$ and space $S_p = O(|D|^{1+(w-1)\epsilon})$ such that we can enumerate the query output with delay $\delta = O(|D|^{1-\epsilon})$ and space $S_e = O(1)$.

The factorization width $w$ of a query, originally introduced as $s^{\uparrow}$ [22], is a generalization of the fractional hypertree width from boolean to arbitrary CQs. For $\pi_{\mathbf{x}_1,\dots,\mathbf{x}_k}(Q^*_k)$, the factorization width is $w = k$. Observe that preprocessing time $T_p$ is always smaller than the time required to evaluate the full join result. This is because if $T_p = \Theta(|\mathrm{OUT}_{\bowtie}|)$, we can evaluate the full join and deduplicate the projection output, allowing us to obtain constant delay in the enumeration phase. This implies that $\epsilon$ can only take values between $0$ and $(\log_{|D|} |\mathrm{OUT}_{\bowtie}| - 1)/(w - 1)$.

Before we present the proof of our main results, we discuss three useful lemmas which will be used frequently, and may be of independent interest for enumeration algorithms. The first two lemmas are based on the key idea of interleaving query results, which we describe next. We note that the idea of interleaving computation has been explored in the past to develop dynamic algorithms with good worst-case bounds using static data structures [23].

We say that an algorithm $A$ provides no delay guarantees to mean that its delay guarantee is its total execution time. In other words, if an algorithm requires time $T$ to complete, its delay guarantee is upper bounded by $T$. Since we are using the uniform-cost RAM model, each operation takes one unit of time.

▶ Lemma 4.

Consider two algorithms $A$ and $A'$ such that $A$ enumerates query results in total time at most $T$ with no delay guarantees. $A'$ enumerates query results with delay $\delta$ and runs in total time at least $T'$. The outputs of $A$ and $A'$ are disjoint. $T$ and $T'$ are provided as input to the algorithm. Then, the union of the outputs of $A$ and $A'$ can be enumerated with delay $c \cdot \delta \cdot \max\{1, T/T'\}$ for some constant $c$.

Lemma 4 tells us that as long as $T = O(T')$, the output of $A$ and $A'$ can be combined without giving up on delay guarantees by pacing the output of $A'$. Note that we need to know the exact values of $T$ and $T'$ (by calculating the number of operations in the algorithms $A$ and $A'$ to bound the running time). The next lemma introduces our second key idea of interleaving stored output results with on-the-fly query computation (the full algorithm and proof can be found in Appendix A).

▶ Lemma 5.

Consider an algorithm $A$ that enumerates query results in total time at most $T$ with no delay guarantees, where $T$ is known in advance. Suppose that $J$ output tuples have been stored a priori with no duplicate tuples, where $J \le T$. Then, there exists an algorithm that enumerates the output with delay guarantee $\delta = O(T/J)$.

The final helping lemma allows us to enumerate the union of (possibly overlapping) results of $m$ different algorithms where each algorithm outputs its result according to a total order $\preceq$, such that the union is also enumerated in sorted order according to $\preceq$. This lemma is based on the idea presented as Fact 3 . .

▶ Lemma 6.

Consider $m$ algorithms $A_1, A_2, \dots, A_m$ such that each $A_i$ enumerates its output $L_i$ with delay $O(\delta)$ according to the total order $\preceq$. Then, the union of their output can be enumerated (without duplicates) with $O(m \cdot \delta)$ delay and in sorted order according to $\preceq$.

Directly implied by Lemma 6 is the fact that the list merge problem can be enumerated with delay guarantees: given $m$ lists $L_1, L_2, \dots, L_m$ whose elements are drawn from a common domain, if elements in $L_i$ are distinct (i.e., no duplicates) and ordered according to $\preceq$, then the union of all lists $\bigcup_{i=1}^{m} L_i$ can be enumerated in sorted order given by $\preceq$ with delay $O(m)$. Note that the enumeration algorithm $A_i$ degenerates to going over elements one by one in list $L_i$, which has $O(1)$ delay guarantee as long as indexes/pointers within $L_i$ are well-built. Throughout the paper, we use this primitive as ListMerge($L_1, L_2, \dots, L_m$).

In this section, we study enumeration algorithms for the star query $\pi_{\mathbf{r}}(Q^*_k)$ where $\mathbf{r} \subseteq \bigcup_{i \in \{1,2,\dots,k\}} \mathbf{x}_i$. Our main result is Theorem 7, which we present below. We first present a detailed discussion on how our result is an improvement over prior work in Subsection 4.1. Then, we present a warm-up proof for $\pi_{\mathbf{r}}(Q^*_k)$ in Subsection 4.2, followed by the proof for the general result in Subsection 4.3.

▶ Theorem 7.

Consider the star query with projection $\pi_{\mathbf{r}}(Q^*_k)$ where $\mathbf{r} \subseteq \bigcup_{i \in \{1,2,\dots,k\}} \mathbf{x}_i$ and an instance $D$. There exists an algorithm with preprocessing time $T_p = O(|D|)$ and preprocessing space $S_p = O(|D|)$, such that we can enumerate $Q^*_k(D)$ with delay

$$\delta = O\left(\frac{|D|^{k/(k-1)}}{|\mathrm{OUT}_{\bowtie}|^{1/(k-1)}}\right)$$

and space $S_e = O(|D|)$. (We assume that $\mathbf{r}$ contains at least one variable from each $\mathbf{x}_i$. Otherwise, we can remove relations with no projection variables after the preprocessing phase.)

In the above theorem, the delay depends on the full join result size $|\mathrm{OUT}_{\bowtie}| = |Q^*_k(D)|$. As the join size increases, the algorithm can obtain better delay guarantees. In the extreme case when $|\mathrm{OUT}_{\bowtie}| = \Theta(|D|^k)$, it achieves constant delay with linear time preprocessing. In the other extreme, when $|\mathrm{OUT}_{\bowtie}| = \Theta(|D|)$, it achieves linear delay.

Figure 2. Worst-case tradeoffs given by Theorem 3 without (left) and with (right) taking $|\mathrm{OUT}_{\bowtie}|$ into consideration (axes: $\log_{|D|} |\mathrm{OUT}_{\bowtie}|$, $\log_{|D|} T_p$, $\log_{|D|} \delta$).

Figure 3. Trade-off in the worst case for star query (axes: $\log_{|D|} |\mathrm{OUT}_{\bowtie}|$, $\log_{|D|} T_p$, $\log_{|D|} \delta$).

When $|\mathrm{OUT}_{\bowtie}|$ has linear size, we can compute and materialize the result of the query in linear preprocessing time and achieve constant delay enumeration. Generalizing this observation, when $T_p$ is sufficient to evaluate the full join result, we can always achieve constant delay.

It is instructive now to compare the worst-case delay guarantee obtained by Theorem 3 for $Q^*_k(D)$ with Theorem 7. Suppose that we want to achieve delay $\delta = O(|D|^{1-\epsilon})$ for some $\epsilon \in [0, (\log_{|D|} |\mathrm{OUT}_{\bowtie}| - 1)/(k-1)]$. Theorem 3 requires $O(|D|^{1+\epsilon(k-1)})$ preprocessing time. Then, it holds that:

$$|D|^{1-\epsilon} \ \ge\ |D|^{1 - \frac{\log_{|D|} |\mathrm{OUT}_{\bowtie}| - 1}{k-1}} = |D|^{\frac{k - \log_{|D|} |\mathrm{OUT}_{\bowtie}|}{k-1}} = \frac{|D|^{k/(k-1)}}{|\mathrm{OUT}_{\bowtie}|^{1/(k-1)}}$$

In other words, either we have enough preprocessing time to materialize the output and achieve constant delay, or we can achieve the desirable delay with linear preprocessing time.

Figure 2, Figure 3 and Figure 4 show the existing and new tradeoff results. Figure 2 shows the tradeoff curve obtained from Theorem 3 by adding $|\mathrm{OUT}_{\bowtie}|$ as a third dimension, and adding the optimization for constant delay when $T_p \ge O(|\mathrm{OUT}_{\bowtie}|)$. Figure 3 shows the tradeoff obtained from our result, while Figure 4 shows other existing results for a fixed value of $|\mathrm{OUT}_{\bowtie}|$. For a fixed value of $|\mathrm{OUT}_{\bowtie}|$, the delay guarantee does not change in Figure 3 as we increase $T_p$ from $|D|$ to $|\mathrm{OUT}_{\bowtie}|$. It remains an open question to further decrease the delay if we allow more preprocessing time. Such an algorithm would correspond to a curve connecting the red point (•) and the green triangle in Figure 4.

Our results thus imply that, depending on $|\mathrm{OUT}_{\bowtie}|$, one must choose a different algorithm to achieve the optimal tradeoff between preprocessing time and delay. Since $|\mathrm{OUT}_{\bowtie}|$ can be computed in linear time (using a simple adaptation of the Yannakakis algorithm [32, 24]), this can be done without affecting the preprocessing bounds.

Figure 5. $D_1 \cup D_2$ forms a database where Theorem 7 improves the delay of Theorem 3. (a) Database $D_1$ over $R(x,y)$ and $S(y,z)$ with full join size $N^2$ (values $d_1, \dots, d_N$; $e_1$; $f_1, \dots, f_N$). (b) Database $D_2$ over $R(x,y)$ and $S(y,z)$ with full join size $N^{3/2}$ (values $a_1, \dots, a_{\sqrt{N}}$; $b_1, \dots, b_{\sqrt{N}}$; $c_1, \dots, c_{\sqrt{N}}$).

Next, we show how our result provides an algorithmic improvement over Theorem 3. Consider the instances $D_1, D_2$ depicted in Figure 5a and Figure 5b respectively, and assume we want to use linear preprocessing time. For $D_2$, the algorithm of Theorem 3 materializes nothing, since no $y$ valuation has a degree of $O(|D|^0)$, and the delay will be $\Theta(\sqrt{N})$. No materialization also occurs for $D_1$, but here the delay will be $O(1)$. It is easy to check that our algorithm matches the delay on both instances. Now, consider the instance $D = D_1 \cup D_2$. The input size for $D$ is $\Theta(N)$, while the full join size is $N^{3/2} + N^2 = \Theta(N^2)$. The algorithm of Theorem 3 will again achieve only a $\Theta(\sqrt{N})$ delay, since after the linear time preprocessing no $y$ valuations can be materialized. In contrast, our algorithm still guarantees a constant delay. This algorithmic improvement is a result of the careful overlapping of the constant-delay computation for instance $D_1$ with the computation for $D_2$.

The above construction can be generalized as follows. Let $\alpha \in (0,1)$ be some constant. $D_1$ remains the same. For $D_2$, we construct $R$ to be the cross product of $N^{\alpha}$ $x$-values and $N^{1-\alpha}$ $y$-values, and $S$ to be the cross product of $N^{\alpha}$ $z$-values and $N^{1-\alpha}$ $y$-values. As before, let $D = D_1 \cup D_2$. The input size for $D$ is $\Theta(N)$, while the full join size is $N^{1+\alpha} + N^2 = \Theta(N^2)$. Hence, our algorithm achieves constant delay with linear preprocessing time. In contrast, the algorithm of Theorem 3 achieves $\Theta(N^{1-\alpha})$ delay with linear preprocessing time. In fact, the $\Theta(N^{1-\alpha})$ delay occurs even if we allow $O(N^{1+\epsilon})$ preprocessing time for any $\epsilon < \alpha$. We can now use the same idea to show that there also exists an instance where achieving constant delay using Theorem 3 requires near quadratic preprocessing time (Example 14 in the appendix).

In the rest of the paper, for simplicity of exposition, we assume that all variable vectors $\mathbf{x}_i, \mathbf{y}$ in $Q^*_k$ are singletons (i.e., all the relations are binary) and $\mathbf{r} = \{x_1, x_2, \dots, x_k\}$. The proof for the general query is a straightforward extension of the binary case.

As a warm-up step, we will present an algorithm for the query $Q_{\text{two-path}} = \pi_{x,z}(R(x,y) \bowtie S(y,z))$ that achieves $O(|D|^2 / |\mathrm{OUT}_{\bowtie}|)$ delay with linear preprocessing time.

At a high level, we will decompose the join into two subqueries with disjoint outputs. The subqueries will be generated based on whether a valuation for $x$ is light or not, based on its degree in relation $R$. For all light valuations of $x$ (degree at most $\delta$), we will show that their enumeration is achievable with delay $\delta$. For the heavy $x$ valuations, we will show that they can also be computed on-the-fly while maintaining the delay guarantees.

Preprocessing Phase.

We first process the input relations such that we remove any dangling tuples. During the preprocessing phase, we will store the input relations as a hash map and sort the valuations for $x$ in increasing order of their degree. Using any comparison based sorting technique requires $\Omega(|D| \log |D|)$ time in general. Thus, if we wish to remove the $\log |D|$ factor, we must use non-comparison based sorting algorithms. In this paper, we will use count sort [8], which has complexity $O(|D| + r)$ where $r$ is the range of the non-negative key values. However, we need to ensure that all relations in the database $D$ satisfy the bounded range requirement. This can be easily accomplished by introducing a bijective function $f : \mathrm{dom}(D) \to \{1, 2, \dots, |D|\}$ that maps all values in the active domain of the database to some integer between $1$ and $|D|$ (both inclusive). Both $f$ and its inverse $f^{-1}$ can be stored as hash tables as follows: suppose there is a counter $c \leftarrow 1$. We perform a linear pass over the database and check if some value $v \in \mathrm{dom}(D)$ has been mapped or not (by checking if there exists an entry $f(v)$). If not, we set $f(v) = c$, $f^{-1}(c) = v$ and increment $c$. Once the hash tables $f$ and $f^{-1}$ have been created, we modify the input relation $R$ (and $S$ similarly) by replacing every tuple $t \in R$ with tuple $t' = f(t)$. Since the mapping is a relabeling scheme, such a transformation preserves the degree of all the values. The codomain of $f$ is also equipped with a total order $\preceq$ (we will use $\le$). Note that $f$ is not an order-preserving transformation in general, but this property is not required in any of our algorithms.

Next, for every tuple $t \in R(x,y)$, we create a hash map with key $\pi_x(t)$ and the value is a list $\pi_y(t)$; and for every tuple $t \in S(y,z)$, we create a hash map with key $\pi_y(t)$ and the value is a list $\pi_z(t)$. For the second hash map, we sort the value list using sort order $\preceq$ for each key, once each tuple $t \in S(y,z)$ has been processed. Finally, we sort all values in $\pi_x(R)$ in increasing order of their degree in $R$ (i.e., $|\sigma_{x=v_i} R(x,y)|$ is the sort key). Let $L = \{v_1, \dots, v_n\}$ denote the ordered set of these values sorted by their degree and let $d_1, \dots, d_n$ be their respective degrees. Creating the sorted list $L$ takes $O(|D|)$ time since the degrees $d_i$ satisfy the bounded range requirement (i.e., $1 \le d_i \le |D|$). Next, we identify the smallest index $i^*$ such that

$$\sum_{v \in \{v_1, v_2, \dots, v_{i^*}\}} |R(v,y) \bowtie S(y,z)| \ \ge\ \sum_{v \in \{v_{i^*+1}, \dots, v_n\}} |R(v,y) \bowtie S(y,z)| \qquad (1)$$

This can be computed by doing a linear pass on $L$ using a simple adaptation of the Yannakakis algorithm [32, 24]. This entire phase takes time $O(|D|)$.

Enumeration Phase.

The enumeration algorithm interleaves the following two loops using the construction in Lemma 4. Specifically, it spends an equal amount of time (a constant) on one loop before switching to the computation of the other loop.
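The switching behaviour described above can be sketched in Python. This is only an illustration under stated assumptions, not the paper's implementation: each loop is modelled as a generator that yields `None` for one unit of work that produces no output and yields a tuple when a result is ready. The names `interleave`, `light_loop`, `heavy_loop` and the parameter `steps` are hypothetical.

```python
def interleave(first, second, steps=1):
    """Alternate between two step-generators, `steps` units of work at a time.

    Assumption: a generator yields None for a unit of work without output,
    and a result tuple otherwise. Only real results are passed on.
    """
    active = [iter(first), iter(second)]
    while active:
        for gen in list(active):          # iterate over a copy; we may remove
            for _ in range(steps):
                try:
                    item = next(gen)
                except StopIteration:
                    active.remove(gen)    # this loop has finished
                    break
                if item is not None:
                    yield item


def light_loop():
    # Stand-in for the loop over light x-values: results arrive quickly.
    yield ("a1", "c1")
    yield ("a2", "c1")


def heavy_loop():
    # Stand-in for the loop over heavy x-values: some work before any output.
    yield None
    yield None
    yield ("a9", "c1")
    yield ("a9", "c2")


results = list(interleave(light_loop(), heavy_loop()))
```

In this toy run the light loop supplies output while the heavy loop is still doing join work, which is the role the two loops play in the delay argument.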

Algorithm 1 EnumTwoPath

    for $i = 1, \dots, i^*$ do                         /* run for O(1) time, then switch */
        let $\pi_y(\sigma_{x=v_i}(R)) = \{u_1, u_2, \dots, u_\ell\}$;
        output $(v_i, f^{-1}(\textsc{ListMerge}(\pi_z \sigma_{y=u_1} S, \pi_z \sigma_{y=u_2} S, \dots, \pi_z \sigma_{y=u_\ell} S)))$
    for $i = i^*+1, \dots, n$ do                       /* run for O(1) time, then switch */
        let $\pi_y(\sigma_{x=v_i}(R)) = \{u_1, u_2, \dots, u_\ell\}$;
        output $(v_i, f^{-1}(\textsc{ListMerge}(\pi_z \sigma_{y=u_1} S, \pi_z \sigma_{y=u_2} S, \dots, \pi_z \sigma_{y=u_\ell} S)))$

The algorithm alternates between low-degree and high-degree values in $L$. The main idea is that, for a given $v_i \in L$, we can enumerate the result of the subquery $\sigma_{x=v_i}(Q_{\text{two-path}})$ with delay $O(d_i)$. This can be accomplished by observing that the subquery is equivalent to list merging, and so we can use Algorithm 3.

▶ Lemma 8.

For the query $Q_{\text{two-path}}$ and an instance $D$, we can enumerate $Q_{\text{two-path}}(D)$ with delay $\delta = O(|D|^2 / |\mathrm{OUT}_{\bowtie}|)$ and $S_e = O(|D|)$.

The reader should note that the delay of $\delta = O(|D|^2 / |\mathrm{OUT}_{\bowtie}|)$ is only an upper bound. Depending on the skew present in the database instance, it is possible that Algorithm 1 achieves much better delay guarantees in practice (as shown in Example 16 in the Appendix).

We now generalize Algorithm 1 to any star query. At a high level, we decompose the join query $\pi_{x_1,\dots,x_k}(Q^*_k)$ into a union of $k+1$ subqueries whose outputs partition the result of the original query. These subqueries are generated based on whether a value for some $x_i$ is light or not. We will show that if any of the values for $x_i$ is light, the enumeration delay is small. The $(k+1)$-th subquery contains heavy values for all attributes. Our key idea again is to interleave the join computation of the heavy subquery with the remaining light subqueries.

Preprocessing Phase.

Assume all relations are reduced without dangling tuples, which can be achieved in linear time [32]. The full join size $|\mathrm{OUT}_{\bowtie}|$ can also be computed in linear time. Similar to the preprocessing phase in the previous section, we construct the hash tables $f, f^{-1}$ to perform the domain compression and modify all the input relations by replacing each tuple $t$ with $f(t)$. Set $\Delta = (2 \cdot |D|^k / |\mathrm{OUT}_{\bowtie}|)^{1/(k-1)}$. For each relation $R_i$, a value $v$ for attribute $x_i$ is heavy if its degree (i.e., $|\pi_y \sigma_{x_i=v} R_i(x_i,y)|$) is greater than $\Delta$, and light otherwise. Moreover, a tuple $t \in R_i$ is identified as heavy or light depending on whether $\pi_{x_i}(t)$ is heavy or light. In this way, each relation $R_i$ is divided into two relations $R_i^h$ and $R_i^\ell$, containing heavy and light tuples respectively, in time $O(|D|)$. The original query can be decomposed into subqueries of the following form:

$$\pi_{x_1,x_2,\dots,x_k}(R_1^{?} \bowtie R_2^{?} \bowtie \cdots \bowtie R_k^{?})$$

where $?$ can be either $h$, $\ell$ or $\star$. Here, $R_i^{\star}$ simply denotes the original relation $R_i$. However, care must be taken to generate the subqueries so that there is no overlap between the outputs of any two subqueries. In order to do so, we create $k$ subqueries of the form

$$Q_i = \pi_{x_1,\dots,x_k}(R_1^h \bowtie \cdots \bowtie R_{i-1}^h \bowtie R_i^\ell \bowtie R_{i+1}^\star \bowtie \cdots \bowtie R_k^\star)$$

(Abusing the notation, $f^{-1}(B)$ for some ordered list (or tuple) $B$ returns an ordered list (tuple) $B'$ where $B'(i) = f^{-1}(B(i))$.)
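The heavy/light split can be sketched as follows. This is a minimal illustration, assuming each binary relation is given as a Python list of `(x, y)` pairs and reading the threshold as $\Delta = (2|D|^k/|\mathrm{OUT}_{\bowtie}|)^{1/(k-1)}$. The function names `delta_threshold` and `split_heavy_light` are hypothetical and do not come from the paper.

```python
from collections import Counter


def delta_threshold(db_size, full_join_size, k):
    """Threshold Delta = (2 * |D|^k / |OUT_join|)^(1/(k-1)), assuming k >= 2."""
    return (2 * db_size ** k / full_join_size) ** (1 / (k - 1))


def split_heavy_light(relation, threshold):
    """Split a binary relation R_i(x_i, y) into heavy and light tuples.

    A value of x_i is heavy if its degree (number of distinct y partners)
    exceeds `threshold`; a tuple inherits the label of its x_i value.
    """
    degree = Counter(x for x, _ in set(relation))   # set() guards against duplicates
    heavy = [t for t in relation if degree[t[0]] > threshold]
    light = [t for t in relation if degree[t[0]] <= threshold]
    return heavy, light


# Toy relation: "a" has degree 3, "b" has degree 1.
R = [("a", 1), ("a", 2), ("a", 3), ("b", 1)]
heavy, light = split_heavy_light(R, threshold=2)
```

Both functions make a single pass over the relation (plus hashing), in line with the linear-time claim for this step.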

In subquery $Q_i$, relation $R_i$ has superscript $\ell$, all relations $R_1, \dots, R_{i-1}$ have superscript $h$, and relations $R_{i+1}, \dots, R_k$ have superscript $\star$. The $(k+1)$-th query, with all $?$ set to $h$, is denoted by $Q_H$. Note that each output tuple $t$ is generated by exactly one of the $Q_i$, and thus the outputs of all subqueries are disjoint. This implies that each $f^{-1}(t)$ is also generated by exactly one subquery. Similar to the preprocessing phase of the two-path query, we store all $R_i^\ell$ and $R_i^h$ in hash maps where the values in the maps are lists sorted in lexicographic order.

Enumeration Phase.

We next describe how enumeration is performed. The key idea is the following: we will show that for $Q_L = Q_1 \cup \cdots \cup Q_k$, we can enumerate the result with delay $O(\Delta)$. Since $Q_H$ contains all heavy valuations from all relations, we compute its join on-the-fly by alternating between some subquery in $Q_L$ and $Q_H$. This ensures that we can give some output to the user with delay guarantees and also make progress on computing the full join of $Q_H$. Our goal is to reason about the running time of enumerating $Q_L$ (denoted by $T_L$) and the running time of $Q_H$ (denoted by $T_H$), and to make sure that while we compute $Q_H$, we do not run out of output from $Q_L$.

Next, we introduce the algorithm that enumerates the output for any specific valuation $v$ of attribute $x_i$, which is described in Lemma 9. This algorithm can be viewed as another instantiation of Algorithm 3.

▶ Lemma 9.

Consider an arbitrary value $v \in \mathrm{dom}(x_i)$ with degree $d$ in relation $R_i(x_i, y)$. Then, its query result $\pi_{x_1,x_2,\dots,x_k} \sigma_{x_i=v} R_1^h(x_1,y) \bowtie R_2^h(x_2,y) \bowtie \cdots \bowtie R_i(x_i,y) \bowtie \cdots \bowtie R_k^\star(x_k,y)$ can be enumerated with $O(d)$ delay guarantee.

Let $c^\star$ be an upper bound on the number of operations in each iteration of ListMerge. This can be calculated by counting the number of operations in the exact implementation of the algorithm. As a direct consequence of Lemma 9, the result of any subquery in $Q_L$ can be enumerated with delay $O(\Delta)$. Let $Q^*_H$ denote the corresponding full query of $Q_H$, i.e., the head of $Q^*_H$ also includes the variable $y$ ($Q^*_L$ is defined similarly). Then, $Q^*_H$ can be evaluated in time $T_H \le c^\star \cdot |Q^*_H| \le c^\star \cdot |\mathrm{OUT}_{\bowtie}|/2$ by using ListMerge on subquery $Q_H$. This follows from the bound $|Q^*_H| \le |D| \cdot (|D|/\Delta)^{k-1}$ and our choice of $\Delta = (2 \cdot |D|^k / |\mathrm{OUT}_{\bowtie}|)^{1/(k-1)}$. Since $|Q^*_H| + |Q^*_L| = |\mathrm{OUT}_{\bowtie}|$, it holds that $|Q^*_L| \ge |\mathrm{OUT}_{\bowtie}|/2$. Moreover, $T_L$ is lower bounded by $|Q^*_L|$ (since we need at least one operation for every result). Thus, $T_L \ge |\mathrm{OUT}_{\bowtie}|/2$.

We can now apply Lemma 4 with the following parameters: $A$ is the full join computation of $Q_H$ and $T = c^\star \cdot |\mathrm{OUT}_{\bowtie}|/2$; $A'$ is the enumeration algorithm applied to $Q_L$ with delay guarantee $\delta = O(\Delta)$ and $T' = |\mathrm{OUT}_{\bowtie}|/2$; and $T$ and $T'$ are fixed once $|\mathrm{OUT}_{\bowtie}|$, $\Delta$ and the constant $c^\star$ are known. By construction, the outputs of $Q_H$ and $Q_L$ are also disjoint. Thus, the conditions of Lemma 4 apply and we obtain a delay of $O(\Delta)$.

Theorem 7 obtains poor delay guarantees when the full join size $|\mathrm{OUT}_{\bowtie}|$ is close to the input size $|D|$. In this section, we present an alternate algorithm that provides good delay guarantees in this case. The algorithm is an instantiation of Lemma 5 on the star query, which degenerates to computing as many distinct output results as possible in limited preprocessing time.
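As a sanity check on the threshold, the substitution behind the bound on $T_H$ can be worked out explicitly. The derivation below assumes the reading $\Delta = (2|D|^k/|\mathrm{OUT}_{\bowtie}|)^{1/(k-1)}$, i.e. $\Delta^{k-1} = 2|D|^k/|\mathrm{OUT}_{\bowtie}|$, and uses only the bound $|Q^*_H| \le |D| \cdot (|D|/\Delta)^{k-1}$ stated in the text.

```latex
\begin{align*}
|Q^*_H| &\le |D| \cdot \left(\frac{|D|}{\Delta}\right)^{k-1}
         = \frac{|D|^{k}}{\Delta^{k-1}}
         = |D|^{k} \cdot \frac{|\mathrm{OUT}_{\bowtie}|}{2\,|D|^{k}}
         = \frac{|\mathrm{OUT}_{\bowtie}|}{2}, \\[4pt]
k = 2 &: \quad \Delta = \frac{2\,|D|^{2}}{|\mathrm{OUT}_{\bowtie}|}
        \quad \text{(the delay bound of Lemma 8, up to a constant)}, \\[4pt]
|\mathrm{OUT}_{\bowtie}| = |D| &: \quad
        \Delta = \left(2\,|D|^{k-1}\right)^{1/(k-1)} = 2^{1/(k-1)}\,|D| .
\end{align*}
```

The last line illustrates the weak case mentioned above: when the full join is no larger than the input, the threshold, and hence the delay bound, is linear in $|D|$.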
An observation is that for each valuation $u$ of attribute $y$, the cartesian product $\times_{i \in \{1,2,\dots,k\}} \pi_{x_i} \sigma_{y=u} R_i(x_i,y)$ is a subset of the output results without duplication. Thus, this subset of the output is readily available since no deduplication needs to be performed. Similarly, after all relations are reduced, it is also guaranteed that each valuation of attribute $x_i$ of relation $R_i$ generates at least one output result. Thus, $\max_{i=1}^{k} |\mathrm{dom}(x_i)|$ results are also readily available that do not require deduplication. We define $J$ as the larger of the two quantities, i.e.,

$$J = \max\left\{ \max_{i=1}^{k} |\mathrm{dom}(x_i)|,\; \max_{u \in \mathrm{dom}(y)} \prod_{i=1}^{k} |\sigma_{y=u} R_i(x_i,y)| \right\}.$$

Together with these observations, we can achieve the following theorem.

▶ Theorem 10.

Consider the star query $\pi_{x_1,\dots,x_k}(Q^*_k)$ and an input database instance $D$. There exists an algorithm with preprocessing time $O(|D|)$ and space $O(|D|)$, such that $\pi_{x_1,\dots,x_k}(Q^*_k)$ can be enumerated with delay $\delta = O\left(|\mathrm{OUT}_{\bowtie}| / |\mathrm{OUT}_{\pi}|^{1/k}\right)$ and space $S_e = O(|D|)$.

In the above theorem, we obtain delay guarantees that depend on both the full join result $\mathrm{OUT}_{\bowtie}$ and the projection output size $\mathrm{OUT}_{\pi}$. However, one does not need to know $\mathrm{OUT}_{\bowtie}$ or $\mathrm{OUT}_{\pi}$ to apply the result. We first compare the result with Theorem 7. First, observe that both Theorem 10 and Theorem 7 require $O(|D|)$ preprocessing time. Second, the delay guarantee provided by Theorem 10 can be better than that of Theorem 7. This happens when $|\mathrm{OUT}_{\bowtie}| \le |D| \cdot J^{1-1/k}$, a condition that can be checked in linear time.

We now proceed to describe the algorithm. First, we compute all the statistics for computing $J$ in linear time. If $J = |\mathrm{dom}(x_j)|$ for some integer $j \in \{1, 2, \dots, k\}$, we just materialize one result for each valuation of $x_j$. Otherwise, $J = \prod_{i=1}^{k} |\sigma_{y=u} R_i(x_i,y)|$ for some valuation $u$ of attribute $y$. Note that we do not need to explicitly materialize the cartesian product, but only need to store the tuples in $\bigcup_{i \in \{1,2,\dots,k\}} \sigma_{y=u} R_i(x_i,y)$. As mentioned before, each output in $\times_{i=1}^{k} (\pi_{x_i} \sigma_{y=u} R_i(x_i,y))$ can be enumerated with $O(1)$ delay. This preprocessing phase takes $O(|D|)$ time and $O(|D|)$ space. We can now invoke Lemma 5 to achieve the claimed delay. The final observation is to express $J$ in terms of $|\mathrm{OUT}_{\pi}|$. Note that $|\mathrm{OUT}_{\pi}| \le \prod_{i \in [k]} |\mathrm{dom}(x_i)|$, which implies that $\max_{i \in [k]} |\mathrm{dom}(x_i)| \ge |\mathrm{OUT}_{\pi}|^{1/k}$. Thus, it holds that $J \ge |\mathrm{OUT}_{\pi}|^{1/k}$, which gives us the desired bound on the delay guarantee.

Both Theorem 7 and Theorem 3 are combinatorial algorithms.
In this section, we will show how fast matrix multiplication can be used to obtain a tradeoff between preprocessing time and delay that is better than Theorem 3 for some values of delay.

▶ Theorem 11.

Consider the star query π x ,...,x k ( Q ∗ k ) and an input database instance D .Then, there exists an algorithm that requires preprocessing T p = O (( | D | /δ ) ω + k − ) and canenumerate the query result with delay O ( δ ) for ≤ δ ≤ | D | ( ω + k − / ( ω +2 · k − . For the two-path query and the current best value of ω = 2 . T p = O (( | D | /δ ) . ) and a delay guarantee of O ( δ ) for | D | . < δ ≤ | D | . . If we choose δ = | D | . , the preprocessing time is T p = O ( | D | . ). In contrast, Theorem 3 requires apreprocessing time of T p = O ( | D | . ), which is suboptimal compared to the above theorem.On the other hand, since T p = O ( | D | . ), we can safely assume that | OUT ⋊⋉ | > | D | . ,otherwise one can simply compute the full join in time c ⋆ · | D | . using ListMerge ,deduplicate and get constant delay enumeration. Applying Theorem 7 with | OUT ⋊⋉ | > | D | . tells us that we can obtain delay as O ( | D | / | OUT ⋊⋉ | ) = O ( | D | . ). Thus, we can offer theuser both choices and the user can decide which enumeration algorithm to use. I C D T 2 0 2 1 :14 Enumeration Algorithms for Conjunctive Queries with Projection
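The link between the two-path query and Boolean matrix multiplication noted in the abstract can be made concrete with a small sketch. It is only an illustration: it uses a naive cubic product on Python lists rather than any fast matrix multiplication routine, and the function name `two_path_via_boolean_product` is hypothetical.

```python
def two_path_via_boolean_product(R, S):
    """Compute pi_{x,z}(R(x,y) JOIN S(y,z)) as a Boolean matrix product.

    R and S are iterables of pairs. A[i][j] is True iff (x_i, y_j) is in R,
    B[j][l] is True iff (y_j, z_l) is in S, and (x_i, z_l) is in the output
    iff entry (i, l) of the Boolean product A * B is True.
    """
    xs = sorted({x for x, _ in R})
    ys = sorted({y for _, y in R} | {y for y, _ in S})
    zs = sorted({z for _, z in S})
    x_idx = {x: i for i, x in enumerate(xs)}
    y_idx = {y: j for j, y in enumerate(ys)}
    z_idx = {z: l for l, z in enumerate(zs)}

    A = [[False] * len(ys) for _ in xs]
    for x, y in R:
        A[x_idx[x]][y_idx[y]] = True
    B = [[False] * len(zs) for _ in ys]
    for y, z in S:
        B[y_idx[y]][z_idx[z]] = True

    # Naive Boolean product; a fast matrix multiplication routine would be
    # substituted here to obtain the kind of bounds discussed in this section.
    return {
        (xs[i], zs[l])
        for i in range(len(xs))
        for l in range(len(zs))
        if any(A[i][j] and B[j][l] for j in range(len(ys)))
    }


R = {("a1", "b1"), ("a2", "b1"), ("a2", "b2")}
S = {("b1", "c1"), ("b2", "c2")}
projected = two_path_via_boolean_product(R, S)
```

Because the join attribute $y$ is projected away, duplicates that the full join would produce for the same $(x, z)$ pair collapse into a single True entry of the product.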

In this section, we will apply our techniques to another subset of hierarchical queries, whichwe call left-deep . A left-deep hierarchical query is of the following form: Q k leftdeep = R ( w , x ) ⋊⋉ R ( w , x , x ) ⋊⋉ . . . ⋊⋉ R k − ( w k − , x , . . . , x k − ) ⋊⋉ R k ( w k , x , . . . , x k )It is easy to see that Q k leftdeep is a hierarchical query for any k ≥

1. Note that for k = 2,we get the two-path query. For k = 3, we get R ( w , x ) ⋊⋉ S ( w , x , x ) ⋊⋉ T ( w , x , x ). Wewill be interested in computing the query π w ,...,w k ( Q k leftdeep ), where we project out all thejoin variables. We show that the following result holds: ▶ Theorem 12.

Consider the query $\pi_{w_1,\dots,w_k}(Q^k_{\text{leftdeep}})$ and any input database $D$. Then, there exists an algorithm that enumerates the query after preprocessing time $T_p = O(|D|)$ with delay $O(|D|^k / |\mathrm{OUT}_{\bowtie}|)$.

In the above theorem, $\mathrm{OUT}_{\bowtie}$ is the full join result of the query $Q^k_{\text{leftdeep}}$ without projections. The AGM exponent for $Q^k_{\text{leftdeep}}$ is $\rho^* = k$. Observe that Theorem 12 is of interest when $|\mathrm{OUT}_{\bowtie}| > |D|^{k-1}$, which ensures that the delay is smaller than $O(|D|)$. When the condition $|\mathrm{OUT}_{\bowtie}| > |D|^{k-1}$ holds, the delay obtained by Theorem 12 is also better than the one given by the tradeoff in Theorem 3. In the worst case, when $|\mathrm{OUT}_{\bowtie}| = \Theta(|D|^k)$, we can achieve constant delay enumeration after linear preprocessing time, compared to Theorem 3, which would require $\Theta(|D|^k)$ preprocessing time to achieve the same delay. The decision of when to apply Theorem 12 or Theorem 3 can be made in linear time by checking whether $|D|^k / |\mathrm{OUT}_{\bowtie}|$ is smaller or larger than the actual delay guarantee obtained by the algorithm of Theorem 3 after linear-time preprocessing.

In this section, we study path queries. In particular, we present an algorithm that enumerates the result of the query $\pi_{x_1,x_{k+1}}(P_k)$, i.e., the CQ that projects the two endpoints of a path query of length $k$. Recall that for $k \ge 3$, $P_k$ is not a hierarchical query, and hence the tradeoff from [17] does not apply. A subset of path queries, namely 3-path and 4-path counting queries, were considered in [16]. The algorithm used for counting the answers of 3-path and 4-path queries under updates constructed a set of views that can be used for the task of enumerating the query results under the static setting. Our result extends the same idea to arbitrary-length path queries, which we state next.

▶ Theorem 13.

Consider the query $\pi_{x_1,x_{k+1}}(P_k)$ with $k \ge 2$. For any input instance $D$ and parameter $\epsilon \in [0, 1)$ there exists an algorithm that enumerates the query with preprocessing time (and space) $T_p = O(|D|^{2-\epsilon/(k-1)})$ and delay $O(|D|^{\epsilon})$.

We should note here that for $\epsilon = 1$, we can obtain a delay $O(|D|)$ using only linear preprocessing time $O(|D|)$ using the result of [3] since the query is acyclic, while for $\epsilon \to 1$ the theorem gives preprocessing time $O(|D|^{2-1/(k-1)})$. Hence, for $k \ge 3$, we observe a discontinuity in the time-delay tradeoff. A second observation following from Theorem 13 is that as $k \to \infty$, the tradeoff collapses to only two extremal points: one where we get constant delay with $T_p = O(|D|^2)$, and the other where we get linear delay with $T_p = O(|D|)$.

We overview prior work on static query evaluation for acyclic join-project queries. The result of any acyclic conjunctive query can be enumerated with constant delay after linear-time preprocessing if and only if it is free-connex [3]. This is based on the conjecture that Boolean multiplication of $n \times n$ matrices cannot be done in $O(n^2)$ time. Acyclicity itself is necessary for having constant delay enumeration: a conjunctive query admits constant delay enumeration after linear-time preprocessing if and only if it is free-connex acyclic [6]. This is based on a stronger hypothesis that the existence of a triangle in a hypergraph of $n$ vertices cannot be tested in time $O(n^2)$ and that, for any $k$, the presence of a $k$-dimensional tetrahedron cannot be tested in linear time. We refer the reader to [27] for an overview of pre-2015 problems and progress related to constant delay enumeration. Prior work also exhibits a dependency between the space and enumeration delay for conjunctive queries with access patterns [10]. It constructs a succinct representation of the query result that allows for enumeration of tuples over some variables under value bindings for all other variables. As noted by [17], it does not support enumeration for queries with free variables, which is also its main contribution. Our work demonstrates that for a subset of hierarchical queries, the tradeoff shown in [17] is not optimal. Our work introduces fundamentally new ideas that may be useful in improving the tradeoff for arbitrary hierarchical queries and enumeration of UCQs.
There has also been some experimental work by the database community on problemsrelated to enumerating join-project query results efficiently but without any formal delayguarantees. Seminal work [30, 29, 31, 11] has studied how compressed representations canbe created apriori that allow for faster enumeration of query results. For the two path query,the fastest evaluation algorithm (with no delay guarantees) evaluates the projection joinoutput in time O ( | D | · | OUT π | ( ω − ω +1) + | D | ω − ω +1) · | OUT π | ω +1) ) [9, 2]. For star queries, thereis no closed form expression but fast matrix multiplication can be used to obtain instancedependent bounds on running time. Also related is the problem of dynamic evaluation ofhierarchical queries. Recent work [16, 17, 4, 5] has studied the tradeoff between amortizedupdate time and delay guarantees. Some of our techniques may also lead to new insights andimprovements in existing algorithms. Prior work in differential privacy [25] and DGM [7]may also benefit from some of our techniques. In this paper, we studied the problem of enumerating query results for an important subset ofCQs with projections, namely star and path queries. We presented data-dependent algorithmsthat improve upon existing results by achieving non-trivial delay guarantees in linear prepro-cessing time. Our results are based on the idea of interleaving join query computation toachieve meaningful delay guarantees. Further, we showed how non-combinatorial algorithms(fast matrix multiplication) can be used for faster preprocessing to improve the tradeoffbetween preprocessing time and delay. We also presented new results on time-delay tradeoffsfor a subset of non-hierarchical queries for the class of path queries. Our results also openseveral new tantalizing questions that open up possible directions for future work.

More preprocessing time for star queries.

The second major open question isto show whether Theorem 7 can benefit from more preprocessing time to achieve lowerdelay guarantees. For instance, if we can afford the algorithm preprocessing time T p = O ( | OUT ⋊⋉ | / | D | ϵ + | D | ) time, can we expect to get delay δ = O ( | D | ϵ ) for all ϵ ∈ (0 , Sublinear delay guarantees for two-path query.

It is not known whether we can

I C D T 2 0 2 1 :16 Enumeration Algorithms for Conjunctive Queries with Projection achieve sublinear delay guarantee in linear preprocessing time for Q two-path query. Thisquestion is equivalent to the following problem: for what values of | OUT π | can Q path beevaluated in linear time. If | OUT π | = | D | ϵ , then the best known algorithms can evaluate Q two-path in time O ( | D | ϵ/ ) (using fast matrix multiplication) [9] but this is still superlinear. Space-delay bounds.

The last question is to study the tradeoff between space and delay for arbitrary hierarchical queries and path queries. Using some of our techniques, it may be possible to smartly materialize a certain subset of joins that could be used to achieve delay guarantees by interleaving with join computation. We also believe that the space-delay tradeoff implied by prior work can be improved for certain ranges of delay by using the ideas introduced in this paper.

References

[1] M. Abo Khamis, H. Q. Ngo, and D. Suciu. What do Shannon-type inequalities, submodular width, and disjunctive datalog have to do with one another? In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 429–444, 2017.
[2] R. R. Amossen and R. Pagh. Faster join-projects and sparse matrix multiplications. In Proceedings of the 12th International Conference on Database Theory, pages 121–126. ACM, 2009.
[3] G. Bagan, A. Durand, and E. Grandjean. On acyclic conjunctive queries and constant delay enumeration. In International Workshop on Computer Science Logic, pages 208–222. Springer, 2007.
[4] C. Berkholz, J. Keppeler, and N. Schweikardt. Answering conjunctive queries under updates. In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 303–318. ACM, 2017.
[5] C. Berkholz, J. Keppeler, and N. Schweikardt. Answering FO+MOD queries under updates on bounded degree databases. ACM Transactions on Database Systems (TODS), 43(2):7, 2018.
[6] J. Brault-Baron. De la pertinence de l'énumération: complexité en logiques propositionnelle et du premier ordre. PhD thesis, Université de Caen, 2013.
[7] A. R. Chowdhury, T. Rekatsinas, and S. Jha. Data-dependent differentially private parameter learning for directed graphical models. In International Conference on Machine Learning, pages 1939–1951. PMLR, 2020.
[8] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Chapter 8.2, Introduction to Algorithms. MIT Press, 2009.
[9] S. Deep, X. Hu, and P. Koutris. Fast join project query evaluation using matrix multiplication. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pages 1213–1223, 2020.
[10] S. Deep and P. Koutris. Compressed representations of conjunctive query results. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 307–322. ACM, 2018.
[11] S. Deep and P. Koutris. Ranked enumeration of conjunctive query results. To appear in Joint 2021 EDBT/ICDT Conferences, ICDT '21 Proceedings, 2021.
[12] F. L. Gall and F. Urrutia. Improved rectangular matrix multiplication using powers of the Coppersmith-Winograd tensor. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1029–1046. SIAM, 2018.
[13] G. Gottlob, G. Greco, and F. Scarcello. Treewidth and hypertree width. Tractability: Practical Approaches to Hard Problems, 1, 2014.
[14] G. Greco and F. Scarcello. Structural tractability of enumerating CSP solutions. Constraints, 18(1):38–74, 2013.
[15] J. E. Hopcroft, J. D. Ullman, and A. Aho. The design and analysis of computer algorithms, 1975.
[16] A. Kara, H. Q. Ngo, M. Nikolic, D. Olteanu, and H. Zhang. Counting triangles under updates in worst-case optimal time. In , 2019.
[17] A. Kara, M. Nikolic, D. Olteanu, and H. Zhang. Trade-offs in static and dynamic evaluation of hierarchical queries. In Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 375–392, 2020.
[18] W. Kazana. Query evaluation with constant delay. PhD thesis, 2013.
[19] H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-case optimal join algorithms. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 37–48. ACM, 2012.
[20] H. Q. Ngo, C. Ré, and A. Rudra. Skew strikes back: new developments in the theory of join algorithms. SIGMOD Record, 42(4):5–16, 2013.
[21] D. Olteanu and M. Schleich. Factorized databases. ACM SIGMOD Record, 45(2):5–16, 2016.
[22] D. Olteanu and J. Závodný. Size bounds for factorised representations of query results. ACM Transactions on Database Systems (TODS), 40(1):1–44, 2015.
[23] M. H. Overmars and J. Van Leeuwen. Dynamization of decomposable searching problems yielding good worst-case bounds. In Theoretical Computer Science, pages 224–233. Springer, 1981.
[24] A. Pagh and R. Pagh. Scalable computation of acyclic joins. In Proceedings of the Twenty-Fifth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 225–232, 2006.
[25] A. Roy Chowdhury, C. Wang, X. He, A. Machanavajjhala, and S. Jha. Cryptε: Crypto-assisted differential privacy on untrusted servers. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pages 603–619, 2020.
[26] L. Segoufin. Constant delay enumeration for conjunctive queries. SIGMOD Record, 44(1):10–17, 2015.
[27] L. Segoufin. Constant delay enumeration for conjunctive queries. ACM SIGMOD Record, 44(1):10–17, 2015.
[28] D. Suciu, D. Olteanu, C. Ré, and C. Koch. Probabilistic databases, Synthesis Lectures on Data Management. Morgan & Claypool, 2011.
[29] K. Xirogiannopoulos and A. Deshpande. Extracting and analyzing hidden graphs from relational databases. In Proceedings of the 2017 ACM International Conference on Management of Data, pages 897–912. ACM, 2017.
[30] K. Xirogiannopoulos, U. Khurana, and A. Deshpande. GraphGen: Exploring interesting graphs in relational data. Proceedings of the VLDB Endowment, 8(12):2032–2035, 2015.
[31] K. Xirogiannopoulos, V. Srinivas, and A. Deshpande. GraphGen: Adaptive graph processing using relational databases. In Proceedings of the Fifth International Workshop on Graph Data-management Experiences & Systems, GRADES'17, pages 9:1–9:7, New York, NY, USA, 2017. ACM.
[32] M. Yannakakis. Algorithms for acyclic database schemes. In VLDB, volume 81, pages 82–94, 1981.

A Algorithm for Lemma 5

Algorithm 2 describes the detailed algorithm for Lemma 5.

Algorithm 2 Deduplicate($J$, $A$)

    Input:  Materialized output list $J$, algorithm $A$ with known completion time $T$
    Output: Deduplicated result of $A$

    $\delta \leftarrow O(T/J)$, ptr $\leftarrow 0$, dedup $\leftarrow 0$
    $H \leftarrow \emptyset$                              /* empty hash-set */
    while ptr $< |J|$ do
        output $J[\text{ptr}]$                            /* output result from J to maintain delay guarantee */
        ptr $\leftarrow$ ptr $+ 1$, counter $\leftarrow 0$
        while counter $\le \delta$ do
            if $A$ has not completed then
                execute $A$ for $c$ time                  /* c is a constant */
                foreach $t \in \mathbf{t}$ do             /* let $\mathbf{t}$ be the output tuples generated (if any) */
                    if $t \notin J$ and $t \notin H$ then
                        output $t$
                        insert $t$ in $H$
            counter $\leftarrow$ counter $+ c$;

▶ Lemma 5.

Let $\delta$ be a parameter to be fixed later. We first store the $J$ output results in a hash set and create an empty hash set $H$ that will be used for deduplication. Using a similar interleaving strategy as above, we emit one result from $J$ and allow algorithm $A$ to run for $\delta$ time. Whenever $A$ wants to emit an output tuple $t$, it probes the hash sets $H$ and $J$, and emits $t$ only if $t$ does not appear in $H$ or $J$, followed by inserting $t$ in $H$. Inserting $t$ in $H$ ensures that $A$ does not output duplicates. (If $A$ guarantees that it will generate results with no duplicates, then there is no need to use $H$.) Each probe takes $O(1)$ time, so the total running time of $A$ is $O(T)$. Our goal is to choose $\delta$ such that $A$ terminates before the materialized output $J$ runs out. This condition is satisfied when $\delta \cdot J \ge O(T)$, which gives us $\delta = O(T/J)$. It can be checked that no duplicated result is emitted and that $O(\delta)$ delay is guaranteed between every pair of consecutive results. Again, observe that we need the algorithm $A$ to be pausable, which means that we should be able to resume the execution from where we left off. This can be achieved by storing the contents of all registers in memory and loading them when required to resume execution. ◀

B Other Missing Proofs

▶ Lemma 4. Consider two algorithms $A$ and $A'$ such that

1. $A$ enumerates query results in total time at most $T$ with no delay guarantees.
2. $A'$ enumerates query results with delay $\delta$ and runs in total time at least $T'$.
3. The outputs of $A$ and $A'$ are disjoint.
4. $T$ and $T'$ are provided as input to the algorithm.

Then, the union of the outputs of $A$ and $A'$ can be enumerated with delay $c \cdot \delta \cdot \max\{1, T/T'\}$ for some constant $c$.

Proof.

Let $\eta$ and $\gamma$ denote two positive values to be fixed later. Note that in every $\delta$ time, we can emit one output result from $A'$. But since we also want to compute the output from $A$, which takes overall time $T$, we need to slow down the enumeration of $A'$ sufficiently so that we do not run out of output from $A'$. This can be done by interleaving the two algorithms in the following way: we run $A'$ for $\gamma$ operations, pause $A'$, then run $A$ for $\eta$ operations, pause $A$ and resume $A'$ for $\gamma$ operations, and so on. The pause and resume take constant time (say $c_{\text{pause}}$ and $c_{\text{resume}}$) in the RAM model, where the state of registers and the program counter can be stored and retrieved, enabling pause and resume of any algorithm. Our goal is to find a value of $\eta$ such that $A'$ does not terminate until $A$ has finished. This condition is satisfied when the number of iterations of $A'$ is larger than the number of iterations of $A$. This gives us the condition that

$$T'/\gamma \le (\text{Time taken by } A')/\gamma = (\text{Time taken by } A)/\eta \le T/\eta$$

Thus, any value of $\eta \le T \cdot \gamma / T'$ is acceptable. We fix $\eta$ to be any positive constant and then set $\gamma$ to be the smallest positive value that satisfies the condition. The delay is bounded by the product of the worst-case number of iterations between two answers of $A'$ and the work done in each iteration, which is $(\delta/\gamma) \cdot (\gamma + \eta + c_{\text{pause}} + c_{\text{resume}}) \le \delta \cdot (1 + T/T' + (c_{\text{pause}} + c_{\text{resume}})/\gamma) = O(\delta \cdot \max\{1, T/T'\})$. ◀

Algorithm 3

Merge($A_1, A_2, \dots, A_m$)

    1   $S \leftarrow \{1, 2, \dots, m\}$;
    2   foreach $i \in S$ do
    3       $e_i \leftarrow A_i.\text{first}()$;
    4   while $S \ne \emptyset$ do
    5       $w \leftarrow \min_{i \in S} e_i$;            /* finds the smallest output (using $\preceq$) over all algorithms */
    6       output $w$;
    7       foreach $i \in S$ do
    8           if $e_i = w$ then
    9               $e_i \leftarrow A_i.\text{next}()$;
    10              if $e_i = \text{null}$ then
    11                  $S \leftarrow S - \{i\}$          /* the algorithm completes its output */

▶ Lemma 6.

We describe Algorithm 3. For simplicity of exposition, we assume that $A_i$ outputs a null value when it finishes enumeration. Note that the results enumerated by one algorithm are in order, thus it always outputs the locally minimum result ($e_i$) over the remaining results to be enumerated. Algorithm 3 goes over the locally minimum results of all algorithms and outputs the smallest one (denoted $w$) as the globally minimum result (line 5). Once a result is enumerated, each $A_i$ needs to check whether its $e_i$ matches $w$. If yes, then $A_i$ needs to update its locally minimum result by finding the next one. Then, Algorithm 3 just repeats this loop until all algorithms finish enumeration.

Observe that one distinct result is enumerated in each iteration of the while loop. It takes $O(m)$ time to find the globally minimum result and $O(m \cdot \delta)$ to update all local minimum results (line 7-line 9). Thus, Algorithm 3 has a delay guarantee of $O(m \cdot \delta)$. ◀

▶ Example 14.

Continuing our discussion from Subsection 4.1, we now construct an instance where achieving constant delay with Theorem 3 would require close to $\Theta(|D|^2)$ computation. Let us fix $N$ to be a power of 2. We fix $|\mathrm{dom}(x)| = |\mathrm{dom}(z)| = N \log N$. Let $D_i$ be the database constructed by setting $N^{\alpha} = 2^i$ for $i \in \{1, 2, \dots, \log N\}$, where relation $R$ is the cross product of $N^{\alpha}$ $x$-values and $N^{1-\alpha}$ $y$-values, and $S$ is the cross product of $N^{\alpha}$ $z$-values and $N^{1-\alpha}$ $y$-values. We also construct a database $D^*$ which consists of a single $y$ that is connected to all $x$ and $z$ values. Let $D = D^* \cup D_1 \cup D_2 \cup \cdots \cup D_{\log N}$. It is now easy to see that $|D| = N \cdot \log N$, $|\mathrm{dom}(y)| = \sum_{\alpha} N^{1-\alpha} \le N = \Theta(|D| / \log |D|)$ and $|\mathrm{OUT}_{\bowtie}| = \sum_{\alpha} N^{1+\alpha} + N^2 \log^2 N = \Theta(|D|^2)$. On this instance, Theorem 3 achieves $\Theta(|D| / \log |D|)$ delay after linear-time preprocessing. Suppose we wish to achieve constant delay enumeration. Let us fix this constant to be $c^*$ (which is also a power of 2, for simplicity). Then, we need enough preprocessing time to materialize the join result of all database instances $D_i$ where $i \in \{1, 2, \dots, \log(N/c^*)\}$ to ensure that the number of heavy $y$ values that remain is at most $c^*$. This requires time $T_p > \sum_{i \in \{1, 2, \dots, \log(N/c^*)\}} N \cdot 2^i > N^2/c^* = \Theta(|D|^2 / \log^2 |D|)$. This example shows that Theorem 3 requires near-quadratic computation to achieve constant delay enumeration.

▶ Lemma 8.

For the query $Q_{\text{two-path}}$ and an instance $D$, we can enumerate $Q_{\text{two-path}}(D)$ with delay $\delta = O(|D|^2 / |\mathrm{OUT}_{\bowtie}|)$ and $S_e = O(|D|)$.

Proof.

To prove this result, we will apply Lemma 4, where $A'$ is the first loop (the one with light-degree values), and $A$ is the second loop (the one with high-degree values).

Let $\delta$ denote the degree of the valuation $v_{i^*}$. First, we claim that the delay of $A'$ will be $O(\delta)$. Indeed, ListMerge will output a result every $O(\delta)$ time since the degree of each valuation in the first loop is at most $\delta$. Let $J_h = \sum_{i > i^*} |R(v_i, y) \bowtie S(y, z)|$ and $J_\ell = \sum_{i \le i^*} |R(v_i, y) \bowtie S(y, z)|$. Then, $A'$ runs in time at least $J_\ell$, and $A$ in time at most $c^\star \cdot J_h$. Here, $c^\star$ is an upper bound on the number of operations in each iteration of the loop in Algorithm 1. Since by construction $J_\ell \ge J_h$, Lemma 4 obtains a total delay of $O(\delta)$.

It now remains to bound $\delta$. First, observe that, since $i^*$ is the smallest index that satisfies Equation 1, it must be that $J_\ell - J_h \le |D|$ (if not, shifting the smallest index by one decreases the LHS by at most $|D|$ and increases the RHS by at most $|D|$ while still satisfying the condition that $J_\ell \ge J_h$). Combined with the observation that $J_\ell + J_h = |\mathrm{OUT}_{\bowtie}|$, we get that $J_h \ge |\mathrm{OUT}_{\bowtie}|/2 - |D|/2 \ge \frac{1}{4} \cdot |\mathrm{OUT}_{\bowtie}|$, assuming $|\mathrm{OUT}_{\bowtie}| \ge 2 \cdot |D|$. The final observation is that $J_h \le |D|^2/\delta$, since there are at most $|D|/\delta$ heavy values, and each heavy value can join with at most $|D|$ tuples for the full join. Combining the two inequalities gives $\delta \le |D|^2 / J_h \le 4 \cdot |D|^2 / |\mathrm{OUT}_{\bowtie}|$, which is the claimed delay guarantee. ◀

▶ Lemma 9.

Consider some tuple $(v_i, u) \in R_i$. Each $u$ is associated with a list of valuations over attributes $(x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_k)$, which is a cartesian product of the $k - 1$ lists $\sigma_{y=u} R_j(x_j, y)$. Note that such a list is not materialized as it is for the two-path query, but is present in a factorized form.

We next define the enumeration algorithm $A_u$ for each $u \in \pi_y \sigma_{x_i = v} R_i(x_i, y)$, with lexicographical ordering of attributes $(x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_k)$. Note that the elements in each list $\pi_{x_j} \sigma_{y=u} R^{?}_j(x_j, y)$ can be enumerated with $O(1)$ delay. Then, $A_u$ enumerates all results in $\times_{j \ne i : j \in \{1, \cdots, k\}} \sigma_{y=u} R^{?}_j(x_j, y)$ by $k - 1$ nested loops over $(x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_k)$, which has $O(k - 1) = O(1)$ delay. After applying Algorithm 3, we can obtain an enumeration algorithm that enumerates the union of query results over all neighbors with $O(d)$ delay guarantee. For each output tuple $t$ generated by Algorithm 3, we return $f^{-1}(t)$ to the user. ◀

[Figure 6: Example for two path query enumeration. Panels: (a) Table R over attributes X, Y; (b) Table S over attributes Y, Z; (c), (d), (e) pointer states of the sorted lists, each panel labeled with the output tuple.]

▶ Example 15.

Consider relations R and S as shown in Figure 6a and Figure 6b. Figure 6c shows the sorted valuations a and a by their degree and the valuations for Z as sorted lists S [ b ] , S [ b ] and S [ b ]. For both a and a , the pointers point to the head of the lists. We will now show how ListMerge ( S [ b ] , S [ b ] , S [ b ]) is executed for a . Since there are three sorted lists that need to be merged, the algorithm finds the smallest valuation across the three lists. c is the smallest valuation and the algorithm outputs ( a , c ). Then, we need to increment the pointers of all lists which are pointing to c ( S [ b ] is the only list containing c ). Figure 6d shows the state of the pointers after this step. The pointer for S [ b ] points to c and all other pointers are still pointing to the head of the lists. Next, we continue the list merging by again finding the smallest valuation from each list. Both S [ b ] and S [ b ] pointers are pointing to c and the algorithm outputs ( a , c ). The pointers for both S [ b ] and S [ b ] are incremented and the enumeration for both lists is complete, as shown in Figure 6e. In the last step, only the S [ b ] list remains and we output ( a , c ) and increment the pointer for S [ b ]. All pointers are now past the end of the lists and the enumeration is complete.

▶ Example 16.

Consider a relation $R(x, y)$ of size $O(N)$ that contains values $v_1, \ldots, v_N$ for attribute $x$. Suppose that each of $v_1, \ldots, v_{N-1}$ has degree exactly 1, and each one is connected to a unique value of $y$. Also, $v_N$ has degree $N - 1$ and is connected to $N - 1$ values of $y$. Suppose we want to compute $Q_{\text{two-path}}$. It is easy to see that $|\mathrm{OUT}_{\bowtie}| = \Theta(N)$. Thus, applying the bound of $\delta = O(N^2 / |\mathrm{OUT}_{\bowtie}|)$ gives us $O(N)$ delay. However, Algorithm 1 will achieve a delay guarantee of $O(1)$. This is because all of $v_1, \ldots, v_{N-1}$ are processed by the left pointer in $O(1)$ delay as they produce exactly one output result, while the right pointer processes $v_N$ on-the-fly in $O(N)$ time.

▶ Theorem 11.

We sketch the proof for $k = 2$. Let $\delta$ be the degree threshold for deciding whether a valuation is heavy or light. We can partition the original query into the following subqueries: $\pi_{x,z}(R(x^{?}, y^{?}) \bowtie R(y^{?}, z^{?}))$ where $?$ can be either $h$, $\ell$ or $\star$. The input tuples can also be partitioned into four different cases (which can be done in linear time since $\delta$ is fixed). We handle each subquery separately.

$x$ has $? = \ell$, $y$ has $? = \star$ and $z$ has $? = \star$. In this case, we can just invoke ListMerge($v_i$) for each valuation $v_i$ of attribute $x$ and enumerate the output.

$x$ has $? = h$, $y$ has $? = \star$ and $z$ has $? = \ell$. In this case, we can invoke ListMerge($v_i$) for each valuation $v_i$ of attribute $z$ and enumerate the output. Note that there is no overlap of output between this case and the previous case.

Both $x, z$ have $? = h$. We compute the output of $\pi_{x,z} R(x^{h}, y^{?}) \bowtie S(y^{?}, z^{h})$ in the preprocessing phase and obtain $O(1)$-delay enumeration. In the following, we say that $y$ has $? = \ell$ to mean that the join considers all $y$ valuations that have degree at most $\delta$ in both $R$ and $S$.

  $y$ has $? = \ell$. We compute the full join $R(x^{h}, y^{\ell}) \bowtie S(y^{\ell}, z^{h})$ and materialize all distinct output results, which takes $O(|D| \cdot \delta)$ time.

  $y$ has $? = h$. There are at most $|D|/\delta$ valuations in all attributes. We now have a square matrix multiplication instance where all dimensions have size $O(|D|/\delta)$. Using Lemma 2, we can evaluate the join in time $O((|D|/\delta)^{\omega})$.

Overall, the preprocessing time is $T_p = O((|D|/\delta)^{\omega} + |D| \cdot \delta)$. The matrix multiplication term dominates whenever $\delta \le O(|D|^{(\omega - 1)/(\omega + 1)})$, which gives us the desired time-delay tradeoff. ◀

▶ Theorem 12.

Once again, we will use the same steps of the preprocessing phase as in Lemma 8. We index all the input relations in a hash table where the values are sorted lists after applying the domain compression trick using $f$ and $f^{-1}$. Thus, count sort now runs in $O(|D|)$ time. We also compute $|\mathrm{OUT}_{\bowtie}|$ using the Yannakakis algorithm.

The algorithm is based on the ListMerge subroutine from Lemma 6. We distinguish two cases based on the degree of valuations of variable $w_k$. If some valuation of $w_k$ (say $v$) is light (degree is at most $\delta$), then we can enumerate the join result with delay $O(\delta)$. Since there are at most $\delta$ tuples $U = \sigma_{w_k = v} R_k$, each $u \in U$ is associated with a list of valuations over attributes $(w_1, w_2, \ldots, w_{k-1})$, which is a cartesian product of the $k - 1$ lists $\pi_{w_i} \sigma_{x_1 = u[x_1], \ldots, x_i = u[x_i]} R_i$. The elements of each list can be enumerated with $O(1)$ delay in lexicographic order. Thus, we only need to merge the $\delta$ sublists, which can be accomplished in $O(\delta)$ time using Lemma 6. Let $T_L$ denote the total time required to enumerate the query result for all light $w_k$ valuations.

We now describe how to process all $w_k$ valuations that are heavy. The key observation here is that the full-join result with no projections for this case can be upper bounded by $|D|^k/\delta$ since there are at most $|D|/\delta$ heavy $w_k$ valuations. The full-join result of the heavy subquery can be computed in time $T_H \le c^\star \cdot |D|^k/\delta$ using ListMerge. Fixing $\delta = 2 \cdot |D|^k / |\mathrm{OUT}_{\bowtie}|$ gives us $T_H \le c^\star \cdot |\mathrm{OUT}_{\bowtie}|/2$. Since $|Q^*_L| + |Q^*_H| = |\mathrm{OUT}_{\bowtie}|$, our choice of $\delta$ ensures that $T_L \ge |Q^*_L| \ge |\mathrm{OUT}_{\bowtie}|/2$. We can now apply Lemma 4 as follows: (i) $A'$ is the list-merging algorithm for the light case with $T' = |\mathrm{OUT}_{\bowtie}|/2$; (ii) $A$ is the worst-case optimal join algorithm for the heavy case with $T = c^\star \cdot |\mathrm{OUT}_{\bowtie}|/2$; (iii) $T, T'$ are fixed once $|\mathrm{OUT}_{\bowtie}|$, $\delta$, $c^\star$ have been computed.

Once again, in order to know the exact values of $T, T'$, we need to analyze the exact constant that is used in the join algorithm for ListMerge. By construction, the outputs of $A$ and $A'$ are different. Note that for each output tuple $t$ generated, we return $f^{-1}(t)$ to the user, a constant time operation. ◀

▶ Theorem 13.

Let $\Delta$ be a parameter that we will fix later. In the preprocessing phase, we first perform a full reducer pass to remove dangling tuples, apply the domain transformation technique by creating $f$ and $f^{-1}$, and then create a hash map for each relation $R_i(x_i, x_{i+1})$ with key $x_i$, and all its corresponding $x_{i+1}$ values sorted for each key entry. (We also store the degree of each value.) Next, for every $i = 1, \ldots, k$, and every heavy value $a$ of $x_i$ in $R_i$ (with degree $> \Delta$), we compute the query $\pi_{x_{k+1}}(R_i(a, x_{i+1}) \bowtie \cdots \bowtie R_k(x_k, x_{k+1}))$, and store its result sorted in a hash map with key $a$. Note that each such query can be computed in time $O(|D|)$ through a sequence of semijoins and projections, and sorted in linear time using count sort. Since there are at most $|D|/\Delta$ heavy values for each $x_i$, the total running time (and space necessary) for this step is $O(|D|^2/\Delta)$.

We will present the enumeration algorithm using induction. In particular, we will show that for each $i = k, \ldots, 1$ and every value $a$ of $x_i$, the subquery $\pi_{x_{k+1}}(R_i(a, x_{i+1}) \bowtie \cdots \bowtie R_k(x_k, x_{k+1}))$ can be enumerated (using the same order) with delay $O(\Delta^{k-i})$. This implies that our target path query can be enumerated with delay $O(\Delta^{k-1})$, by simply iterating through all values of $x_1$ in $R_1$. Finally, we can obtain the desired result by choosing $\Delta = |D|^{\epsilon/(k-1)}$.

Indeed, for the base case ($i = k$) it is trivial to see that we can enumerate $\pi_{x_{k+1}}(R_k(a, x_{k+1}))$ with $O(1)$ delay using the stored hash map. For the inductive step, consider some $i$, and a value $a$ for $x_i$ in $R_i$. If the value $a$ is heavy, then we can enumerate all the $x_{k+1}$'s with constant delay by probing the hash map we computed during the preprocessing phase. If the value is light, then there are at most $\Delta$ values of $x_{i+1}$. For each such value $b$, the inductive hypothesis provides an algorithm that enumerates all $x_{k+1}$ with delay $O(\Delta^{k-i-1})$. Observe that the order across all $b$'s will be the same. Thus, we can apply Lemma 6 to obtain that we can enumerate the union of the results with delay $O(\Delta \cdot \Delta^{k-i-1}) = O(\Delta^{k-i})$. Finally, for each output tuple $t$ generated, we return $f^{-1}(t)$ to the user.
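The heavy/light recursion in the proof above can be illustrated with a minimal Python sketch. It is not the authors' implementation: all function names (`build_index`, `reachable_ends`, `preprocess`, `merge_distinct`, `enumerate_ends`, `enumerate_path_query`) are hypothetical, Python's `sorted` stands in for count sort, `heapq.merge` plus duplicate suppression stands in for the merging step of Lemma 6, and the full reducer pass and the domain maps $f$, $f^{-1}$ are omitted. The sketch therefore shows only the order and correctness of the enumeration, not the delay bound.

```python
import heapq
from collections import defaultdict


def build_index(relations):
    """relations[i] is a list of (x_i, x_{i+1}) pairs.
    Returns, per relation, a hash map from x_i to its sorted x_{i+1} values."""
    index = []
    for rel in relations:
        adj = defaultdict(set)
        for a, b in rel:
            adj[a].add(b)
        index.append({a: sorted(bs) for a, bs in adj.items()})
    return index


def reachable_ends(index, i, a):
    """Sorted x_{k+1} values reachable from value a of x_i (one pass per level)."""
    frontier = {a}
    for level in index[i:]:
        frontier = {b for v in frontier for b in level.get(v, ())}
    return sorted(frontier)


def preprocess(index, threshold):
    """Materialize the sorted result for every heavy value (degree > threshold)."""
    return [
        {a: reachable_ends(index, i, a)
         for a, bs in level.items() if len(bs) > threshold}
        for i, level in enumerate(index)
    ]


def merge_distinct(streams):
    """Merge sorted iterators and emit each value once."""
    last = object()  # sentinel that compares unequal to every real value
    for v in heapq.merge(*streams):
        if v != last:
            yield v
            last = v


def enumerate_ends(index, stored, i, a):
    """Yield, in sorted order and without duplicates, the x_{k+1} values for a."""
    if a in stored[i]:                 # heavy: read the materialized list
        yield from stored[i][a]
    elif i == len(index) - 1:          # base case i = k: read the hash map
        yield from index[i].get(a, ())
    else:                              # light: merge the sub-enumerations
        subs = [enumerate_ends(index, stored, i + 1, b)
                for b in index[i].get(a, ())]
        yield from merge_distinct(subs)


def enumerate_path_query(relations, threshold):
    """Enumerate the projection of the path join onto (x_1, x_{k+1})."""
    index = build_index(relations)
    stored = preprocess(index, threshold)
    for a in sorted(index[0]):
        for z in enumerate_ends(index, stored, 0, a):
            yield (a, z)
```

Because every sub-enumeration yields values in the same sorted order, a light value only has to merge at most `threshold` streams, which mirrors the $O(\Delta \cdot \Delta^{k-i-1})$ step of the induction; raising `threshold` trades less materialization for deeper merging.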