Bitvector-aware Query Optimization for Decision Support Queries (extended version)
Bailu Ding
Microsoft [email protected]
Surajit Chaudhuri
Microsoft [email protected]
Vivek Narasayya
Microsoft [email protected]
ABSTRACT
Bitvector filtering is an important query processing technique that can significantly reduce the cost of execution, especially for complex decision support queries with multiple joins. Despite its wide application, however, its implications for query optimization are not well understood.

In this work, we study how bitvector filters impact query optimization. We show that incorporating bitvector filters into query optimization straightforwardly can increase the plan space complexity by an exponential factor in the number of relations in the query. We analyze the plans with bitvector filters for star and snowflake queries in the plan space of right deep trees without cross products. Surprisingly, with some simplifying assumptions, we prove that the plan of the minimal cost with bitvector filters can be found from a linear number of plans in the number of relations in the query. This greatly reduces the plan space complexity for such queries from exponential to linear.

Motivated by our analysis, we propose an algorithm that accounts for the impact of bitvector filters in query optimization. Our algorithm optimizes the join order for an arbitrary decision support query by choosing from a linear number of candidate plans in the number of relations in the query. We implement our algorithm in Microsoft SQL Server as a transformation rule. Our evaluation on both industry-standard benchmarks and a customer workload shows that, compared with the original Microsoft SQL Server, our technique reduces the total CPU execution time by 22%-64% for the workloads, with up to two orders of magnitude reduction in CPU execution time for individual queries.
CCS CONCEPTS
• Information systems → Query optimization; Query planning.

KEYWORDS
database; query optimization; query processing; bitvector filter; Bloom filter; join order enumeration

* This technical report is an extended version of the ACM SIGMOD 2020 paper
Bitvector-aware Query Optimization for Decision Support Queries [14].

[Figure 1: Example of pushing down bitvector filters for a query plan joining relations A, B, C, D. (a) Join graph. (b) Plan with bitvector filters.]

Bitvector filters, including bitmap or hash filters [6, 7, 18] and Bloom filters and their variants [2, 7, 15, 24, 32], perform 'probabilistic' semi-join reductions to effectively prune out rows that will not qualify join conditions early in the query execution pipeline. Because they are easy to implement and low in overhead, bitvector filters are widely used in commercial databases [13, 17, 21, 23].

Prior work on using bitvector filters has heavily focused on optimizing their effectiveness and applicability for query processing. One line of prior work has explored different schedules of bitvector filters for various types of query plan trees to optimize their effect on query execution [10-12]. Many variants of bitvector filters have also been studied that explore the trade-off between space and accuracy [2, 7, 9, 15, 24, 32].

In query processing, bitvector filters are mostly used in hash joins [10-12]. Specifically, the commercial database DBMS-X implements the bitvector filter scheduling algorithm following [18] (Section 2). At a high level, a single bitvector filter is created with the equi-join columns at a hash join operator and is pushed down to the lowest possible level of the subplan rooted at the probe side. Figure 1 shows an example of applying bitvector filters to a query plan. Figure 1a shows the join graph of the query and Figure 1b shows its query plan, where each arrow in Figure 1b points from the operator that creates a bitvector filter to the operator where that bitvector filter is pushed down to. As shown in Figure 1b, a bitvector filter is created from the build side of each hash join operator. Since C only joins with B, the bitvector filter created from the hash join with C bypasses the intermediate hash join and is pushed down to B. Similarly, because D joins with both A and C, the bitvector filter created from the hash join with D consists of columns from both A and C; thus, the lowest possible level to push this bitvector filter down to is the hash join below it. Bitvector filters can also be adapted for merge joins.

Surprisingly, despite the wide application of and decades of research on bitvector filters for query processing, the impact of bitvector filters on query optimization is not well understood. To the best of our knowledge, most state-of-the-art DBMSs add bitvector filters to the query plans produced by the query optimizer as a post-processing step.

Neglecting the impact of bitvector filters in query optimization can miss out on opportunities for significant plan quality improvement. Figure 2 shows an example of such an opportunity with a query over the JOB [25] benchmark schema:

SELECT COUNT(*)
FROM movie_keyword mk, title t, keyword k
WHERE mk.movie_id = t.id AND mk.keyword_id = k.id
  AND t.title LIKE '%(' AND k.keyword LIKE '%ge%'
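To make the mechanism concrete before walking through Figure 2, the sketch below is an illustrative Python rendering of a bitvector filter attached to a hash join. The classes and names are hypothetical, not DBMS-X's implementation: the build side hashes its join keys into a bit array, and probe rows whose keys map to an unset bit are discarded before the hash table lookup. False positives are possible; false negatives are not.

```python
# Illustrative sketch of a bitvector filter guarding a hash join.
# Hypothetical classes/names; not DBMS-X's implementation.

class BitvectorFilter:
    """Single-hash bitvector ('probabilistic' semi-join filter)."""
    def __init__(self, nbits=1024):
        self.nbits = nbits
        self.bits = 0          # a Python int used as a bit array

    def add(self, key):
        self.bits |= 1 << (hash(key) % self.nbits)

    def may_contain(self, key):
        # May return True for a key never added (false positive),
        # but never False for a key that was added.
        return bool((self.bits >> (hash(key) % self.nbits)) & 1)

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Hash join that prunes probe rows with a bitvector filter."""
    bv, table = BitvectorFilter(), {}
    for row in build_rows:           # build side: hash table + filter
        k = row[build_key]
        bv.add(k)
        table.setdefault(k, []).append(row)
    for row in probe_rows:           # probe side
        k = row[probe_key]
        if not bv.may_contain(k):
            continue                 # pruned before the hash table lookup
        for match in table.get(k, []):
            yield {**match, **row}
```

In a real engine the filter would be pushed below other operators on the probe side, as Figure 1 illustrates, so rows are pruned as early as possible; a Bloom filter variant would use several hash functions per key to lower the false positive rate at the same space budget.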
Figure 2a shows the join graph of the query, where each edge is annotated with the join cardinality of the corresponding tables. Figure 2b shows the best query plan P1 without using bitvector filters. Each operator is annotated with the number of tuples after filter predicates are applied and the operator cost (tuples / cost).

Figure 2c shows the query plan after adding bitvector filters to P1 as a post-processing step. Although the cost of P1 is reduced after adding bitvector filters, it still costs 3× as much as the best plan P2 when the impact of bitvector filters is considered during query optimization (Figure 2d).

Because P2 is more expensive than P1 without using bitvector filters (Figure 2e), the optimizer will choose P1 as the best plan if it neglects the impact of bitvector filters during query optimization. Therefore, the optimizer will choose a much worse plan (Figure 2c) if bitvector filters are only considered as a post-processing step after query optimization.

Incorporating bitvector filters into query optimization is surprisingly challenging. Existing top-down or bottom-up dynamic programming (DP) based query optimization frameworks cannot directly integrate bitvector filters into their optimization, because the effect of bitvector filters can violate the substructure optimality property in DP. In a DP-based query optimization framework, either top-down or bottom-up, an optimal subplan is stored for each subset A of relations involved in a query. With bitvector filters, however, in addition to the relations in A, the optimal subplan also depends on what bitvector filters are pushed down to A and how these bitvector filters apply to the relations in A based on the structure of the subplan. For example, Figure 2c and Figure 2d both contain a subplan joining {mk, t}. The cost of the two subplans, however, differs by more than 3× due to the different bitvector filters pushed down to the subplan.
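The substructure-optimality violation can be seen with a toy cost calculation. The cardinalities and selectivity below are hypothetical, not the numbers from Figure 2: the same logical subplan over {mk, t} has very different costs depending on whether a filter from k has been pushed into it.

```python
# Toy C_out-style arithmetic (hypothetical cardinalities, not Figure 2's)
# showing that the best subplan for {mk, t} depends on the bitvector
# filters pushed down into it, which breaks DP substructure optimality.

def cout(left_rows, right_rows, join_rows):
    # cost = sum of input and output cardinalities
    return left_rows + right_rows + join_rows

mk, t, join_mk_t = 100_000, 50_000, 80_000

# Subplan cost with no bitvector filter pushed down:
no_filter = cout(mk, t, join_mk_t)

# Same logical subplan, but a filter from k (assumed 5% selectivity)
# has already been pushed down to mk:
sel = 0.05
with_filter = cout(mk * sel, t, join_mk_t * sel)

# The "optimal subplan for {mk, t}" is not unique: it depends on context.
assert with_filter < no_filter
```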
Incorporating bitvector filters into query optimization straightforwardly can be expensive. Similar to supporting interesting orders in query optimization [34], the number of optimal substructures can increase by an exponential factor in the number of relations to account for the impact of various combinations of bitvector filters.

Surprisingly, prior work has shown that, under limited conditions, different join orders result in similar execution cost when bitvector filters are used. LIP [38] analyzes the impact of Bloom filters for a star schema with a specific type of left deep trees, where the fact table is at the bottom. They observe that, if bitvector filters created from dimension tables are pushed down to the fact table upfront, plans with different permutations of dimension tables have similar cost.

Motivated by this observation, we study the impact of bitvector filters on query optimization. We focus on an important class of queries, i.e., complex decision support queries, and the plan space of right deep trees without cross products, which is shown to be an important plan space for such queries [12, 17]. Our first contribution is to systematically analyze the impact of bitvector filters on optimizing the join order of star and snowflake queries with primary-key-foreign-key (PKFK) joins in the plan space of right deep trees without cross products (Sections 3-5). Prior work has shown that, without bitvector filters, the number of plans for star and snowflake queries in this plan space is exponential in the number of relations in the query [31]. Intuitively, the plan space complexity should further increase with bitvector filters integrated into query optimization due to violation of substructure optimality.
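The potential blowup can be made concrete with a quick count: if a DP table tracked, for each subset of relations, every possible set of bitvector filters that outside relations could push into that subset, the number of states would grow from 2^n - 1 to 3^n - 2^n. The sketch below is only worst-case bookkeeping, not the paper's enumeration algorithm.

```python
# Worst-case bookkeeping: DP states per relation subset, with and without
# tracking which of the outside relations pushed a bitvector filter in.
from itertools import combinations

def dp_states(n, with_filters):
    states = 0
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            outside = n - k   # relations that could send a filter into subset
            states += 2 ** outside if with_filters else 1
    return states

# Closed forms: 2**n - 1 subsets vs. 3**n - 2**n (subset, filter-set)
# states; for n = 10 that is 1023 vs. 58025.
```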
Our key observation is that, when the bitvector filters have no false positives, certain join orders can be equivalent or inferior to others with respect to the cost function C_out [28, 30], regardless of the query parameters or the data distribution. By exploiting this observation, we prove that, with some simplifying assumptions, for star and snowflake queries with PKFK joins, the plan of the minimal C_out with bitvector filters can be found by choosing from a linear number of plans in the number of relations in the query in this plan space. To the best of our knowledge, this is the first work that analyzes the interaction between bitvector filters and query optimization for a broad range of decision support queries and a wide plan search space.

While star and snowflake queries are common patterns for decision support queries, in practice, the join graphs can include multiple fact tables and non-PKFK joins. Our second contribution is to propose an algorithm, motivated by our analysis, that optimizes the join order for arbitrary decision support queries (Section 6). Our technique applies to queries with arbitrary join graphs. Since creating and applying bitvector filters adds overhead, we further optimize our algorithm by selectively adding bitvector filters based on their estimated benefit (Section 6.3). Our algorithm can be
integrated into a query optimization framework as a transformation rule [19, 20]. Depending on how a DBMS handles bitvector filters in query optimization, we propose three options to integrate our technique into the DBMS (Section 6.4). We implement our algorithm in Microsoft SQL Server (Section 7.1). We evaluate our technique on the industry-standard benchmarks TPC-DS [1] and JOB [25] as well as a customer workload (Section 7). We show that, compared to the query plans produced by the original Microsoft SQL Server, our technique reduces the total CPU execution time of a workload by 22% to 64%, with up to two orders of magnitude reduction in CPU execution time for individual queries. We show that our technique is especially effective in reducing execution cost for expensive queries with low selectivity, where right deep trees are a preferable plan space [12, 17].

We discuss related work in Section 8 and conclude the work in Section 9.

[Figure 2: Example where ignoring bitvector filters in query optimization results in a suboptimal plan. (a) Join graph with cardinalities. (b) Best plan P1 without bitvector filters, total cost 10939. (c) Post-processing P1 by adding bitvector filters, total cost 2261. (d) Best plan P2 with bitvector filters, total cost 760. (e) P2 is suboptimal without bitvector filters, total cost 12831.]
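The benefit-based filter selection mentioned above (Section 6.3) can be sketched as a simple cost test. The cost model and constants below are illustrative assumptions, not Microsoft SQL Server's actual heuristic: a filter is worthwhile only if the work saved on pruned probe rows exceeds the cost of building and probing the filter.

```python
# Illustrative benefit test for selectively adding a bitvector filter.
# The cost model and constants are assumptions for this sketch, not the
# heuristic implemented in Microsoft SQL Server.

def worth_adding_filter(build_rows, probe_rows, est_selectivity,
                        build_cost_per_row=1.0,   # insert key into filter
                        probe_cost_per_row=0.1,   # check key against filter
                        saved_cost_per_row=2.0):  # work avoided per pruned row
    overhead = build_rows * build_cost_per_row + probe_rows * probe_cost_per_row
    pruned = probe_rows * (1.0 - est_selectivity)
    return pruned * saved_cost_per_row > overhead
```

A highly selective filter over a large probe side easily pays for itself; a filter that keeps nearly every probe row does not.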
In this section, we describe the details of the bitvector filter creation and push-down algorithm following [18]. At a high level, each hash join operator creates a single bitvector filter from the equi-join columns on the build side. This bitvector filter is then pushed down to the lowest possible level of the subtree rooted at the probe side so that it can eliminate tuples from that subtree as early as possible.

Algorithm 1 shows how to push down bitvector filters given a query plan. The algorithm takes a query plan as its input. Starting from the root of the query plan, the set of bitvector filters pushed down to the root is initialized to be empty (line 3), and each operator is then processed recursively in a pre-order traversal. At each operator, the algorithm takes the set of bitvector filters pushed down to this operator as an input. If the operator is a hash join, a bitvector filter is created from the build side, with the equi-join columns of this hash join as the keys of the bitvector filter, and is added to the set of bitvector filters applied to the probe side of this hash join (lines 8-10). Now consider every bitvector filter that is pushed down to this hash join operator. If one of the child operators of the join operator contains all the columns in the bitvector filter, the bitvector filter is added to the set of bitvector filters pushed down to this child operator; otherwise, the bitvector filter cannot be pushed down further, and it is added to the set of bitvector filters pushed down to this join operator (lines 12-23). If the set of bitvector filters pushed down to this join operator is non-empty, a filter operator is added on top of this join operator to apply the bitvector filters; in this case, the root of this subplan is updated to the filter operator (lines 24-29). The bitvector filters pushed down to the child operators are then processed recursively and the children are updated accordingly (lines 30-33). Finally, the updated root operator of this subplan is returned (line 34).
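The recursion of Algorithm 1 can be rendered compactly in Python. The operator classes below are simplified stand-ins (a real engine works on physical operators with column metadata), a bitvector filter is modeled as the set of columns it is keyed on, and `columns_of` plays the role of the "can be pushed down to" test:

```python
# Compact Python rendering of Algorithm 1's recursion on a simplified
# operator tree. Scan/HashJoin/FilterOp are illustrative stand-ins.

class Scan:
    def __init__(self, name, columns):
        self.name, self.columns = name, set(columns)
        self.children = []

class HashJoin:
    def __init__(self, build, probe, join_columns):
        self.children = [build, probe]    # children[0]=build, children[1]=probe
        self.join_columns = set(join_columns)

class FilterOp:
    def __init__(self, child, bitvectors):
        self.children = [child]
        self.bitvectors = bitvectors

def columns_of(op):
    if isinstance(op, Scan):
        return op.columns
    return set().union(*(columns_of(c) for c in op.children))

def push_down(op, incoming=()):
    """Push each bitvector filter to the lowest operator with all its columns."""
    residual, per_child = [], {id(c): [] for c in op.children}
    if isinstance(op, HashJoin):
        # the build side creates one filter over the equi-join columns
        per_child[id(op.children[1])].append(op.join_columns)
    for bv in incoming:                   # route incoming filters downward
        targets = [c for c in op.children if bv <= columns_of(c)]
        if len(targets) == 1:
            per_child[id(targets[0])].append(bv)
        else:
            residual.append(bv)           # cannot go lower: apply it here
    op.children = [push_down(c, per_child[id(c)]) for c in op.children]
    # wrap with a filter operator if some filters stop at this operator
    return FilterOp(op, residual) if residual else op
```

Running this on a plan shaped like Figure 1's, e.g. C joined on top of (A joined with B), routes both the filter from A and the filter from C down onto the scan of B.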
An example of creating and pushing down bitvector filters with Algorithm 1 is shown in Figure 1.
We start with the properties of bitvector filters and the cost function (Section 3). We then show that, with bitvector filters, the number of candidate plans of the minimal cost is linear for star and snowflake queries with PKFK joins in the plan space of right deep trees without cross products (Section 4 and Section 5). We finally describe the general bitvector-aware query optimization algorithm for arbitrary decision support queries and how to integrate it with a Volcano / Cascades style optimizer (Section 6). Table 1 summarizes the notations. Table 2 summarizes the results of our analysis.
We start with the properties of bitvector filters:
Property 1. Commutativity: R1/(R2, R3) = R1/(R3, R2)

Property 2. Reduction: |R1/R2| ≤ |R1|

Property 3. Redundancy: (R1 ▷◁ R2)/R2 = R1 ▷◁ R2

Property 4. Associativity: R1/(R2, R3) = (R1/R2)/R3 if there are no false positives with the bitvector filters created from (R2, R3), R2, and R3.

PlanPushDown(plan):
    Input: query plan plan
    Output: new query plan plan' with bitvectors
    root ← plan.GetRootOperator()
    plan' ← plan
    root' ← OpPushDown(root, ∅)
    plan'.SetRootOp(root')
    return plan'

OpPushDown(op, B):
    Input: operator op, set of bitvectors B
    Output: new operator op' with bitvectors
    residualSet ← ∅
    pushDownMap ← ∅
    if op is Hash Join then
        b ← bitvector created from op.GetBuildChild()
        pushDownMap[op.GetProbeChild()] ← pushDownMap[op.GetProbeChild()] ∪ {b}
    end
    foreach bitvector b in B do
        ops ← ∅
        foreach child c of operator op do
            if b can be pushed down to c then
                ops ← ops ∪ {c}
            end
        end
        if |ops| ≠ 1 then
            residualSet ← residualSet ∪ {b}
        else
            pushDownMap[c] ← pushDownMap[c] ∪ {b}    // c is the unique child in ops
        end
    end
    op' ← op
    if residualSet ≠ ∅ then
        filterOp ← CreateFilterOp(op, residualSet)
        filterOp.AddChild(op)
        op' ← filterOp
    end
    foreach child c of op do
        c' ← OpPushDown(c, pushDownMap[c])
        op.UpdateChild(c, c')
    end
    return op'

Algorithm 1:
Push down bitvectors

Now we prove the absorption rule of bitvector filters for PKFK joins. The absorption rule says that, if R1 joins R2 with a key in R2, the result of joining R1 and R2 is a subset of the result of semi-joining R1 with R2. Formally,

Table 1: List of notations
- q: a query
- R: a relation
- R (bold): a set of relations
- T = T(R1, ..., Rn): a right deep tree with R1 as the rightmost leaf and Rn as the leftmost leaf
- S(R1, ..., Rn, B1, ..., Bm): join of relations R1, ..., Rn after applying bitvector filters created from B1, B2, ..., Bm, where each Bi is either a base relation or a join result. We omit B1, ..., Bm when they are clear from the context. We use the notation interchangeably with ▷◁
- |R|: cardinality of a base relation or an intermediate join result after applying bitvector filters
- R1/R2: semi-join of R1 with R2, where R1/R2 ⊆ R1
- R1/(R2, ..., Rn): semi-join of R1 with R2, ..., Rn, where R1/(R2, ..., Rn) ⊆ R1
- R1 → R2: the join columns of R1 and R2 form a key in R2. If the join columns form a primary key in R2, then R1 → R2 is a primary-key-foreign-key join
- C_out: cost function (see Section 3.3)
- Π_{R1}(R2): project out all the columns of R1 from R2, where the columns in R2 are a superset of those in R1. The resulting relation has the same number of rows as R2 but fewer columns per row

Lemma 1. Absorption rule: If R1 → R2, then R1/R2 ⊇ Π_{R1}(R1 ▷◁ R2) and |R1/R2| ≥ |R1 ▷◁ R2|. The equality holds if the bitvector filter created from R2 has no false positives.

Proof. For every tuple r in R1, it can join with a tuple in R2 if and only if the join columns of r exist in R2. Because R1 → R2, there is at most one such tuple in R2. Thus, Π_{R1}(R1 ▷◁ R2) ⊆ R1/R2, with equality when the bitvector filter has no false positives. □

Since our analysis focuses on the quality of logical join ordering, we measure the intermediate result sizes (i.e., C_out) as our cost function, similar to prior work on join order analysis [28, 30]. In practice, C_out is a good approximation for comparing the actual execution cost of plans. C_out measures the cost of a query plan by the sum of intermediate result sizes.
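A direct Python transcription of this cost function, under the assumption that the supplied cardinalities already reflect any pushed-down bitvector filters:

```python
# C_out over a plan tree: every base table and every intermediate join
# result contributes its cardinality once. Cardinalities are assumed to
# already reflect pushed-down bitvector filters.

def c_out(tree, card):
    """tree: a base-table name, or a (left, right) pair denoting a join;
    card: cardinality of each base table and each join subtree."""
    if isinstance(tree, str):        # base table
        return card[tree]
    left, right = tree
    return card[tree] + c_out(left, card) + c_out(right, card)

# Hypothetical cardinalities for the plan ((t join mk) join k):
card = {"t": 4_000, "mk": 7_000, "k": 1_000,
        ("t", "mk"): 3_000, (("t", "mk"), "k"): 500}
plan = (("t", "mk"), "k")
total = c_out(plan, card)            # 4000 + 7000 + 1000 + 3000 + 500
```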
Because bitvector filters also impact the cardinality of a base table, we adapt C_out to include the base table cardinality as well. Formally,

C_out(T) = |T|                                   if T is a base table
C_out(T) = |T| + C_out(T1) + C_out(T2)           if T = T1 ▷◁ T2        (1)

Note that |T| reflects the impact of bitvector filters: it represents the cardinality after bitvector filters are applied, for both base tables and join results.

[Table 2: Summary of the plan space complexity for star and snowflake queries with unique key joins. For both a star query with n dimension tables and a snowflake query with m branches of lengths n_i (n = Σ_i n_i dimension tables), the number of right deep trees without cross products is exponential in n, but the minimal-C_out plan can be found among n + 1 candidate plans (Sections 4 and 5).]

[Figure 3: Star query graph with PKFK joins, where the fact table is R0 and the dimension tables are R1, R2, R3.]

We define star queries with PKFK joins as the following:
Definition 1.
Star query with PKFK joins:
Let R = {R0, R1, ..., Rn} be a set of relations and q be a query joining the relations in R. The query q is a star query with PKFK joins if R0 → Rk for 1 ≤ k ≤ n. R0 is called the fact table, and Rk, 1 ≤ k ≤ n, is called a dimension table.

Figure 3 shows an example of a star query, where R0 is the fact table and R1, R2, R3 are dimension tables.

Now we analyze the plan space complexity for star queries with PKFK joins. We show that, in the plan space of right deep trees without cross products, we can find the query plan of the minimal cost (under the cost function from Section 3.3) from n + 1 candidate plans, even though the total number of plans in this space is exponential in n [31].

Our key intuition is that, in the plan space of right deep trees without cross products, the cost of plans of a star query with PKFK joins can be the same under different join orders of the dimension tables. This is because all the bitvector filters for a star query will be pushed down to the fact table; and by Lemma 1, we can show the cost of many join orders is the same. Figure 4 shows an example of two plans of a star query with PKFK joins that use different join orders of dimension tables but have the same cost.

[Figure 4: Example of two plans P1 and P2 of a star query {R0, R1, R2} with PKFK joins using bitvector filters. Each operator is annotated with its intermediate result size. P1 and P2 have different join orders of dimension tables but the same cost.]

Formally, our key results in this section are:

Theorem 4.1. Minimal cost right deep trees for star query:
Let R be the set of relations of a star query q as defined in Definition 1. Let A = {T(X0, X1, ..., Xn)} be the set of right deep trees without cross products for q, where X0, X1, ..., Xn is a permutation of R0, R1, ..., Rn. If C_min = min{C_out(T), T ∈ A}, then there exists a plan T ∈ A_candidates = {T(R0, R1, ..., Rn)} ∪ {T(Rk, R0, R1, ..., R_{k-1}, R_{k+1}, ..., Rn), 1 ≤ k ≤ n} such that C_out(T) = C_min.

Theorem 4.2.
Plan space complexity for star query:
Let R be the set of n + 1 relations of a star query as defined in Definition 1. We can find the query plan with the minimal cost in the plan space of right deep trees without cross products from n + 1 candidate plans.

We start the analysis by understanding the plan space of right deep trees without cross products for star queries:

Lemma 2.
Right deep trees for star query:
Let R be the set of relations of a star query as defined in Definition 1. Let T = T(X0, X1, X2, ..., Xn) be a query plan, where X0, ..., Xn is a permutation of {R0, R1, R2, ..., Rn}. Then T is a right deep tree without cross products if and only if X0 = R0 or X1 = R0.

The proof can be found in Appendix B. By Lemma 2, we divide the plans into two cases based on whether R0 is the rightmost leaf or not. We first generalize Lemma 1 to multiple relations:

Lemma 3. Star query absorption rule: Let R be a star query as defined in Definition 1. Then R0/(R1, R2, ..., Rn) ⊇ Π_{R0}(R0 ▷◁ R1 ▷◁ ... ▷◁ Rn) and |R0/(R1, R2, ..., Rn)| ≥ |R0 ▷◁ R1 ▷◁ ... ▷◁ Rn|. The equality holds when the bitvector filters created from (R1, R2, ..., Rn) have no false positives.

The proof can be found in Appendix B. We now show that all the plans in this plan space where the rightmost leaf is R0 have the same cost C_out if the bitvector filters have no false positives. Formally,

Lemma 4. Minimal cost right deep tree for star query with rightmost leaf R0: Let R be the set of relations of a star query as defined in Definition 1. The cost of the right deep tree C_out(T(R0, X1, X2, ..., Xn)) is the same for every permutation X1, X2, ..., Xn of R1, R2, ..., Rn.

Proof. Because R1, R2, ..., Rn only connect to R0, and R0 is the rightmost leaf, based on Algorithm 1, all the bitvector filters created from R1, R2, ..., Rn will be pushed down to R0. Thus, C_out(Xk) = |Xk| for 1 ≤ k ≤ n and C_out(R0) = |R0/(X1, X2, ..., Xn)|. By Lemma 3, C_out(R0) = |R0/(R1, R2, ..., Rn)|.

Now consider the intermediate join result S(R0, X1, ..., Xk), where 1 ≤ k ≤ n.
By Lemma 3, |S(R0, X1, ..., Xk)| = |S(R0/(R1, ..., Rn), X1, ..., Xk)| = |S(R0, R1, ..., Rn)|. Thus, C_out(S(R0, X1, ..., Xk)) = C_out(S(R0, R1, ..., Rn)) for all 1 ≤ k ≤ n.

Since the total cost of the plan is C_out(T(R0, X1, ..., Xn)) = Σ_{i=0}^{n} |Ri| + n · |S(R0, R1, ..., Rn)|, every permutation X1, ..., Xn of R1, ..., Rn has the same cost. □

Now consider the other case, where R0 is not the rightmost leaf, and X1 = R0. Let X0 = Rk, 1 ≤ k ≤ n. Similarly, we show that the cost of the plans of the form T(Rk, R0, X1, X2, ..., X_{n-1}) is the same for every permutation of R1, R2, ..., R_{k-1}, R_{k+1}, ..., Rn if the bitvector filters have no false positives. Formally,

Lemma 5. Minimal cost right deep tree for star query with rightmost leaf Rk: Let R be the set of relations of a star query as defined in Definition 1. The cost of the right deep tree C_out(T(Rk, R0, X1, X2, ..., X_{n-1})) is the same for every permutation X1, X2, ..., X_{n-1} of R1, R2, ..., R_{k-1}, R_{k+1}, ..., Rn.

The proof can be found in Appendix B. By combining Lemma 4 and Lemma 5, we can prove Theorem 4.1 and Theorem 4.2.

[Figure 5: Snowflake query with PKFK joins, where the fact table is R0 and the branches are {R_{1,1}}, {R_{2,1}, R_{2,2}}, {R_{3,1}, R_{3,2}}.]

We define snowflake queries with PKFK joins as below:
Definition 2.
Snowflake query with PKFK joins: Let R = {R0, R_{1,1}, ..., R_{1,n_1}, R_{2,1}, ..., R_{2,n_2}, ..., R_{m,1}, ..., R_{m,n_m}} be a set of relations and q be a query joining the relations in R. The query q is a snowflake query with PKFK joins if
• R0 → R_{i,1} for 1 ≤ i ≤ m, and
• R_{i,j-1} → R_{i,j} for 1 ≤ i ≤ m, 1 < j ≤ n_i.
We call R0 the fact table and R_{i,1}, R_{i,2}, ..., R_{i,n_i} a branch. We denote the branch {R_{i,1}, R_{i,2}, ..., R_{i,n_i}} as R_i.

Figure 5 shows an example of a snowflake query, where R0 is the fact table, and {R_{1,1}}, {R_{2,1}, R_{2,2}}, {R_{3,1}, R_{3,2}} are three branches of dimension tables.

Now we analyze the plan space complexity for the snowflake query (Definition 2). We will show that, in the plan space of right deep trees without cross products, we can find the query plan of the minimal cost (under the cost function from Section 3.3) from n + 1 candidate plans, where n is the number of dimension tables. We divide the plans into two cases: whether R0 is the rightmost leaf or not. We start with the case where R0 is the rightmost leaf. Then we analyze a subproblem of the plan space for a branch in a snowflake query. We finally analyze the case where R0 is not the rightmost leaf.

Formally, our key results in this section are:

Theorem 5.1. Minimal cost right deep trees for snowflake query:
Let R be the set of relations of a snowflake query q as described in Definition 2. Let C_min = min{C_out(T(X0, X1, ..., Xn))}, where X0, X1, ..., Xn is a permutation of R, and T(X0, X1, ..., Xn) is a right deep tree without cross products for q. Then there exists a right deep tree T' ∈ {T(R_{i,a_1}, R_{i,a_2}, ..., R_{i,a_{n_i}}, R0, R_{1,1}, ..., R_{1,n_1}, ..., R_{i-1,1}, ..., R_{i-1,n_{i-1}}, R_{i+1,1}, ..., R_{i+1,n_{i+1}}, ..., R_{m,1}, ..., R_{m,n_m})} ∪ {T(R0, R_{1,1}, R_{1,2}, ..., R_{m,1}, ..., R_{m,n_m})}, where a_1, a_2, ..., a_{n_i} is a permutation of 1, 2, ..., n_i, such that C_out(T') = C_min.

Theorem 5.2. Plan space complexity for snowflake query:
Let R be the set of n + 1 relations of a snowflake query q as described in Definition 2. We can find the query plan with the minimal cost in the plan space of right deep trees without cross products from n + 1 candidate plans.

R0 is the rightmost leaf

Let us first look at the right deep trees where R0 is the rightmost leaf. Our key insight is to extend our analysis of star queries and show that all the trees in this plan space have the same C_out. We define a class of right deep trees where a relation with a PKFK join condition only appears on the right side of the relations it joins with in a snowflake query. Formally,

Definition 3.
Partially-ordered right deep tree:
Let R be the set of relations of a snowflake query q as described in Definition 2. Let T = T(R0, X1, ..., Xn) be a plan for q, where X1, ..., Xn is a permutation of R − {R0}. If for every Xi, 1 ≤ i ≤ n, either Xi = R_{p,1} for some branch p, or there exists Xj, 1 ≤ j < i, such that Xj → Xi, we call T a partially-ordered right deep tree.

Now we show that the plans in the space of right deep trees without cross products are partially-ordered trees if R0 is the rightmost leaf. Formally,

Lemma 6. Right deep tree without cross products for snowflake query:
Let R be the set of relations of a snowflake query q as described in Definition 2. If T = T(R0, X1, X2, ..., Xn) is a right deep tree without cross products for q, then T is a partially-ordered right deep tree.

Proof. If T is not partially ordered, then there exists Xi such that Xi ∉ {R_{1,1}, R_{2,1}, ..., R_{m,1}} and there does not exist Xj, 1 ≤ j < i, such that Xj → Xi. Then Xi does not join with any of R0, X1, ..., X_{i-1}, so there exists a cross product. □

Now we show all the partially-ordered right deep trees have the same cost if R0 is the rightmost leaf. Following Lemma 6 and Algorithm 1, we have

Lemma 7. Bitvector filters in partially-ordered right deep tree:
Let R be the set of relations of a snowflake query q as described in Definition 2. If T = T(R0, X1, X2, ..., Xn) is a right deep tree without cross products for q, then the bitvector filter created from R_{i,j} will be pushed down to R_{i,j-1} if j > 1, or to R0 if j = 1.

The proof can be found in Appendix B. Following Lemma 7, we have

Lemma 8.
Equal cost for partially-ordered right deep tree:
Let R be the set of relations of a snowflake query q as described in Definition 2. Let T = T(R0, X1, X2, ..., Xn) and T' = T(R0, Y1, Y2, ..., Yn) be two partially ordered right deep trees of q. Then C_out(T) = C_out(T').

The proof can be found in Appendix B.

[Figure 6: Example of two plans P1 and P2 for a branch {R1, R2, R3, R4} of a snowflake query with PKFK joins using bitvector filters. Each operator is annotated with its intermediate result size. Plan P1 does not join two adjacent branch relations consecutively in its right subtree; pushing the outer relation down so the two are joined consecutively results in plan P2 with reduced cost.]

Before diving into the case where R0 is not the rightmost leaf, we first analyze a subproblem of a branch in a snowflake query in the plan space of right deep trees without cross products. We show that the plan space complexity is linear in the number of relations in the branch. Formally, we define a branch as the following:
Branch of a snowflake query:
Let R = {R0, R1, ..., Rn} be a set of relations and q be a query joining the relations in R. The query q is a branch if R_{k-1} → Rk for all 1 ≤ k ≤ n.

Figure 5 shows an example of a snowflake query with three branches.

We show that, in the plan space of right deep trees without cross products, we can find the query plan with minimal C_out from n + 1 candidate plans. Our key insight is that, for a plan with rightmost leaf Rk, if the plan has minimal cost, it must join Rk, R_{k+1}, ..., Rn consecutively in its right subtree. Otherwise, we can reduce the plan cost by altering the join order and 'pushing down' the relations Rn, R_{n-1}, ... into the right subtree. Figure 6 shows an example of how the plan cost can be reduced by 'pushing down' the relations.

Formally, our key results in this subsection are:

Theorem 5.3. Minimal cost right deep trees for a branch:
Let R be the set of relations of a branch as described in Definition 4. Let A = {T(X0, X1, ..., Xn)} be the set of right deep trees without cross products for q, where X0, X1, ..., Xn is a permutation of R0, R1, ..., Rn. If C_min = min{C_out(T(X0, X1, ..., Xn))}, then there exists a plan T ∈ A_candidates = {T(Rn, R_{n-1}, ..., R0)} ∪ {T(Rk, R_{k+1}, ..., Rn, R_{k-1}, R_{k-2}, ..., R0), 0 ≤ k ≤ n − 1} such that C_out(T) = C_min.

Theorem 5.4.
Plan space complexity for a branch:
Let R be the set of n + 1 relations of a branch as described in Definition 4. We can find the query plan with the minimal cost in the plan space of right deep trees without cross products from n + 1 candidate plans.

Consider the query plan for a branch {R0, R1, ..., Rn} of the snowflake in the plan space of right deep trees without cross products. Let us first look at the query plans where Rn is the rightmost leaf. Formally,

Lemma 9. Let R be the set of relations of a branch as described in Definition 4. There exists only one right deep tree without cross products such that Rn is the rightmost leaf, that is, T(Rn, R_{n-1}, ..., R0).

The proof can be found in Appendix B. Now we look at the query plans where Rn is not the rightmost leaf. Let T(X0, ..., Xn) be a right deep tree without cross products, where X0, ..., Xn is a permutation of R0, ..., Rn. We show that, without joining Rn, R_{n-1}, ..., Rk consecutively, a plan cannot have the minimal cost. Formally,

Lemma 10. Cost reduction by pushing down Rn: Let R be the set of relations of a branch as described in Definition 4. Let T = T(X0, X1, ..., Xn) be a right deep tree without cross products for R0, R1, ..., Rn. Assume Xk = Rn for some 1 ≤ k ≤ n. If X_{k-1} ≠ R_{n-1}, then T' = T(X0, X1, ..., X_{k-2}, Xk, X_{k-1}, X_{k+1}, X_{k+2}, ..., Xn) is a right deep tree without cross products and C_out(T') ≤ C_out(T).

The proof can be found in Appendix B.

Lemma 11.
Cost reduction by pushing down R_n, R_{n-1}, ..., R_{n-m}: Let R be the set of relations of a branch as described in Definition 4. Let T = T(X_1, X_2, ..., X_n) be a right deep tree without cross products for R_1, R_2, ..., R_n. Let X_k = R_n, X_{k-1} = R_{n-1}, ..., X_{k-m} = R_{n-m} for some 0 ≤ m < k ≤ n. If X_{k-m-1} ≠ R_{n-m-1}, then T' = T(X_1, X_2, ..., X_{k-m-2}, X_{k-m}, X_{k-m+1}, ..., X_k, X_{k-m-1}, X_{k+1}, ..., X_n) is a right deep tree without cross products and C_out(T') ≤ C_out(T).

The proof can be found in Appendix B. By combining Lemma 9 and Lemma 11, we can prove Theorem 5.3 and Theorem 5.4.

R_0 is not the right most leaf

Now let us look at the right deep trees where R_0 is not the right most leaf for a snowflake query with PKFK joins. We first show that the relations that appear on the left side of R_0 can only come from a single branch, given the join graph of a snowflake query. Formally,

Lemma 12. Single branch in right most leaves:
Let R be the set of relations of a snowflake query q as described in Definition 2. Let T = T(X_1, X_2, ..., X_n) be a right deep tree without cross products for q, where X_1, X_2, ..., X_n is a permutation of R. If X_k = R_0, then X_1, X_2, ..., X_{k-1} is a permutation of R_{i,1}, R_{i,2}, ..., R_{i,k-1} for some 1 ≤ i ≤ m.

The proof can be found in Appendix B. Now we show that the relations on the left side of R_0 are partially ordered. Formally,

Lemma 13. Partially-ordered subtree:
Let R be the set of relations of a snowflake query q as described in Definition 2. Let T = T(X_1, X_2, ..., X_n) be a right deep tree without cross products for q, where X_1, X_2, ..., X_n is a permutation of R. If X_k = R_0, then X_{k+1}, X_{k+2}, ..., X_n is a partially ordered right deep tree of the new relation R' = X_1 ⋈ X_2 ⋈ ... ⋈ X_k.

The proof can be found in Appendix B. Now we show that, if a subset of the relations of a single branch R_i is on the right side of R_0, there exists a query plan with lower cost where all the relations in R_i are on the right side of R_0. Formally,

Lemma 14. Cost reduction by consolidating a single branch:
Let R be the set of relations of a snowflake query q as described in Definition 2. Let T = T(X_1, X_2, ..., X_{k-1}, R_0, X_{k+1}, ..., X_n) be a right deep tree without cross products for q, where X_1, X_2, ..., X_{k-1} is a permutation of R_{i,1}, R_{i,2}, ..., R_{i,k-1} for some 1 ≤ i ≤ m and 1 ≤ k ≤ n_i. Then there exists a right deep tree without cross products T' = T(X_1, X_2, ..., X_{k-1}, R_{i,k}, R_{i,k+1}, ..., R_{i,n_i}, R_0, Y_1, Y_2, ..., Y_{n-n_i-1}) for q such that C_out(T') ≤ C_out(T).

The proof can be found in Appendix B. By combining Lemma 8 and Lemma 14, we can prove Theorem 5.1; Theorem 5.2 then follows directly from Theorem 5.4 and Theorem 5.1.
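The candidate set above grows only linearly in the number of relations of a branch. As a minimal sketch (Python; the list-of-names encoding of a right deep tree and the function name are illustrative assumptions, not the paper's implementation), the candidate orders of Theorem 5.3 can be enumerated as:

```python
def branch_candidates(chain):
    """Enumerate the linear candidate set for a branch (chain) of
    relations r_1 - r_2 - ... - r_n: start the right deep tree at some
    r_k, climb the chain up to r_n, then descend from r_{k-1} to r_1."""
    n = len(chain)
    # k = 0 .. n-1; k = n-1 yields the fully reversed chain T(r_n, ..., r_1)
    return [chain[k:] + chain[:k][::-1] for k in range(n)]

cands = branch_candidates(["r1", "r2", "r3", "r4"])
```

For a branch of n relations this yields n candidate orders, instead of the n! permutations of the unrestricted plan space.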
While star and snowflake queries with PKFK joins are important patterns in decision support queries, in practice such queries can have more complicated join graphs. For example, a decision support query can join multiple fact tables, where the joins may not be PKFK joins. In addition, there can be join conditions between the dimension tables or branches, where the bitvector filters created from the dimension tables may not be pushed down to the fact table. Finally, there can be dimension tables or branches that are larger than the fact table after predicate filters, where the fact table should instead be on the build side in the plan space of right deep trees.

In this section, we first propose an algorithm to extend bitvector-aware query optimization to an arbitrary snowflake query with a single fact table. We then generalize it to arbitrary decision support queries with multiple fact tables. Our algorithm applies to queries with arbitrary join graphs. We further optimize our algorithm with cost-based bitvector filters. We also discuss options to integrate our algorithm into a Volcano / Cascades query optimizer.
We propose an algorithm (Algorithm 2) with simple heuristics to construct the join order for an arbitrary snowflake query with a single fact table. The key insight is to leverage the candidate plans of minimal cost analyzed in Section 5. Algorithm 2 shows how to construct the join order for a decision support query with a single fact table.

We first assign priorities to the branches based on their violations of the snowflake pattern as defined in Definition 2. We then sort the branches in descending order by their priorities (line 1). Intuitively, the bitvector filters created from the dimension tables are all pushed down to the fact table except when a dimension table either joins with another dimension table or is not on the build side. Since such a dimension table does not create a bitvector filter that is pushed down to the fact table, joining it with the fact table early can eliminate the tuples that do not satisfy the join condition early in the plan. Specifically, we assign priorities to branches for snowflake queries with the following heuristics:

• Group P0: Relations that do not have a join condition or PKFK join with the fact table (line 23). This can happen when joining multiple fact tables. As a heuristic, we join these branches by descending selectivity on the fact table (line 23).

• Group P1: Branches that do not join with any other branches and have smaller cardinality than the fact table (line 24). These branches are joined with the fact table before joining the branches in group P0.

• Group P2: Branches joining with other branches (line 21). Such branches should be joined consecutively in the right deep tree to allow pushing down the bitvector filters created by these branches. As a heuristic, within a set of connected branches, we join the branches with descending selectivity on the fact table (line 31); across sets of connected branches, we prioritize the sets with larger numbers of connected branches (line 21).
• Group P3: Branches that are larger than the fact table (line 25). Since it is clearly suboptimal to put these branches on the build side, we reorder the build and probe sides for them (lines 12-13). Joining these branches early allows pushing down the bitvector filters created from the fact table. As a heuristic, we order the branches in this group by descending selectivity on the fact table (line 31).
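The grouping heuristic above can be sketched as follows (Python; the branch attribute names and the final P3-P2-P1-P0 ordering are illustrative assumptions for this sketch; the actual rule also orders sets of connected P2 branches by their size):

```python
def sort_branches(branches, fact):
    """Classify branches into the priority groups P0-P3 described in
    Section 6.1 and emit one plausible overall join order."""
    p0, p1, p2, p3 = [], [], [], []
    for b in branches:
        if not b["pkfk_with_fact"]:
            p0.append(b)          # P0: no (PKFK) join condition with the fact table
        elif b["joins_other_branch"]:
            p2.append(b)          # P2: joins another branch; kept consecutive
        elif b["card"] <= fact["card"]:
            p1.append(b)          # P1: independent and smaller than the fact table
        else:
            p3.append(b)          # P3: larger than the fact table; joined early
    # most selective branches on the fact table first within each group
    by_sel = lambda g: sorted(g, key=lambda b: b["fact_selectivity"])
    return by_sel(p3) + by_sel(p2) + by_sel(p1) + by_sel(p0)
```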
OptimizeSnowflake(G):
  Input: Join graph G
  Output: Query plan plan
  B ← SortedBranches(G.Branches)
  best ← JoinBranches(B, G.Fact, ∅)
  foreach branch b in B do
    p ← Join(OptimizeChain(b, G.Fact), G.Fact)
    p ← JoinBranches(B \ b, G.Fact, p)
    if best.Cost > p.Cost then best ← p
  end
  return best

JoinBranches(B, f, p):
  Input: A set of branches B, fact table f, a plan p
  Output: A query plan p'
  p' ← p
  foreach branch b in B do
    foreach table t in b do
      if t.Card > f.Card then p' ← Join(p', t)
      else p' ← Join(t, p')
    end
  end
  return p'

SortBranches(G):
  Input: Join graph G
  Output: Sorted branches sortedBranches
  groups ← GroupBranches(G)
  sortedG ← SortBySizeDesc(groups)
  priority ← []
  for i = 0; i < groups.Count(); i++ do
    if sortedG[i].Size > 1 then priority[i] ← sortedG[i].Size
    else if IsNonUniqueKeyJoin(sortedG[i][0], G.Fact) then priority[i] ← 0
    else if sortedG[i][0].Card < G.Fact.Card then priority[i] ← 1
    else priority[i] ← |G| + 1
  end
  sortedG ← SortByPriorityDesc(groups, priority)
  sortedBranches ← []
  foreach group in sortedG do
    branches ← SortBySelectivityDesc(group)
    foreach b in branches do sortedBranches.Add(b) end
  end
  return sortedBranches

Algorithm 2: Construct a join order for a snowflake query with a single fact table

Based on the analysis in Section 5, we construct the candidate plans by two cases. If R_0 is the right most leaf, we join all the branches with the fact table (line 2); otherwise, for each branch, we optimize the branch based on the analysis in Section 5.2, join the remaining branches to complete the plan, and update the best plan if the estimated cost of the new plan is lower (lines 3-7).

In addition to snowflakes with a single fact table, complex decision support queries can include multiple fact tables. We further extend our algorithm to arbitrary join graphs by iteratively extracting and optimizing snowflake join graphs. At a high level, our algorithm produces a join order for a join graph by alternating two stages iteratively, as shown in Algorithm 3. In the snowflake extraction stage (line 2), we extract a snowflake subgraph from the join graph by identifying a single fact table and its related dimension tables, potentially with non-PKFK joins. In the snowflake optimization stage (line 3), we use Algorithm 2 to produce a join order for the extracted subgraph. The resulting snowflake is marked as 'optimized' and considered as a new relation in the updated join graph (lines 4-5). Our algorithm alternates the two stages until the full join graph is optimized (line 1). Specifically, when extracting a snowflake (lines 8-19), a relation is considered a fact table if it does not join with any other table where the join predicate is an equi-join on its key columns. Among all the unoptimized fact tables in G, we find the one with the smallest cardinality and expand from this table recursively to include all related dimension relations (lines 4-9). If there is only one fact table in G, we simply return the original join graph (line 11).

In practice, creating and applying bitvector filters has overheads. Consider a hash join with build side R and probe side S.
Assume the bitvector filter eliminates a fraction λ of the tuples from S. The ratio λ can be estimated by the optimizer the same way as for an anti-semi-join operator, and it can include the estimated false positive rate of the bitvector filter.

Assume the cost of a hash join consists of building the hash table g_b, probing the hash table g_p, and outputting the resulting tuples g_o. Let the cost of creating a bitvector filter be f and the cost of applying it be h. The cost reduction of the hash join from using the bitvector filter is

Cost_Δ = g_p(|S|) − g_p((1 − λ)|S|) − f(|R|) − h(|S|)

Assume the cost of probing a tuple is C_p, the cost of checking a tuple against a bitvector filter is C_f, and creating a bitvector filter is relatively cheap, i.e., f(|R|) ≪ h(|S|). Then

Cost_Δ = |S|(λ·C_p − C_f) − f(|R|) ≈ |S|(λ·C_p − C_f)

Using a bitvector filter reduces the cost of a hash join if
OptimizeJoinGraph(G):
  Input: Join graph G
  Output: Query plan plan
  while |G| > 1 do
    G' ← ExtractSnowflake(G)
    p ← OptimizeSnowflake(G')
    G ← UpdateJoinGraph(G, G')
    plan ← UpdateQueryPlan(plan, p)
  end
  return plan

ExtractSnowflake(G):
  Input: Join graph G
  Output: Snowflake G'
  n ← 0
  sorted ← SortByCardinalityAsc(G)
  foreach g in sorted do
    if g is an unoptimized fact table then
      if n == 0 then G' ← ExpandSnowflake(g)
      n ← n + 1
    end
  end
  if n == 1 then G' ← G
  return G'

Algorithm 3: Construct a join order for a decision support query with an arbitrary join graph
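The alternation of the two stages in Algorithm 3 can be sketched as follows (Python; the plain-list encoding of the join graph and the toy extract/optimize callables are placeholders for this sketch, not the SQL Server implementation):

```python
def optimize_join_graph(relations, extract_snowflake, optimize_snowflake):
    """Skeleton of the two-stage iteration: repeatedly extract a snowflake
    subgraph, optimize it, and treat the optimized result as one new
    relation in the remaining join graph until one relation is left."""
    rels = list(relations)
    while len(rels) > 1:
        sub = extract_snowflake(rels)        # e.g., smallest fact table + its dimensions
        subplan = optimize_snowflake(sub)    # Algorithm 2 on the extracted snowflake
        # the optimized snowflake becomes a single relation in the join graph
        rels = [r for r in rels if r not in sub] + [subplan]
    return rels[0]

# Toy run: 'extract' peels off the first two relations, 'optimize' nests them.
plan = optimize_join_graph(
    ["fact1", "dim1", "fact2", "dim2"],
    extract_snowflake=lambda rs: rs[:2],
    optimize_snowflake=lambda sub: ("join", tuple(sub)),
)
```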
Cost_Δ > 0 ⟺ |S|(λ·C_p − C_f) > 0 ⟺ λ > C_f / C_p

Let λ_thresh = C_f / C_p. Note that λ_thresh is independent of R and S. We can run a micro-benchmark to profile C_f and C_p and compute λ_thresh. When the bitvector filter is pushed down below the root of the probe side, a more detailed analysis is needed to account for the cascading effect of tuple elimination. Empirically, choosing a threshold that is slightly smaller than C_f / C_p works well.

Our algorithm can transform a query plan by optimizing the join order with the underlying join graph. Thus, our algorithm can be used as a new transformation rule in a Volcano / Cascades query optimization framework upon detecting a snowflake join (sub)graph. There are three integration options, depending on how the underlying optimizer accounts for the impact of bitvector filters:

• Full integration:
When applying a join order transformation to a (sub)plan, the placement of bitvector filters and their selectivity can change. If the underlying Volcano / Cascades query optimization framework can correctly account for the placement and the selectivity of bitvector filters during query optimization, the new transformation rule can be transparently integrated into the query optimizer the same way as any existing transformation rule.

• Alternative-plan integration:
If the query optimizer can account for the placement and the selectivity of bitvector filters in a final plan after query optimization, the new transformation rule can be used to produce an alternative plan. The optimizer can then choose the plan with the cheaper estimated cost from the alternative plan and the plan produced by the original query optimization.

• Shallow integration:
We mark a (sub)plan after it is transformed by our new transformation rule. The underlying query optimization framework works as usual, except that additional join reordering on marked (sub)plans is disabled.
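Whichever integration option is used, the cost-based filtering rule of Section 6.3 reduces to a simple threshold test once C_p and C_f have been profiled. A sketch (assuming λ is read as the fraction of probe-side tuples the filter eliminates; function names are illustrative):

```python
def breakeven_elimination(c_probe, c_filter):
    """Break-even fraction of probe-side tuples a bitvector filter must
    eliminate before it pays off, under a linearized cost model: per-tuple
    probe cost c_probe, per-tuple filter-check cost c_filter, with the
    (cheap) build-side creation cost ignored."""
    return c_filter / c_probe

def should_use_filter(est_elimination, c_probe, c_filter, margin=0.0):
    # Create the filter only when the estimated elimination rate clears the
    # profiled threshold; 'margin' lets the rule be slightly more permissive.
    return est_elimination > breakeven_elimination(c_probe, c_filter) - margin
```

For example, if profiling puts a filter check at one tenth of the cost of a hash-table probe, the filter pays off once it eliminates more than 10% of the probe-side tuples.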
We implement Algorithm 3 in Microsoft SQL Server as a transformation rule. Microsoft SQL Server has a cost-based, Volcano / Cascades style query optimizer. Starting from an initial query plan, the optimizer detects various patterns in the plan and fires the corresponding transformation rules. Due to the importance of decision support queries, Microsoft SQL Server has implemented heuristics to detect snowflake patterns and transform the corresponding subplans.

We leverage the snowflake detection in Microsoft SQL Server and transform the corresponding subplan as described in Algorithm 3. We implement a shallow integration (Section 6.4), where join reordering is disabled on the transformed subplan. The subplan is subject to other transformations in Microsoft SQL Server. We use the original cardinality estimator and cost model in Microsoft SQL Server, and the selectivity of a bitvector filter is estimated the same way as for the existing semi-join operator. We implement the cost-based bitvector filter as described in Section 6.3, and we discuss how we profile the elimination threshold λ_thresh in Section 7.3. The final plan is chosen with the existing cost-based query optimization framework.

Workload.
We evaluate our technique on three workloads: TPC-DS [1] 100GB with columnstores, JOB [25] with columnstores, primary key indexes, and foreign key indexes, and a customer workload (CUSTOMER) with B+-tree indexes. Table 3 summarizes the statistics of our workloads. In particular, CUSTOMER has the highest average number of joins per query, and JOB has the most complex join graphs, including joins of multiple fact tables, large dimension tables, and joins between dimension tables. Our workloads also cover a range of different physical configurations, with B+ trees (CUSTOMER), columnstores (TPC-DS), or both (JOB).
Table 3: Statistics of workloads, including databasesize, the number of tables, queries, indexes (B+ treesand columnstores), and joins.
Statistics                TPC-DS     JOB        CUSTOMER
DB Size                   100GB      7GB        700GB
Tables                    25         21         475
Queries                   99         113        100
B+ trees / columnstores   0 / 20     44 / 20    680 / 0
Joins avg / max           7.9 / 48   7.7 / 16   30.3 / 80
Baseline.
We use the query plans produced by the original Microsoft SQL Server as our baseline. Bitvector filters are widely used in the query plans of Microsoft SQL Server. As shown in Appendix A, 97% of the queries in JOB, 98% in TPC-DS, and 100% in CUSTOMER have bitvector filters in their original plans. A bitvector filter can be created from a hash join operator, and it is pushed down to the lowest level on the probe side as described in Algorithm 1. The query optimizer in Microsoft SQL Server uses heuristics to selectively add bitvector filters to the query plan without fully accounting for the impact of bitvector filters during the query optimization stage. In particular, the heuristics used in its snowflake transformation rules neglect the impact of bitvector filters. We use a generous timeout for the query optimizer in Microsoft SQL Server so that it can explore a large fraction of the relevant plan search space.
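For intuition, the mechanism exercised throughout this evaluation, a bitvector built on the hash-join build side that discards probe-side tuples before they reach the join, can be sketched as follows (a toy Python model, not SQL Server's Algorithm 1; the bitvector size and hashing scheme are arbitrary choices for this sketch):

```python
def hash_join_with_bitvector(build_rows, probe_rows, key, nbits=1 << 16):
    """Hash join that builds a bitvector alongside the hash table and
    consults it to eliminate probe tuples early (no false negatives;
    hash collisions can only let extra tuples through to the probe)."""
    bits = bytearray(nbits // 8)
    table = {}
    for row in build_rows:                 # build side: hash table + bitvector
        k = key(row)
        table.setdefault(k, []).append(row)
        h = hash(k) % nbits
        bits[h // 8] |= 1 << (h % 8)
    out = []
    for row in probe_rows:                 # probe side: check the bitvector first
        h = hash(key(row)) % nbits
        if not (bits[h // 8] >> (h % 8)) & 1:
            continue                       # eliminated without touching the hash table
        for match in table.get(key(row), []):
            out.append((match, row))
    return out
```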
Overhead.
Our technique adds very low overhead to query optimization. In fact, since we disable join reordering on a snowflake subplan after it is optimized by our transformation rule, the query optimization time with our transformation rule is, on average, one third of that of the original Microsoft SQL Server. We also measure the memory consumption for query execution. We observe some increase in memory consumption with our technique, since it favors right deep trees. The overall increase in memory consumption is not significant.
Environment.
All the experiments are run on a machine with an Intel Xeon CPU E5-2660 v3 2.6GHz, 192GB memory, a 6.5TB hard disk, and Windows Server 2012 R2. To reduce runtime variance, all the queries are run in isolation at the same degree of parallelism. The query CPU time reported is an average over ten warm runs.
As discussed in Section 6.3, we can choose a tuple elimination threshold to selectively create bitvector filters. We profile the overhead of bitvector filters with a micro-benchmark by running the following query in TPC-DS:

Figure 7: Profiling bitvector filters. A bitvector filter reduces the overall cost if it eliminates more than 10% of the tuples. (Normalized CPU vs. selectivity of the bitvector filter.)
Figure 8: Total query execution CPU time for a workload, broken down by query selectivity.
Figure 9: Total number of tuples output by operators in a workload, broken down by operator type.
SELECT COUNT(*)
FROM store_sales, customer
WHERE ss_customer_sk = c_customer_sk
  AND c_customer_sk % 1000 < @P
The query plan joins customer and store_sales with a hash join. A bitvector filter is created from customer on the build side and pushed down to store_sales on the probe side, where tuples are eliminated before the join. We control the selectivity of the bitvector filter with the parameter @P.

Figure 7 shows the CPU time of executing the query with and without bitvector filtering while varying its selectivity, normalized by the same constant. We further break down the CPU time by the hash join operator, the probe side, and the build side. Since the CPU time for reading customer is very small, we omit it in Figure 7 for readability.

With selectivity 1, no tuples are eliminated by the bitvector filter. With bitvector filtering, the hash join operator is slightly more expensive due to creating the bitvector filter, and the probe side operator has higher execution CPU due to the overhead of checking the tuples from store_sales against the bitvector filter. As the selectivity increases, the bitvector filter eliminates more tuples from the probe side and the execution cost of the hash join operator drops. The plan with bitvector filtering becomes cheaper than the other plan once the bitvector filter eliminates more than 10% of the tuples. The cost reduction can be even larger for queries with multiple joins. Empirically, we find 5% to be a good threshold, and we set λ_thresh to 5% in our implementation.

In Appendix A, we further evaluate the effectiveness and applicability of bitvector filters as a query processing technique. As shown in Table 4, Microsoft SQL Server uses bitvector filters for 97%-100% of the queries, with 10%-80% workload-level execution CPU cost reduction. This confirms that bitvector filtering is a widely applicable query processing technique, and thus bitvector-aware query optimization can potentially impact a wide range of queries.
Figure 8 shows the total reduction in CPU execution time with our technique. We sum up the total CPU execution time of the plans produced by Microsoft SQL Server with our technique and divide it by that of the plans produced by the original Microsoft SQL Server. On average, the total workload execution CPU time is reduced by 37%. We observe that workloads with more complicated decision support queries benefit more from our technique, with the highest reduction of 64% in CPU execution time for JOB. Since Microsoft SQL Server has been heavily tuned for these benchmarks, this degree of reduction in CPU execution time is very significant.

We break down the CPU execution cost by query type. We divide the queries into three groups based on their selectivity: high (S), moderate (M), and low (L). We approximate the query selectivity by the execution CPU cost of the original query plans, with the cheapest 33.3% of queries in group S, the 33.3% most expensive queries in group L, and the rest in group M. Our technique is especially effective in reducing CPU execution cost for expensive queries, i.e., queries with low selectivity, with execution CPU reduced by over 4× for expensive queries in the JOB benchmark. This is because right deep trees are a preferable plan space for queries with low selectivity ([12, 17]), and our technique produces a better join order for right deep trees.

Figure 9 shows the total number of tuples output by operators in the query plans produced by the original query optimizer (Original) and the bitvector-aware query optimizer (BQO), normalized by the total number of tuples output by the original query plans in each workload. We sum up the number of tuples by operator type: leaf operators, join operators, and other operators. Figure 9 sheds some light on the amount of logical work done by operators and thus the quality of the query plans. With BQO, the number of tuples processed by both join operators and leaf operators is reduced. In particular, for the JOB benchmark, BQO reduces the normalized number of tuples output by join operators from 0.50 to 0.24, i.e., a 52% reduction. This again confirms that BQO improves query plan quality by producing a better join order.

Figure 10 shows the normalized CPU execution time for individual queries with the plans produced by our technique and those produced by the original Microsoft SQL Server. The queries are sorted by the CPU execution time of their original query plans, and only the top 60 most expensive queries are shown for readability. Note that the Y axis uses a logarithmic scale. We observe a reduction of up to two orders of magnitude in CPU execution time for individual queries. Again, Figure 10 confirms that our technique is especially effective in reducing the CPU execution time for expensive decision support queries.

Our technique can improve plan quality for two reasons. First, if a query optimizer does not fully integrate bitvector filters into query optimization, it can consider the best plan with bitvector filters as 'sub-optimal', as shown in Figure 2. Second, due to the importance of decision support queries, many commercial DBMSs have developed dedicated heuristics to identify and optimize snowflake queries [3, 17, 37]. If these heuristics do not consider the impact of bitvector filters, they can explore a different plan space which may not even contain the plans considered by our technique.

Inevitably, there are regressions compared with the original plans. We investigate these regressions and identify three major causes. First, our cost function C_out does not capture the physical properties of operators and can be inaccurate. Second, our technique favors right deep trees, which can become suboptimal when the query is highly selective. Finally, our algorithm uses heuristics to extend to complex decision support queries, which can be suboptimal in some cases.

We discuss two lines of related work: plan search and bitvector filters.
Plan search.
Many query optimization (QO) frameworks in DBMSs are based on either top-down [19, 20, 35] or bottom-up [4] dynamic programming (DP). There is a large body of prior work on join ordering and plan space complexity analysis for such QO frameworks [16, 26, 27, 29, 31]. Due to the importance of decision support queries, many commercial DBMSs have developed dedicated heuristics for optimizing complex decision support queries [3, 17, 37] based on the plan space of snowflake queries [22].

In this work, we adapt the cost function used in analyzing join order enumeration [28, 30] for our analysis. We analyze the space of right deep trees without cross products, which has been shown to be a favorable plan space for decision support queries and bitvector filters [12, 17, 38].
Bitvector filter and its variants.
Semi-joins were first introduced to reduce the communication cost of distributed queries [6]. Efficient implementations of semi-joins have been heavily studied in the past [8, 18, 36]. Several prior works have explored different schedules of bitvector filters for various types of query plan trees [10-12]. Sideways information passing and magic sets transformations generalize the concept of bitvector filters and combine them with query rewriting [5, 33].

Many variants of bitvector filters have also been studied in the past, such as Bloom filters [7], bitvector indexes [9], cuckoo filters [15], performance-optimal filters [24], and others [2, 32]. The focus of this line of research is on the trade-off between space and accuracy, the efficiency of filter operations, and extensions of the Bloom filter.

Due to the effectiveness of bitvector filters in reducing query execution cost, several commercial DBMSs have implemented bitvector filters or their variants as query processing techniques for decision support queries [13, 17, 21, 23]. In this work, our analysis is based on the classic bitvector filter algorithm described in [18]. We mainly study the interaction between bitvector filters and query optimization, which is orthogonal to the prior work on bitvector filters as query processing techniques.

Lookahead Information Passing (LIP) [38] is the closest prior work to ours. LIP studies the star schema, where Bloom filters created from dimension tables are all applied to the fact table. The focus is on the order of applying the Bloom filters, and they observe that such query plans are robust across different permutations of dimension tables. Compared with LIP, our work systematically analyzes a much broader range of decision support queries and plan search spaces. Their conclusion on plan robustness can be derived from our analysis.
In this work, we systematically analyze the impact of bitvector filters on query optimization. Based on our analysis, we propose an algorithm to optimize the join order for arbitrary decision support queries. Our evaluation shows that, instead of using bitvector filters only as a query processing technique, there is great potential to improve query plan quality by integrating bitvector filters into query optimization for commercial databases.

This work is the first step toward understanding the interaction between bitvector filters and query optimization, and it opens new opportunities for query optimization with many open challenges. Extending the analysis to additional plan spaces, query patterns, operators beyond hash joins, and more complex cost modeling is challenging. Efficient full integration of bitvector filters for commercial databases with various architectures remains an open problem. Since our analysis shows that bitvector filters result in more robust query plans,

Figure 10: Individual query CPU time, normalized: (a) JOB, (b) TPC-DS, (c) CUSTOMER.
Table 4: Query plans with and without bitvector filters
Workload    CPU ratio   Ratio of queries w/ bitvector filters   Improved queries   Regressed queries
JOB         0.20        0.97                                    0.58               0.00
TPC-DS      0.53        0.98                                    0.88               0.00
CUSTOMER    0.90        1.00                                    0.42               0.00

which is also observed in [38], understanding how bitvector filters impact robust and interleaved query optimization is also an interesting direction.
A ADDITIONAL EVALUATION
We evaluate the effectiveness of bitvector filters by executing the same query plan with and without bitvector filtering. We use the original Microsoft SQL Server to produce a query plan p with bitvector filters. Microsoft SQL Server provides an option to ignore bitvector filters during query processing. For comparison, we execute the same plan p with bitvector filters ignored.

Table 4 shows the performance of the plans with and without bitvector filters for the three benchmarks. At a workload level, using bitvector filters reduces the execution CPU cost by 10%-80% (CPU ratio). In addition, for 97%-100% of the queries (Ratio of queries w/ bitvector filters), the original query plan uses bitvector filters. At an individual query level, 42%-88% of the queries have their CPU execution cost reduced by more than 20% (Improved queries), and no query regresses in CPU execution cost by more than 20% (Regressed queries). This confirms that bitvector filtering is a widely applicable query processing technique, and thus bitvector-aware query optimization can potentially impact a wide range of queries.
B ADDITIONAL PROOFS
Lemma 2.
Right deep trees for star query:
Let R be the set of relations of a star query as defined in Definition 1. Let T = T(X_0, X_1, X_2, ..., X_n) be a query plan, where X_0, X_1, ..., X_n is a permutation of {R_0, R_1, R_2, ..., R_n}. Then T is a right deep tree without cross products if and only if X_0 = R_0 or X_1 = R_0.

Proof. Assume X_0 = R_i and X_1 = R_j with i ≠ 0 and j ≠ 0. Then R_i and R_j do not have a join condition based on Definition 1. Thus, T(X_0, X_1) has a cross product, which is a contradiction. If X_0 = R_0 or X_1 = R_0, since R_0 joins with R_1, ..., R_n, then T = T(X_0, X_1, ..., X_n) does not contain any cross product. □

Lemma 3.
Star query absorption rule: Let R be a star query as defined in Definition 1. Then R_0/(R_1, R_2, ..., R_n) ⊇ Π_{R_0}(R_0 ⋈ R_1 ⋈ ... ⋈ R_n) and |R_0/(R_1, R_2, ..., R_n)| ≥ |R_0 ⋈ R_1 ⋈ ... ⋈ R_n|. Equality holds when the bitvector filters created from (R_1, R_2, ..., R_n) have no false positives.

Proof. By Property 4, R_0/(R_1, R_2) = (R_0/R_1)/R_2. Since R_0 → R_1, by Lemma 1, R_0/R_1 = Π_{R_0}(R_0 ⋈ R_1). Since R_0 → R_2, R_0 ⋈ R_1 → R_2. By applying Lemma 1 again, we have (R_0 ⋈ R_1)/R_2 = Π_{R_0}(R_0 ⋈ R_1 ⋈ R_2). Thus, R_0/(R_1, R_2) = Π_{R_0}(R_0 ⋈ R_1 ⋈ R_2). By induction, we can prove R_0/(R_1, R_2, ..., R_n) = Π_{R_0}(R_0 ⋈ R_1 ⋈ ... ⋈ R_n). □

Lemma 5.
Minimal cost right deep tree for a star query with right most leaf R_k: Let R be the set of relations of a star query as defined in Definition 1. The cost C_out(T(R_k, R_0, X_1, X_2, ..., X_{n-1})) of the right deep tree is the same for every permutation X_1, X_2, ..., X_{n-1} of R_1, ..., R_{k-1}, R_{k+1}, ..., R_n.

Proof. Because R_1, ..., R_n only connect to R_0, the bitvector filters created from R_1, ..., R_{k-1}, R_{k+1}, ..., R_n will be pushed down to R_0, and the bitvector filter created from R_0 will be pushed down to R_k. Thus, C_out(R_0) = |R_0/(R_1, ..., R_{k-1}, R_{k+1}, ..., R_n)|. Let R'_0 = R_0/(R_1, ..., R_{k-1}, R_{k+1}, ..., R_n); then C_out(R_k) = |R_k/R'_0|. By Lemma 3 and Property 3, |S(R_k/R'_0, R'_0, X_1, X_2, ..., X_{n-1})| = |S(R_0, R_1, ..., R_n)|. Thus, the total cost of the plan is C_out(T(R_k, R_0, X_1, X_2, ..., X_{n-1})) = Σ_{i=1, i≠k}^{n} |R_i| + C_out(R_0) + C_out(R_k) + (n−1)·|S(R_0, R_1, ..., R_n)|. Thus, C_out(T(R_k, R_0, X_1, X_2, ..., X_{n-1})) is the same for every permutation X_1, X_2, ..., X_{n-1} of R_1, ..., R_{k-1}, R_{k+1}, ..., R_n. □

Lemma 7.
Bitvector filters in partially-ordered rightdeep tree:
Let R be the set of relations of a snowflake query q as described in Definition 2. If T = T(R_0, X_1, X_2, ..., X_n) is a partially ordered right deep tree without cross products for q, then the bitvector filter created from R_{i,j} will be pushed down to R_{i,j-1} if j > 1, or to R_0 if j = 1.

Proof. Because T is partially ordered, for every relation X_k = R_{i,j} with j > 1, there exists one and only one relation X_p, p < k, such that X_k connects to X_p, namely X_p = R_{i,j-1}. Thus, the bitvector filter created from X_k will be pushed down to X_p. If X_k = R_{i,1}, it only connects to R_0. Thus, the bitvector filter created from X_k will be pushed down to R_0. □

Lemma 8.
Equal cost for partially-ordered right deep tree: Let R be the set of relations of a snowflake query q as described in Definition 2. Let $T = T(R_1, X_1, X_2, \cdots, X_{n-1})$ and $T' = T(R_1, Y_1, Y_2, \cdots, Y_{n-1})$ be two partially ordered right deep trees of q. Then $C_{out}(T) = C_{out}(T')$.

Proof. Consider the bitvector filters created in T and T'. By Lemma 7, the bitvector filters created from the same relation $R_{i,j}$ in T and T' are pushed down to the same relation: $R_{i,j-1}$ if $j > 1$, or $R_1$ if $j = 1$. $S(R_{i,n_i})$ is the same in T and T', since no bitvector filter is pushed down to $R_{i,n_i}$. By induction, we can show that $S(R_{i,j})$ is the same in T and T'. Since $S(R_1) = R_1/(S(R_{1,1}), S(R_{2,1}), \cdots, S(R_{m,1}))$, $S(R_1)$ is the same in T and T'.

Now consider the join cardinalities in T and T'. By Lemma 3, $S(R_{i,j}) = S(R_{i,j}, R_{i,j+1}, \cdots, R_{i,n_i})$. Thus, $S(R_1) = R_1/(S(R_{1,1}), S(R_{2,1}), \cdots, S(R_{m,1})) = S(R_1, R_{1,1}, R_{1,2}, \cdots, R_{1,n_1}, R_{2,1}, \cdots, R_{m,1}, R_{m,2}, \cdots, R_{m,n_m})$. Thus, $S(R_1, X_1, X_2, \cdots, X_u) = S(R_1)$ for $1 \le u \le n-1$, and $S(R_1, Y_1, Y_2, \cdots, Y_v) = S(R_1)$ for $1 \le v \le n-1$. Thus, $C_{out}(S(R_1, X_1, \cdots, X_u)) = C_{out}(S(R_1, Y_1, \cdots, Y_v))$ for $1 \le u, v \le n-1$. Thus, $C_{out}(T) = C_{out}(T')$. □ Lemma 9.
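The push-down rule is easy to check mechanically. The snippet below uses a hypothetical two-branch snowflake; it derives each relation's push-down target for two different partially ordered orders and confirms they agree, which is the observation driving the equal-cost result of Lemma 8.

```python
# Lemma 7's push-down rule on a hypothetical snowflake: branch 1 is
# R_{1,1} -> R_{1,2}, branch 2 is R_{2,1}; "R1" is the fact table. In any
# partially ordered right deep tree, the filter from R_{i,j} lands on
# R_{i,j-1}, or on R1 when j = 1 -- independently of the chosen order.

parent = {          # relation -> the one relation it references inward
    (1, 1): "R1", (1, 2): (1, 1),
    (2, 1): "R1",
}

def pushdown_targets(order):
    """Deepest leaf first; each relation's filter is pushed down to the
    earlier relation it joins with."""
    seen, targets = set(), {}
    for rel in order:
        if rel != "R1":
            assert parent[rel] in seen   # the order is partially ordered
            targets[rel] = parent[rel]
        seen.add(rel)
    return targets

# Two different partially ordered right deep trees over the same snowflake:
t1 = pushdown_targets(["R1", (1, 1), (2, 1), (1, 2)])
t2 = pushdown_targets(["R1", (2, 1), (1, 1), (1, 2)])
assert t1 == t2 == {(1, 1): "R1", (2, 1): "R1", (1, 2): (1, 1)}
```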
Let R be the set of relations of a branch as described in Definition 4. There exists only one right deep tree without cross products such that $R_n$ is the right most leaf, that is, $T(R_n, R_{n-1}, \cdots, R_1)$.

Proof. If $R_n$ is the right most leaf and there is no cross product in the query plan, then $R_n$ can only join with $R_{n-1}$. Thus, the right most subplan with two relations is $T_2 = T(R_n, R_{n-1})$. Similarly, if the right most subplan is $T_k = T(R_n, R_{n-1}, \cdots, R_{n-k+1})$ and there is no cross product, then $T_k$ can only join with $R_{n-k}$. By induction, $T(R_n, R_{n-1}, \cdots, R_1)$ is the only right deep tree without cross products where $R_n$ is the right most leaf. □ Lemma 10.
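The forced join order can be demonstrated on a concrete chain; the join graph below is an illustrative instance of Definition 4.

```python
# Lemma 9 on a concrete chain (an illustrative instance of Definition 4):
# with R_5 as the right most (deepest) leaf, avoiding cross products forces
# the unique order R_5, R_4, R_3, R_2, R_1.

n = 5
edges = {i: {i - 1, i + 1} & set(range(1, n + 1)) for i in range(1, n + 1)}

def only_tree(start):
    """Grow a right deep tree from the deepest leaf; each next relation must
    connect to the relations already joined (no cross products)."""
    order, joined = [start], {start}
    while len(order) < n:
        frontier = [r for r in range(1, n + 1)
                    if r not in joined and edges[r] & joined]
        assert len(frontier) == 1   # starting at an endpoint, the choice is forced
        order.append(frontier[0])
        joined.add(frontier[0])
    return order

assert only_tree(5) == [5, 4, 3, 2, 1]
```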
Cost reduction by pushing down $R_n$: Let R be the set of relations of a branch as described in Definition 4. Let $T = T(X_1, X_2, \cdots, X_n)$ be a right deep tree without cross products for $R_1, R_2, \cdots, R_n$. Assume $X_k = R_n$ for some $2 \le k \le n$. If $X_{k-1} \ne R_{n-1}$, then $T' = T(X_1, X_2, \cdots, X_{k-2}, X_k, X_{k-1}, X_{k+1}, X_{k+2}, \cdots, X_n)$ is a right deep tree without cross products and $C_{out}(T') \le C_{out}(T)$.

Proof. Since there is no cross product in $T = T(X_1, X_2, \cdots, X_{k-1}, R_n, X_{k+1}, \cdots, X_n)$, one relation in $A = \{X_1, X_2, \cdots, X_{k-1}\}$ must connect to $R_n$. Since $R_{n-1}$ is the only relation that connects to $R_n$ in the join graph, $R_{n-1} \in A$. By induction, we can show that $R_{n-1}, R_{n-2}, \cdots, R_{n-k+1} \in A$. Thus, $A = \{R_{n-k+1}, R_{n-k+2}, \cdots, R_{n-1}\}$.

If $X_{k-1} \ne R_{n-1}$, then $X_{k-1} = R_{n-k+1}$; otherwise, the join graph of $X_1, X_2, \cdots, X_{k-2}$ is not connected and the subplan $T(X_1, X_2, \cdots, X_{k-2})$ has cross products.

Now consider the relations $\{X_{k+1}, X_{k+2}, \cdots, X_n\}$. Because $X_{k+1}$ joins with $\{X_1, X_2, \cdots, X_k\} = \{R_{n-k+1}, \cdots, R_{n-1}, R_n\}$, $X_{k+1} = R_{n-k}$. Similarly, we can show that $X_i = R_{n-i+1}$ for $k < i \le n$.

If we swap $R_n$ and $R_{n-k+1}$, we get a new plan $T' = T(X_1, X_2, \cdots, X_{k-2}, R_n, R_{n-k+1}, X_{k+1}, \cdots, X_n)$. Because $\{X_1, X_2, \cdots, X_{k-2}\} = A \setminus \{R_{n-k+1}\} = \{R_{n-k+2}, \cdots, R_{n-1}\}$ contains $R_{n-1}$, $T'$ has no cross product.

Now we prove $C_{out}(T') \le C_{out}(T)$.

First, consider $X_i$ for $k < i \le n$. Since there is no change in the bitvector filters, it is easy to see that $C_{out}(X_i)$ is the same for T and T'.

Next, consider $X_i$ for $1 \le i < k$. Since $B = \{X_1, X_2, \cdots, X_{k-2}\} = \{R_{n-k+2}, \cdots, R_{n-1}\}$, only $R_{n-k+1}$ and $R_n$ create bitvector filters that can be pushed down to subplans of B. Because $R_{n-1} \in B$, no bitvector filter is pushed down to $R_n$. Thus, the bitvector filter created from $R_n$ is the same for T and T', and the same bitvector filter is pushed down to $R_{n-1}$ the same way in T and T'. Similarly, the bitvector filters created from and pushed down to $R_{n-k+1}$ and $R_{n-k+2}$ are the same in T and T'. Thus, we have proved that $C_{out}(X_i)$ is the same for $1 \le i \le n$.

Next, we show that the intermediate join sizes in T' are equal to or smaller than those in T. Since T and T' share the same subplan $T_j = T(X_1, X_2, \cdots, X_j)$, $1 \le j \le k-2$, and we have shown that the bitvector filters pushed down to $T_j$ are the same in T and T', the intermediate join sizes are the same in $T_j$ for both plans.

Consider the cardinalities of the join $S(T_{k-2}, R_{n-k+1})$ in T and $S(T_{k-2}, R_n)$ in T'. Since $R_{n-1} \in B$, $S(T_{k-2}, R_n)$ is a PKFK join. By the absorption rule, $|S(T_{k-2}, R_n)| = |T_{k-2}|$. Since $R_{n-k} = X_{k+1} \notin B$, $S(T_{k-2}, R_{n-k+1})$ is not a PKFK join. By the reduction property, $|T_{k-2} \Join R_{n-k+1}| \ge |T_{k-2}|$. Thus, $|T_{k-2} \Join R_n| = |T_{k-2}| \le |T_{k-2} \Join R_{n-k+1}|$.

Now consider the cardinalities for $S(T_{k-2}, R_{n-k+1}, R_n)$ in T and $S(T_{k-2}, R_n, R_{n-k+1})$ in T'. Since the set of bitvector filters pushed down to $S(T_{k-2}, R_{n-k+1}, R_n)$ is the same as those pushed down to $S(T_{k-2}, R_n, R_{n-k+1})$ and the join relations are the same, $|S(T_{k-2}, R_{n-k+1}, R_n)| = |S(T_{k-2}, R_n, R_{n-k+1})|$. Similarly, we can show that $|S(T_{k-2}, R_{n-k+1}, R_n, X_{k+1}, \cdots, X_i)| = |S(T_{k-2}, R_n, R_{n-k+1}, X_{k+1}, \cdots, X_i)|$ for $k+1 \le i \le n$.

Thus, $C_{out}(T) = \sum_{i=1}^{n} C_{out}(X_i) + \sum_{i=2, i \notin \{k-1, k\}}^{n} |S(X_1, \cdots, X_i)| + |S(T_{k-2}, R_{n-k+1})| + |S(T_{k-2}, R_{n-k+1}, R_n)| \ge \sum_{i=1}^{n} C_{out}(X_i) + \sum_{i=2, i \notin \{k-1, k\}}^{n} |S(X_1, \cdots, X_i)| + |S(T_{k-2}, R_n)| + |S(T_{k-2}, R_n, R_{n-k+1})| = C_{out}(T')$. □ Lemma 11.
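A toy cost model illustrates why the swap cannot increase cost. The chain, fanouts, and base cardinality below are made-up assumptions; absorbed PKFK joins keep the intermediate size, while the one non-absorbed join multiplies it, so placing $R_n$ deeper never hurts.

```python
# Toy cost model for Lemma 10 on the chain R_1 -> R_2 -> ... -> R_5: adding
# a relation whose referencing (FK) neighbor r+1 is already joined multiplies
# the intermediate size by its fanout; adding the referenced (PK) side is
# absorbed. The fanouts and base size are made-up assumptions.

fanout = {1: 4, 2: 3, 3: 2, 4: 2, 5: 1}  # rows of R_r per matching R_{r+1} row
base = 10                                # size of the deepest leaf after filters

def cout(order):
    """Sum of intermediate join sizes for a right deep tree (deepest first)."""
    joined, size, total = {order[0]}, base, 0
    for r in order[1:]:
        if r + 1 in joined:     # FK side joins in: the intermediate result expands
            size *= fanout[r]
        # otherwise r - 1 is joined: PKFK join, absorbed, size unchanged
        joined.add(r)
        total += size
    return total

t  = [3, 4, 2, 5, 1]   # T : X_k = R_5 at position 4, X_{k-1} = R_2 != R_4
t2 = [3, 4, 5, 2, 1]   # T': R_5 swapped one position deeper, as in the lemma
assert cout(t2) <= cout(t)   # 170 <= 190
```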
Cost reduction by pushing down $R_n, R_{n-1}, \cdots, R_{n-m}$: Let R be the set of relations of a branch as described in Definition 4. Let $T = T(X_1, X_2, \cdots, X_n)$ be a right deep tree without cross products for $R_1, R_2, \cdots, R_n$. Let $X_k = R_n, X_{k-1} = R_{n-1}, \cdots, X_{k-m} = R_{n-m}$ for some $k$ with $m + 2 \le k \le n$. If $X_{k-m-1} \ne R_{n-m-1}$, then $T' = T(X_1, X_2, \cdots, X_{k-m-2}, X_{k-m}, X_{k-m+1}, \cdots, X_k, X_{k-m-1}, X_{k+1}, \cdots, X_n)$ is a right deep tree without cross products and $C_{out}(T') \le C_{out}(T)$.

Proof. Similar to the proof of Lemma 10, we can show that $X_{k-m-1} = R_{n-k+1}$ if $X_{k-m-1} \ne R_{n-m-1}$, $A = \{X_1, X_2, \cdots, X_{k-m-1}\} = \{R_{n-k+1}, R_{n-k+2}, \cdots, R_{n-m-1}\}$, and $X_i = R_{n-i+1}$ for $k < i \le n$.

Now consider swapping $R_n, R_{n-1}, \cdots, R_{n-m}$ with $R_{n-k+1}$; the resulting plan is T'. Similar to the proof of Lemma 10, we can show that T' has no cross product.

Consider $C_{out}$ for $X_1, X_2, \cdots, X_n$. Similar to the proof of Lemma 10, we can show that $C_{out}(X_i)$ is the same for $X_1, X_2, \cdots, X_n$ in T and T'.

Next, consider the intermediate join sizes. Since both T and T' share the same subplan $T_j = T(X_1, X_2, \cdots, X_j)$, $1 \le j \le k-m-2$, similar to the proof of Lemma 10, we can show that the intermediate join sizes of $T_j$, $1 \le j \le k-m-2$, are the same in T and T'.

Now consider the cardinalities of the joins $S(T_{k-m-2}, R_{n-k+1})$ in T and $S(T_{k-m-2}, R_{n-m})$ in T'. Similar to Lemma 10, we can show $|S(T_{k-m-2}, R_{n-m})| \le |S(T_{k-m-2}, R_{n-k+1})|$.

Now consider the cardinalities of the joins $S(T_{k-m-2}, R_{n-m}, R_{n-m+1})$ in T' and $S(T_{k-m-2}, R_{n-k+1}, R_{n-m})$ in T. Since $R_{n-m+1}$ is a PKFK join with $S(T_{k-m-2}, R_{n-m})$, $|S(T_{k-m-2}, R_{n-m}, R_{n-m+1})| = |S(T_{k-m-2}, R_{n-m})|$. Similarly, since $R_{n-m}$ is a PKFK join with $S(T_{k-m-2}, R_{n-k+1})$, we have $|S(T_{k-m-2}, R_{n-k+1}, R_{n-m})| = |S(T_{k-m-2}, R_{n-k+1})|$. Thus, $|S(T_{k-m-2}, R_{n-m}, R_{n-m+1})| \le |S(T_{k-m-2}, R_{n-k+1}, R_{n-m})|$.

By similar reasoning, we can show that $|S(T_{k-m-2}, R_{n-m}, R_{n-m+1}, \cdots, R_j)| \le |S(T_{k-m-2}, R_{n-k+1}, R_{n-m}, \cdots, R_{j-1})|$ for $n-m \le j \le n$.

Finally, we can show that $|S(T_{k-m-2}, R_{n-m}, \cdots, R_n, R_{n-k+1})| = |S(T_{k-m-2}, R_{n-k+1}, R_{n-m}, \cdots, R_n)|$ and $|S(T_{k-m-2}, R_{n-m}, \cdots, R_n, R_{n-k+1}, R_{n-k}, \cdots, R_j)| = |S(T_{k-m-2}, R_{n-k+1}, R_{n-m}, \cdots, R_n, R_{n-k}, \cdots, R_j)|$ for $1 \le j \le n-k$. Thus, $C_{out}(T) \ge C_{out}(T')$. □ Lemma 12.
Single branch in right most leaves: Let R be the set of relations of a snowflake query q as described in Definition 2. Let $T = T(X_1, X_2, \cdots, X_n)$ be a right deep tree without cross products for q, where $X_1, X_2, \cdots, X_n$ is a permutation of R. If $X_k = R_1$, then $X_1, X_2, \cdots, X_{k-1}$ is a permutation of $R_{i,1}, R_{i,2}, \cdots, R_{i,k-1}$ for some $1 \le i \le m$.

Proof. Assume there exist $X_u = R_{i_1,j_1}$ and $X_v = R_{i_2,j_2}$ such that $1 \le u, v \le k-1$ and $i_1 \ne i_2$. Because $X_k = R_1$, $X_u$ does not connect to $X_v$ by joining with $X_1, X_2, \cdots, X_{k-1}$. Thus, there must be a cross product, which is a contradiction. Since $X_1, X_2, \cdots, X_{k-1}$ has a join condition with $R_1$, $R_{i,1} \in \{X_1, X_2, \cdots, X_{k-1}\}$. Because $T(X_1, X_2, \cdots, X_{k-1})$ has no cross product, $\{X_1, X_2, \cdots, X_{k-1}\} = \{R_{i,1}, R_{i,2}, \cdots, R_{i,k-1}\}$. Thus, $X_1, X_2, \cdots, X_{k-1}$ is a permutation of $R_{i,1}, R_{i,2}, \cdots, R_{i,k-1}$. □ Lemma 13.
Partially-ordered subtree: Let R be the set of relations of a snowflake query q as described in Definition 2. Let $T = T(X_1, X_2, \cdots, X_n)$ be a right deep tree without cross products for q, where $X_1, X_2, \cdots, X_n$ is a permutation of R. If $X_k = R_1$, then $X_{k+1}, X_{k+2}, \cdots, X_n$ is a partially ordered right deep tree of the snowflake query with the new relation $R' = X_1 \Join X_2 \Join \cdots \Join X_k$ as its fact table.

Proof. By Lemma 12, $X_1, X_2, \cdots, X_{k-1}$ is a permutation of $R_{i,1}, R_{i,2}, \cdots, R_{i,k-1}$ for some $1 \le i \le m$. Let us create a new relation $R' = Join(X_1, X_2, \cdots, X_{k-1}, R_1)$. For $X_j$, $k < j \le n$: if $X_j = R_{i,k}$, then $R_{i,k-1} \to X_j$ and thus $R' \to X_j$; if $X_j = R_{u,1}$, $u \ne i$, then $R_1 \to X_j$ and thus $R' \to X_j$; if $X_j = R_{u,v}$, $v > 1$, then there exists $R_{u,v-1} \in \{X_{k+1}, X_{k+2}, \cdots, X_n\}$ such that $R_{u,v-1} \to X_j$. Thus, $\{R', X_{k+1}, X_{k+2}, \cdots, X_n\}$ is a snowflake query. By Lemma 6, $X_{k+1}, X_{k+2}, \cdots, X_n$ is a partially ordered right deep tree of the new snowflake query. □ Lemma 14.
Cost reduction by consolidating a single branch: Let R be the set of relations of a snowflake query q as described in Definition 2. Let $T = T(X_1, X_2, \cdots, X_{k-1}, R_1, X_{k+1}, \cdots, X_n)$ be a right deep tree without cross products for q, where $X_1, X_2, \cdots, X_{k-1}$ is a permutation of $R_{i,1}, R_{i,2}, \cdots, R_{i,k-1}$ for some $1 \le i \le m$ and $2 \le k \le n_i$. Then there exists a right deep tree without cross products $T' = T(X_1, X_2, \cdots, X_{k-1}, R_{i,k}, R_{i,k+1}, \cdots, R_{i,n_i}, R_1, Y_1, Y_2, \cdots, Y_{n-n_i-1})$ for q such that $C_{out}(T') \le C_{out}(T)$.

Proof. By Lemma 13, $X_{k+1}, X_{k+2}, \cdots, X_n$ is a partially ordered right deep tree of the residual snowflake query. Let $T_p = T(X_1, X_2, \cdots, X_{k-1}, R_1, R_{i,k}, R_{i,k+1}, \cdots, R_{i,n_i}, Y_1, Y_2, \cdots, Y_{n-n_i-1})$, where $Y_1, Y_2, \cdots, Y_{n-n_i-1}$ is a permutation of $A = \{X_{k+1}, X_{k+2}, \cdots, X_n\} \setminus \{R_{i,k}, R_{i,k+1}, \cdots, R_{i,n_i}\}$, and the suffix $R_{i,k}, \cdots, R_{i,n_i}, Y_1, \cdots, Y_{n-n_i-1}$ is partially ordered. By Lemma 8, $C_{out}(T_p) = C_{out}(T)$.

Now consider $T' = T(X_1, X_2, \cdots, X_{k-1}, R_{i,k}, R_{i,k+1}, \cdots, R_{i,n_i}, R_1, Y_1, Y_2, \cdots, Y_{n-n_i-1})$. Let $R'_1 = R_1/(Y_1, Y_2, \cdots, Y_{n-n_i-1})$. Since $X_1, X_2, \cdots, X_{k-1}$ is a permutation of $\{R_{i,1}, R_{i,2}, \cdots, R_{i,k-1}\}$, joining $\{X_1, X_2, \cdots, X_{k-1}, R_{i,k}, R_{i,k+1}, \cdots, R_{i,n_i}, R'_1\}$ is a branch of a snowflake. By Lemma 11, $C_{out}(T(X_1, X_2, \cdots, X_{k-1}, R_{i,k}, R_{i,k+1}, \cdots, R_{i,n_i}, R'_1)) \le C_{out}(T(X_1, X_2, \cdots, X_{k-1}, R'_1, R_{i,k}, R_{i,k+1}, \cdots, R_{i,n_i}))$.

Consider T' and $T_p$. Because $T_p$ is partially ordered, $C_{out}(T_p) = C_{out}(T(X_1, X_2, \cdots, X_{k-1}, R'_1, R_{i,k}, R_{i,k+1}, \cdots, R_{i,n_i})) + \sum_{j=1}^{n-n_i-1} C_{out}(Y_j) + (n - n_i - 1) \cdot |S(R_1, R_{i,1}, R_{i,2}, \cdots, R_{i,n_i}, Y_1, Y_2, \cdots, Y_{n-n_i-1})|$. Thus, $C_{out}(T_p) \ge C_{out}(T(X_1, X_2, \cdots, X_{k-1}, R_{i,k}, R_{i,k+1}, \cdots, R_{i,n_i}, R'_1)) + \sum_{j=1}^{n-n_i-1} C_{out}(Y_j) + (n - n_i - 1) \cdot |S(R_1, R_{i,1}, R_{i,2}, \cdots, R_{i,n_i}, Y_1, Y_2, \cdots, Y_{n-n_i-1})| = C_{out}(T')$. Thus, $C_{out}(T') \le C_{out}(T_p) = C_{out}(T)$. □
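Taken together, these lemmas collapse the search space to a linear number of candidate plans. The sketch below (with an illustrative encoding of plans as tuples of relation names) enumerates, for a star query, one right deep tree per choice of right most leaf, since Lemma 5 makes the order of the remaining dimensions irrelevant.

```python
# Schematic of the main consequence: for a star query with fact R_1 and
# dimensions R_2..R_n, it suffices to cost one right deep tree per choice of
# right most leaf, i.e., n - 1 candidates instead of (n - 1)! orderings.
# The plan encoding is illustrative.
from math import factorial

def candidate_plans(n):
    """One candidate plan per right most leaf R_k, k = 2..n; the remaining
    dimensions follow R_1 in an arbitrary fixed order."""
    plans = []
    for k in range(2, n + 1):
        rest = tuple("R%d" % i for i in range(2, n + 1) if i != k)
        plans.append(("R%d" % k, "R1") + rest)
    return plans

n = 6
assert len(candidate_plans(n)) == n - 1   # linear number of candidates
assert factorial(n - 1) == 120            # vs. 5! dimension orderings
assert candidate_plans(4)[0] == ("R2", "R1", "R3", "R4")
```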