RITA: An Index-Tuning Advisor for Replicated Databases
Quoc Trung Tran, Ivo Jimenez, Rui Wang, Neoklis Polyzotis, Anastasia Ailamaki
Quoc Trung Tran, UC Santa Cruz, [email protected]
Ivo Jimenez, UC Santa Cruz, [email protected]
Rui Wang, UC Santa Cruz, [email protected]
Neoklis Polyzotis, UC Santa Cruz, [email protected]
Anastasia Ailamaki, École Polytechnique Fédérale de Lausanne, natassa@epfl.ch
ABSTRACT
Given a replicated database, a divergent design tunes the indexes in each replica differently in order to specialize it for a specific subset of the workload. This specialization brings significant performance gains compared to the common practice of having the same indexes in all replicas, but requires the development of new tuning tools for database administrators. In this paper we introduce RITA (Replication-aware Index Tuning Advisor), a novel divergent-tuning advisor that offers several essential features not found in existing tools: it generates robust divergent designs that allow the system to adapt gracefully to replica failures; it computes designs that spread the load evenly among specialized replicas, both during normal operation and when replicas fail; it monitors the workload online in order to detect changes that require a recomputation of the divergent design; and it offers suggestions to elastically reconfigure the system (by adding/removing replicas or adding/dropping indexes) to respond to workload changes. The key technical innovation behind RITA is showing that the problem of selecting an optimal design can be formulated as a Binary Integer Program (BIP). The BIP has a relatively small number of variables, which makes it feasible to solve it efficiently using any off-the-shelf linear-optimization software. Experimental results demonstrate that RITA computes better divergent designs compared to existing tools, offers more features, and has fast execution times.
1. INTRODUCTION
Database replication is used heavily in distributed systems and database-as-a-service platforms (e.g., Amazon's Relational Database Service [1] or Microsoft SQL Azure [2]) to increase availability and to improve performance through parallel processing. The database is typically replicated across several nodes, and replicas are kept synchronized (eagerly or lazily) when updates occur so that incoming queries can be evaluated on any replica.
Divergent-design tuning [5] represents a new paradigm for tuning workload performance over a replicated database. A divergent design leverages replication as follows: it specializes each replica to a specific subset of the workload by installing indexes that are particularly beneficial for the corresponding workload statements. Thus, queries can be evaluated more efficiently by being routed to a specialized replica. As shown in a previous study, a divergent design brings significant performance improvements compared to a uniform design that uses the same indexes in all replicas: queries are executed faster due to replica specialization (up to 2x improvement on standard benchmarks), and updates become significantly more efficient as well (more than 2x improvement), since fewer indexes need to be installed per replica.

To reap the benefits of divergent designs in practice, DB administrators need new index-tuning advisors that are replication-aware. The original study [5] introduces an advisor called DivgDesign, which creates specialized designs per replica but has severe limitations that restrict its usefulness in practice. First, DivgDesign assumes that replicas are always operational. Replica failures, however, are common in real systems, and the resulting workload redistribution may cause queries to be routed to low-performing replicas, with predictably negative effects on overall system performance. An effective advisor should generate robust divergent designs that allow the system to adapt gracefully to replica failures. Second, DivgDesign ignores the effect of specialization on each replica's load, and can therefore incur a skewed load distribution in the system. Our experiments suggest that DivgDesign can cause certain replicas to be twice as loaded as others. A good advisor should take replica load into account, and generate divergent designs that provide the benefits of specialization while maintaining a balanced load distribution. Last, DivgDesign targets a static system where the database workload and the number of replicas are assumed to remain unchanged. A replicated database system, however, is typically volatile: the workload may change over time, and in response the DBA may wish to elastically reconfigure the system by expanding or shrinking the set of replicas and by incrementally adding or dropping indexes at different replicas. A replication-aware advisor should alert the DBA when a workload change necessitates retuning the divergent design, and also help the DBA evaluate options for changing the design.

The limitations of DivgDesign stem from the fact that it internally employs a conventional index-tuning advisor, e.g., DB2's db2advis or the index advisor of MS SQL Server, which is not suitable for modeling and solving the aforementioned issues. Modifying DivgDesign to address its limitations would require a non-trivial redesign of the advisor. A more general question is whether it is even feasible to reap the performance benefits demonstrated in [5] and at the same time maintain a balanced load and the ability to adapt gracefully to failures. Our work shows that this is indeed feasible, but requires the development of a new type of index-tuning advisor that is replication-aware.
Contributions.
In this paper, we introduce a novel index advisor termed RITA (Replication-aware Index Tuning Advisor) that provides DBAs with a powerful tool for divergent index tuning. Instead of relying on conventional techniques for index tuning, RITA is a new type of index advisor that is designed from the ground up to take into account replication and the unique characteristics of divergent designs. RITA's foundation is a novel reduction of the problem of divergent design tuning to Binary Integer Programming (BIP). The BIP formulation allows RITA to employ an off-the-shelf linear-optimization solver to compute near-optimal designs that satisfy complex constraints (e.g., even load distribution or robustness to failures). Compared to DivgDesign, RITA offers richer tuning functionality and is able to compute divergent designs that result in significantly better performance.

More concretely, the contributions of our work can be summarized as follows:

(1) To make divergent designs suitable for the characteristics of real-world systems, we introduce a generalized version of the problem of divergent design tuning that has two important features: it takes into account the probability of replica failures and their effect on workload performance; and it allows for an expanded class of constraints on the computed divergent design, in particular constraints on global system properties, e.g., maintaining an even load distribution (Section 3).

(2) We prove that, under realistic assumptions about the underlying system, the generalized tuning problem can be formulated as a compact Binary Integer Program (BIP), i.e., a linear-optimization problem with a relatively small number of binary variables. The implication is that we can use an off-the-shelf solver to efficiently compute a (near-)optimal divergent design that also satisfies any given constraints (Section 4).

(3) We propose RITA as a new index-tuning tool that leverages the previous theoretical result to implement a unique set of features. RITA allows the DBA to initially tune the divergent design of the system using a training workload. Subsequently, RITA continuously analyzes the incoming workload and alerts the DBA if a retuning of the divergent design could lead to substantial performance improvements. The DBA can then examine how to elastically adapt the divergent design to the changed workload, e.g., by expanding/shrinking the set of replicas, incrementally adding/removing indexes, or changing how queries are distributed across replicas. Internally, RITA translates the DBA's requests to BIPs that are solved efficiently by a linear-optimization solver. In fact, RITA often returns its answers in seconds, thus facilitating an exploratory approach to index tuning (Section 5).

(4) We perform an extensive experimental study to validate the effectiveness of RITA as a tuning advisor. The results show that the designs computed by RITA can improve system performance by up to a factor of four compared to the standard uniform design that places the same indexes on all replicas. Moreover, RITA outperforms DivgDesign by up to a factor of three in terms of the performance of the computed divergent designs, while supporting a larger class of constraints (Section 6).

Overall, RITA provides a positive answer to our previously stated question: a divergent design can bring significant performance benefits while maintaining important properties such as a balanced load distribution and tolerance to failures. Consequently, divergent-design advisors can be practically employed on real systems and guide further development of tuning tools. The underlying theoretical results (problem definition and BIP formulation) are also significant, as they expand on the previous work on single-system tuning [6] and demonstrate a wider applicability of Binary Integer Programming to index-tuning problems.
2. RELATED WORK
Index tuning.
There has been a long line of research studies on the problem of tuning the index configuration of a single DBMS (e.g., [3, 6, 18]). These methods analyze a representative workload and recommend an index configuration that optimizes the evaluation of the workload according to the optimizer's estimates. A recent study [6] has introduced the COPHY index advisor, which outperforms state-of-the-art commercial and research techniques by up to an order of magnitude in terms of both solution quality and total execution time. Both RITA and COPHY leverage the same underlying principle of linear composability, which we define and discuss extensively in Section 4.1, in order to cast the index-tuning problem as a compact, efficiently solvable Binary Integer Program (BIP). However, COPHY targets the conventional index-tuning problem, where the goal is to compute a single index configuration for a single-node system. This problem scenario is much simpler than what we consider in our work, where there are several nodes in the system, each node can carry a different index configuration, queries have to be distributed in a balanced fashion, and the system must recover gracefully from failures. Leveraging the principle of linear composability in this generalized problem scenario is one of the key contributions of our work.
Physical data organization on replicas.
Previous works have also considered the idea of diverging the physical organization of replicated data. The technique of Fractured Mirrors [12] builds a mirrored database that stores its base data in two different physical organizations on disk (specifically, a row-based and a column-based organization). Similarly, Distorted Mirrors [14] presents logically, but not physically, identical mirror disks for replicated data. However, these works do not consider how to tune the indexes for each mirror.

There are recent works [9, 8] that explore different physical designs for different replicas in the context of Hadoop. Specifically, TROJAN HDFS [9] organizes each physical replica of an HDFS block in a different data layout, where each data layout corresponds to a different vertical partitioning. Likewise, HAIL [8] organizes each physical replica of an HDFS block in a different sorted order, which essentially amounts to exactly one clustered index per replica. Our work differs from these works in several respects. First, these works do not consider the problem, which we do consider, of spreading the load evenly among specialized replicas. Second, their performance is very sensitive to failures, because the tuning options considered by these papers lead to replicas that are highly specialized for subsets of the workload. When a replica fails, the corresponding queries are rerouted to replicas with little provision to handle the rerouted workload, and hence performance may suffer. In contrast, we focus on divergent designs that directly take into account the possibility of replicas failing, thus offering more stable performance when one or more replicas become unavailable. Third, [8] creates one index per replica, which restricts the extent to which each replica can be tuned to the workload. Our methods do not have any such built-in limitations and are only restricted by configurable constraints on the materialized indexes (e.g., total space consumption, or total maintenance cost).
Fourth, our work can return a set of possible designs that represent trade-off points within a multi-dimensional space, e.g., between workload evaluation cost and design-materialization cost. These works do not support this functionality.

The original study on divergent-design tuning [5] introduced the DivgDesign advisor, which is the direct competitor to our proposed RITA advisor. However, as we discussed in Section 1, DivgDesign is fundamentally limited by the functionality of the underlying single-system advisor, and cannot support many of the essential tuning functionalities that RITA offers.

3. DIVERGENT DESIGN TUNING: PROBLEM STATEMENT
In this section, we formalize the problem of divergent design tuning. The problem statement borrows several concepts from the original problem statement in [5] but also provides a non-trivial generalization. A comparison to the original study appears at the end of the section.
We consider a database comprising tables T_1, ..., T_n. An index configuration X is a set of indexes defined over the database tables. We assume that X is a subset of a universe of candidate indexes S = S_1 ∪ ··· ∪ S_n, where S_i represents the set of candidate indexes on table T_i. Each S_i represents a very large set of indexes and can be derived manually by the DBA or by mining the query logs. We do not place any limitations on the indexes regarding their type or the type or count of attributes that they cover, except that each index in X is defined on exactly one table (i.e., no join indexes).

We use cost(q, X) to denote the cost of evaluating query q assuming that X is materialized. The cost function can be evaluated efficiently in modern systems (i.e., without materializing X) using a what-if optimizer [4]. We define cost(u, X) similarly for an update statement u, except that in this case we also consider the overhead of maintaining the indexes in X due to the update. Following common practice [13, 6], we break the execution of u into two orthogonal components: (1) a query shell q_sel that selects the tuples to be updated, and (2) an update shell that performs the actual update on base tables and also updates any affected materialized indexes. Hence, the total cost of an update statement can be expressed as cost(u, X) = cost(q_sel, X) + Σ_{a∈X} ucost(u, a) + c_u, where ucost(u, a) is the cost to update index a with the effects of the update and can be estimated again using the what-if optimizer. The constant c_u is simply the cost to update the base data, which does not depend on X.

We consider a database that is fully replicated on N nodes, i.e., each node i ∈ [1, N] holds a full copy of the database. The replicas are kept synchronized by forwarding each database update to all replicas (lazily or eagerly). At the same time, a query can be evaluated by any replica.
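The update-cost decomposition can be made concrete with a short sketch. The function below mirrors cost(u, X) = cost(q_sel, X) + Σ_{a∈X} ucost(u, a) + c_u; the index names and cost values are invented placeholders for what a what-if optimizer would estimate, not numbers from the paper.

```python
# Sketch of the update-cost decomposition: query-shell cost plus per-index
# maintenance cost plus the fixed base-table update cost c_u.

def update_cost(query_shell_cost, index_maintenance, installed, base_update_cost):
    """query_shell_cost: cost(q_sel, X), locating the affected tuples;
    index_maintenance: dict mapping index a -> ucost(u, a);
    installed: the index set X materialized on this replica;
    base_update_cost: c_u, the cost of updating the base data."""
    maintenance = sum(index_maintenance[a] for a in installed)
    return query_shell_cost + maintenance + base_update_cost

# Illustrative: a replica with fewer indexes pays less maintenance
# for the very same update statement.
ucost = {"idx_orders_date": 2.0, "idx_orders_cust": 3.0}
full = update_cost(1.5, ucost, {"idx_orders_date", "idx_orders_cust"}, 4.0)
lean = update_cost(1.5, ucost, {"idx_orders_date"}, 4.0)
```

This is exactly why a divergent design speeds up updates: dropping an index from a replica removes its ucost(u, a) term from every update that touches the table.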
Since we are dealing with a multi-node system, we have to take into account the possibility of replicas failing. We use α to denote the probability of at least one replica failing. The DBA can set this parameter initially to the best of their ability, and subsequently refine it using simple statistics gathered as the system runs (e.g., the observed failure rate). To simplify notation, we will assume that at most one replica can fail at any point in time; the extension to multiple replicas failing together is straightforward for our problem.

We define W = Q ∪ U as a workload comprising a set Q of query statements and a set U of update statements. Workload W serves as the representative workload for tuning the system. As is typical in these cases, we also define a weight function f : W → ℝ+ such that f(x) corresponds to the importance of query or update statement x in W. The input workload and associated weights can be hand-crafted by the DBA or they can be obtained automatically, e.g., by analyzing the query logs of the database system.

At a high level, a divergent design allows each replica to have a different index configuration, tailored to a particular subset of the workload. To evaluate the query workload Q, an ideal strategy would route each q ∈ Q to the replica that minimizes the execution cost for q. However, this ideal routing may not be feasible for several reasons, e.g., the replica may not be reachable or may be overloaded. Hence, the idea is to have several low-cost replicas for q, so as to provide some flexibility for query evaluation. For this purpose, we introduce a parameter m ∈ [1, N], which we term the routing multiplicity factor. Informally, for every query q ∈ Q, a divergent design specifies a set of m low-cost replicas that q can be routed to.
The value of m is assumed to be set by the administrator who is responsible for tuning the system: m = 1 allows for maximum specialization; m = N provides for maximum flexibility; 1 < m < N achieves some trade-off between the two extremes.

Formally, we define a divergent design as a pair (I, h). The first component I = (I_1, ..., I_N) is an N-tuple, where I_r is the index configuration of replica r ∈ [1, N]. The second component h = (h_0, h_1, ..., h_N) is an (N+1)-tuple of routing functions. Specifically, h_0() is a function over queries such that h_0(q) specifies the set of m replicas to which q can be routed when all replicas are operational (i.e., there are no failures). Intuitively, h_0(q) indicates the replicas that can evaluate q at low cost while respecting other constraints (e.g., bounding load skew among replicas, which we discuss later), and is meant to serve as a hint to the runtime query scheduler. Therefore, a key requirement is that h_0() can be evaluated on any query q and not just the queries in the training workload. The remaining functions h_1, ..., h_N have a similar functionality but cover the case when replicas fail: h_j(), for j ∈ [1, N], specifies how to route each query when replica j has failed and is not reachable. Notice that in this case there may be fewer than m replicas in h_j(q) for any q ∈ Q if the DBA has originally specified m = N.

In order to quantify the goodness of a divergent design, we first use a metric that captures the performance of the workload under normal operation, when no running replica fails:

TotalCost(I, h) = Σ_{q∈Q} Σ_{r∈h_0(q)} (f(q)/m) · cost(q, I_r) + Σ_{u∈U} Σ_{i∈[1,N]} f(u) · cost(u, I_i)

The second term simply captures the cost to propagate each update u ∈ U to each replica in the system. The first summation captures the cost to evaluate the query workload Q. We assume that q is routed uniformly among its m replicas in h_0(q), and hence the weight of q is scaled by 1/m for each replica. The intuition behind the TotalCost(I, h) metric is that it captures the ability of the divergent design to achieve both replica specialization and flexibility in load balancing with respect to m.

To capture the case of failures, we define FTotalCost(I, h, j) as the performance of the workload when replica j ∈ [1, N] fails:

FTotalCost(I, h, j) = Σ_{q∈Q} Σ_{r∈h_j(q)} (f(q)/min{m, N−1}) · cost(q, I_r) + Σ_{u∈U} Σ_{i∈[1,N]−{j}} f(u) · cost(u, I_i)

The expression for FTotalCost(I, h, j) is similar to TotalCost(I, h), except that, since replica j is unavailable, the update cost on replica j is discarded and the routing function h_j is used instead of h_0.

We quantify the goodness of a divergent design (I, h) based on the expected cost of the workload, denoted as ExpTotalCost(I, h), by combining TotalCost(I, h) and FTotalCost(I, h, j) weighted appropriately. Recall that α is a DBA-specified probability that a failure will occur. It follows that (1−α) is the probability that all replicas are operational, in which case the performance of the workload is computed by TotalCost(I, h). Conversely, the probability of a specific replica j failing is α/N, assuming that all replicas can fail independently with the same probability; in that case, the cost of workload evaluation is FTotalCost(I, h, j). Putting everything together, we obtain the following definition of the expected workload cost:

ExpTotalCost(I, h) = (1−α) · TotalCost(I, h) + Σ_{j∈[1,N]} (α/N) · FTotalCost(I, h, j)

Our assumption so far is that at most one replica can be inoperational at any point in time. The extension to concurrent failures is straightforward: all that is needed is extending h with routing functions for combinations of failed replicas, and then extending the expression of ExpTotalCost(I, h) with the corresponding cost terms and associated probabilities.

We are now ready to formally define the problem of Divergent Design Tuning, referred to as DDT.

PROBLEM (Divergent Design Tuning - DDT). We are given a replicated database with N replicas, a workload W = Q ∪ U, a candidate index-set S, a set of constraints C, a routing multiplicity factor m, and a probability of failure α. The goal is to compute a divergent design (I, h) that employs indexes in S, satisfies the constraints in C, and whose ExpTotalCost(I, h) is minimal among all feasible divergent designs. ✷

Constraints in DDT. The set of constraints C enables the DBA to control the space of divergent designs considered by the advisor. An intra-replica constraint specifies some desired property that is local to a replica. Examples include the following:
• The size of I_j in I is within a storage-space budget.
• Indexes in I_j must have specific properties, e.g., no index can be more than 5 columns wide, or the count of multi-key indexes is below a limit.
• The cost to update the indexes in I_j is below a threshold.
Conversely, an inter-replica constraint specifies some property that involves all the replicas. Examples include the following:
• If (I_c, h_c) represents the current divergent design of the system, then ExpTotalCost(I, h) must improve on ExpTotalCost(I_c, h_c) by at least some percentage.
• The total cost to materialize (I, h) (i.e., to build each I_j in each replica) must be below some threshold.
• The load skew among replicas must be below some threshold. (We discuss this constraint in more detail shortly.)
We will formalize later the precise class of constraints C that we can support in RITA. The goal is to provide support for a large class of practical constraints, while retaining the ability to find effective designs efficiently.

Bounding load skew is a particularly important inter-replica constraint that we examine in our work. The replica specialization imposed by a divergent design means that each replica may receive a different subset of the workload, and hence a different load. The ExpTotalCost() metric does not take these different loads into account, which means that minimizing workload cost may actually lead to a high skew in the load distribution. Our experiments verify this conjecture, showing that an optimal divergent design in terms of ExpTotalCost() can cause loads at different replicas to differ by up to a factor of two. This situation, which is clearly detrimental for good performance in a distributed setting, can be avoided by including in C a constraint on the load skew among replicas. More concretely, the load of replica j under normal operation can be computed as:

load(I, h, j) = Σ_{q∈Q ∧ j∈h_0(q)} (f(q)/m) · cost(q, I_j) + Σ_{u∈U} f(u) · cost(u, I_j)

We say that design (I, h) has load skew τ ≥ 0 if load(I, h, r) ≤ (1+τ) · load(I, h, j) for any 1 ≤ r ≠ j ≤ N. A low value is desirable for τ, as it implies that (I, h) keeps the different replicas relatively balanced.

We can define a load-skew constraint for the case of failures in exactly the same way. Specifically, we define fload(I, h, j, f) as the load of replica j when replica f fails. The formula for fload(I, h, j, f) is similar to that of load(I, h, j), except that h_0 is replaced by h_f. The constraint then specifies that fload(I, h, j, f) ≤ (1+τ′) · fload(I, h, r, f) for any valid choice of j, r, f and a skew factor τ′ ≥ 0. One might suspect that low workload cost and low load skew are conflicting goals; our experiments show that this is not the case, i.e., it is possible to compute divergent designs that exhibit both good performance and a low skew factor.

Theoretical Analysis. Computing the optimal divergent design implies computing a partitioning of the workload to replicas and an optimal index configuration per replica. Not surprisingly, the problem is computationally hard, as formalized in the following theorem. The proof is provided in Appendix A.

THEOREM. It is not possible to compute an optimal solution to DDT in polynomial time unless P = NP.
The formulation of DDT expands on the original problem statement in [5] in several non-trivial ways. First, DDT incorporates the expected cost under failures into the objective function, whereas failures were completely ignored in [5]. Second, our formulation allows a much richer set of constraints C compared to the original study, which considered solely intra-replica constraints. As discussed earlier, the omission of such constraints may lead to divergent designs with undesirable effects on the overall system, e.g., the load-skew issue that we discussed earlier. Finally, the original problem statement imposed the restriction that h_0(q) correspond to the m replicas with the least evaluation cost for q, that is, ∀q ∈ Q and ∀i, j ∈ [1, N] such that i ∈ h_0(q) and j ∉ h_0(q), it must be that cost(q, I_i) ≤ cost(q, I_j). We remove this restriction in our formulation in order to explore a larger space of divergent designs, which is particularly important in light of the richer class of constraints that we consider.
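To make the metrics of this section concrete, the following sketch evaluates TotalCost, FTotalCost, ExpTotalCost, and the per-replica load on a toy instance. All statements, costs, and routing choices are invented placeholders for what-if-optimizer estimates; the divisor min{m, N−1} reflects the assumption that a query is routed uniformly among the replicas left in h_j(q) after a failure.

```python
# Toy evaluation of the cost metrics for divergent designs. Per-statement,
# per-replica costs are stored in a dict keyed by (statement, replica),
# standing in for cost(q, I_r) estimates from a what-if optimizer.

def total_cost(Q, U, f, cost, h0, m, N):
    qc = sum(f[q] / m * cost[(q, r)] for q in Q for r in h0[q])
    uc = sum(f[u] * cost[(u, i)] for u in U for i in range(N))
    return qc + uc

def ftotal_cost(Q, U, f, cost, h, j, m, N):
    denom = min(m, N - 1)            # h_j(q) may hold fewer than m replicas
    qc = sum(f[q] / denom * cost[(q, r)] for q in Q for r in h[j][q])
    uc = sum(f[u] * cost[(u, i)] for u in U for i in range(N) if i != j)
    return qc + uc

def exp_total_cost(Q, U, f, cost, h0, h, alpha, m, N):
    normal = (1 - alpha) * total_cost(Q, U, f, cost, h0, m, N)
    failed = sum(alpha / N * ftotal_cost(Q, U, f, cost, h, j, m, N)
                 for j in range(N))
    return normal + failed

def load(Q, U, f, cost, h0, m, j):
    qc = sum(f[q] / m * cost[(q, j)] for q in Q if j in h0[q])
    return qc + sum(f[u] * cost[(u, j)] for u in U)

# Two queries, one update, N = 2 replicas, routing multiplicity m = 1.
Q, U, N, m = ["q1", "q2"], ["u1"], 2, 1
f = {"q1": 1.0, "q2": 1.0, "u1": 1.0}
cost = {("q1", 0): 1.0, ("q1", 1): 5.0, ("q2", 0): 5.0, ("q2", 1): 1.0,
        ("u1", 0): 2.0, ("u1", 1): 2.0}
h0 = {"q1": [0], "q2": [1]}          # each query goes to its cheap replica
h = {0: {"q1": [1], "q2": [1]}, 1: {"q1": [0], "q2": [0]}}
c = exp_total_cost(Q, U, f, cost, h0, h, alpha=0.1, m=m, N=N)
loads = [load(Q, U, f, cost, h0, m, j) for j in range(N)]
```

A load-skew constraint with factor τ then simply requires max(loads) ≤ (1+τ) · min(loads); the toy routing above happens to be perfectly balanced.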
4. DIVERGENT DESIGN TUNING AS BINARY INTEGER PROGRAMMING
In this section, we show that the problem of Divergent Design Tuning (DDT) can be reduced to a Binary Integer Program (BIP) that contains a relatively small number of variables. The implication is that we can leverage several decades of research in linear-optimization solvers in order to efficiently compute near-optimal divergent designs. Reliance on these off-the-shelf solvers brings other important benefits as well, e.g., simpler implementation and higher portability of the index advisor, or the ability to operate in an "any-time" mode where the DBA can interrupt the tuning session at any point in time and obtain the best design computed thus far. We discuss these features in more detail in Section 5, when we describe the architecture of RITA.

The remainder of the section presents the technical details of the reduction. We first review some basic concepts of fast what-if optimization, which forms the basis for the development of our results. We then present the reduction for a simple variant of DDT and then generalize to the full problem statement.
What-if optimization is a principled method to estimate cost(q, X) and cost(u, X) for any q ∈ Q, u ∈ U and index set X, but it remains an expensive operation that can easily become the bottleneck in any index-tuning tool. To mitigate the high overhead of what-if optimization, recent studies have developed two techniques for fast what-if optimization, termed INUM [11] and C-PQO [3] respectively, that can be used as drop-in replacements for a what-if optimizer. In what follows, we focus on INUM but note that the same principles apply to C-PQO.

We first introduce some necessary notation. A configuration A ⊆ S is called atomic [11] if A contains at most one index from each S_i. We represent A as a vector with n elements, where A[i] is an index from S_i or a symbol SCAN_i indicating that no index of S_i is selected. For an arbitrary index set X, we use Atom(X) to denote the set of atomic configurations in X. To simplify presentation, we assume that a query q references a specific table T_i with at most one tuple variable. The extension to the general case is straightforward at the expense of complicated notation.

For each query q, INUM makes a few carefully selected calls to the what-if optimizer in order to compute a set of template plans, denoted as TPlans(q). A template plan p ∈ TPlans(q) is a physical plan for q except that all access methods (i.e., the leaf nodes of the plan) are substituted by "slots". Given a template p ∈ TPlans(q) and an atomic index configuration A, we can instantiate a concrete physical execution plan by filling each slot with the corresponding index in A, or a sequential scan if A does not prescribe an index for the corresponding relation. Figure 1 shows an example of this process for a simple query over three tables T_1, T_2, and T_3, and an atomic configuration that specifies an index on T_1 and another index on T_3.
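The slot-filling process, together with the cost bookkeeping that INUM attaches to it (internal plan cost plus slot costs, and infinite cost when a slot's sort requirement cannot be met, as explained in this section), can be sketched roughly as follows. Table names, plan shapes, and all cost numbers are illustrative assumptions, not INUM's actual data structures.

```python
# Rough sketch of INUM-style cost estimation: each template plan carries an
# internal cost plus one access-method slot per referenced table; an atomic
# configuration fills each slot with an index or a sequential scan. Slots
# whose requirements (e.g., sort order) a configuration cannot satisfy make
# the instantiated plan infinitely expensive.
import math

def plan_cost(template, atomic):
    """template: {'internal': float, 'slots': {table: {access: cost}}}
       atomic:   {table: index name or 'SCAN'}"""
    total = template["internal"]
    for table, options in template["slots"].items():
        choice = atomic.get(table, "SCAN")
        if choice not in options:     # e.g., required sort order unmet
            return math.inf
        total += options[choice]
    return total

def inum_cost(templates, atomics):
    # cost(q, X): minimum over template plans and atomic configurations
    return min(plan_cost(p, A) for p in templates for A in atomics)

templates = [
    # This template needs index "a" on T1 (say, for its sort order).
    {"internal": 10.0, "slots": {"T1": {"a": 1.0},
                                 "T2": {"SCAN": 4.0}}},
    # This one also works with a plain scan of T1, at a higher cost.
    {"internal": 12.0, "slots": {"T1": {"a": 2.0, "SCAN": 6.0},
                                 "T2": {"SCAN": 4.0}}},
]
atomics = [{"T1": "a", "T2": "SCAN"}, {"T1": "SCAN", "T2": "SCAN"}]
best = inum_cost(templates, atomics)
```

The guarantee quoted from [11] corresponds here to the second template: some template remains finite for every atomic configuration, so the minimum is always well defined.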
Each template is also associated with an internal plan cost, which is the sum of the costs of the operators in the plan except the access methods. Given an atomic configuration A, the cost of the instantiated plan, denoted as cost(p, A), is the sum of the internal plan cost and the cost of the instantiated access methods. The intuition is that TPlans(q) represents the possibilities for the optimal plan of q depending on the set of materialized indexes. Hence, given a hypothetical index configuration X, INUM estimates cost(q, X) as the minimum cost(p, A) over p ∈ TPlans(q) and A ∈ Atom(X). Note that a slot in p may have restrictions on its sorted order, e.g., the template plan in Figure 1 prescribes that the slot for T_1 must be accessed in sorted order of attribute x. If A does not provide a suitable access method that respects this sorted order, then cost(p, A) is set to ∞. INUM guarantees that there is at least one plan p in TPlans(q) such that cost(p, A) < ∞ for any A ∈ Atom(X). As shown in the original study [11], INUM provides an accurate approximation for the purpose of index tuning, and is orders of magnitude faster than conventional what-if optimization.

Linear composability.
The approximation provided by INUM and C-PQO can be formalized in terms of a property that is termed linear composability in [6].

DEFINITION 1 (LINEAR COMPOSABILITY [6]). Function cost() is linearly composable for a select-statement q if there exists a set of identifiers K_q and constants β_p and γ_pa for p ∈ K_q, a ∈ S ∪ {SCAN_1} ∪ ··· ∪ {SCAN_n} such that:

cost(q, X) = min{ β_p + Σ_{a∈A} γ_pa | p ∈ K_q, A ∈ Atom(X) }

for any configuration X. Function cost() is linearly composable for an update-statement q if it is linearly composable for its query shell. ✷

[Figure 1: Example of template plans and instantiated plans, showing (a) a query selecting T_1.x from a join of T_1, T_2, and T_3, (b) a template plan whose leaf slots access T_1 (in sorted order of x), T_2, and T_3, and (c) the plan instantiated from the atomic configuration A, with A[1] = a, an index with key T_1.x; A[2] = SCAN_2; and A[3] = b, an index with key (T_3.x, T_3.w).]

It has been shown in [6] that both INUM and C-PQO compute a cost function that is linearly composable. For INUM, K_q = TPlans(q), and each p corresponds to a distinct template plan in TPlans(q). Here, we use TPlans(q) for the set of identifiers and overload p ∈ TPlans(q) to represent an identifier. In turn, the expression β_p + Σ_a γ_pa corresponds to cost(p, A), where β_p denotes the internal plan cost of p, and γ_pa is the cost of implementing the corresponding slot in p using index a. (The slot covers the relation on which the index is defined.) Note that linear composability does not imply a linear cost model for the query optimizer: non-linearities are simply hidden inside the constants β_p.

For the remainder of the paper, we assume that cost(q, X) is computed by either INUM or C-PQO (for the purpose of fast what-if optimization) and hence respects linear composability.

In this subsection, we discuss how to reduce
DDT to a compact BIP for the case when α = 0 and C = ∅ (i.e., no failures and no constraints) and the workload comprises solely queries, i.e., W = Q. This reduction forms the basis for generalizing to the full problem statement, which we discuss later.

BIP formulation.
At a high level, we are given an instance of
DDT and we wish to construct a BIP whose solution provides an optimal divergent design. This reduction will hinge upon the linear-composability property, i.e., we assume that each query q ∈ W has been preprocessed with INUM and therefore we can approximate cost(q, X) for any X ⊆ S as expressed in Definition 1.

Figure 2 shows the constructed BIP. (Ignore for now the boxed expressions.) In what follows, we will explain the different components of the BIP and also formally state its correctness. The BIP uses two sets of binary variables to encode the choice for a divergent design (I, h):
• Variable s_ra is set to 1 if and only if index a is part of the index design I_r on replica r. In other words, I_r = { a | s_ra = 1 }.
• Variable t_rq is set to 1 if and only if query q is routed to replica r, i.e., r ∈ h(q). (Recall that we ignore failures for now.) In other words, h(q) = { r | t_rq = 1 }.

Minimize: ˆTotalCost(I, h) = ˆQueryCost(I, h) + ˆUpdateCost(I, h), where:

  ˆQueryCost(I, h) = Σ_{q ∈ Q} Σ_{r ∈ [1, N]} (f(q)/m) · ˆcost(q, r)

  ˆUpdateCost(I, h) = Σ_{q ∈ Q_upd} Σ_{r ∈ [1, N]} f(q) · ˆcost(q, r) + Σ_{u ∈ U} Σ_{r ∈ [1, N]} Σ_{a ∈ S} f(u) · s_ra · ucost(u, a)

  ˆcost(q, r) = Σ_{p ∈ TPlans(q)} β_p · y_rp + Σ_{p ∈ TPlans(q), a ∈ S ∪ {SCAN_1} ∪ ··· ∪ {SCAN_n}} γ_pa · x_rpa,  ∀ r ∈ [1, N], ∀ q ∈ Q ∪ Q_upd   (1)

such that:

  Σ_{r ∈ [1, N]} t_rq = m,  ∀ q ∈ Q   (2)
  Σ_{r ∈ [1, N]} t_rq = N,  ∀ q ∈ Q_upd   (3)
  Σ_{p ∈ TPlans(q)} y_rp = t_rq,  ∀ q ∈ Q ∪ Q_upd   (4)
  s_ra ≥ x_rpa,  ∀ q ∈ Q ∪ Q_upd, p ∈ TPlans(q), a ∈ S   (5)
  Σ_{a ∈ S_i ∪ {SCAN_i}} x_rpa = y_rp,  ∀ q ∈ Q ∪ Q_upd, p ∈ TPlans(q), i ∈ [1, n], T_i is referenced in q   (6)

Figure 2: The BIP for Divergent Design Tuning.
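Before walking through the constraints, it helps to see how a linearly-composable cost function is evaluated in the first place. The sketch below is a minimal illustration of Definition 1 and of the cost expression that Equation (1) linearizes; the template plans, index names, and cost constants are hypothetical, standing in for the output of INUM preprocessing, not RITA's actual data model.

```python
import math

# Hypothetical template-plan structures for one query q: each template has an
# internal plan cost beta and, per slot, the cost gamma of each access method
# that satisfies the slot's requirements (e.g., an interesting sort order).
# Incompatible access methods are simply absent and therefore cost +inf.
TEMPLATES = [
    {"beta": 10.0, "slots": [{"idx_a": 2.0, "SCAN1": 9.0}, {"SCAN2": 5.0}]},
    {"beta": 4.0,  "slots": [{"idx_a": 2.0}, {"idx_b": 1.0, "SCAN2": 6.0}]},
]

def cost(templates, X):
    """cost(q, X): minimum of cost(p, A) over template plans p and atomic
    configurations A drawn from X (full scans are always available)."""
    available = set(X) | {"SCAN1", "SCAN2"}
    best = math.inf
    for p in templates:
        total = p["beta"]
        for slot in p["slots"]:
            # cheapest available access method for this slot, +inf if none
            total += min((c for a, c in slot.items() if a in available),
                         default=math.inf)
        best = min(best, total)
    return best

print(cost(TEMPLATES, set()))                # only full scans usable
print(cost(TEMPLATES, {"idx_a", "idx_b"}))   # indexes unlock the cheaper template
```

The BIP replaces the explicit minimization above with the guarded summation of Equation (1): the binary variables y_rp and x_rpa pick out exactly one template and one access method per slot.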
Under our assumption of using fast what-if optimization, the cost of a query q in some replica r can be expressed as cost(q, I_r) = cost(p′, A′) for some choice of p′ ∈ TPlans(q) and an atomic configuration A′ ∈ Atom(I_r). To encode these two choices, we introduce two different sets of binary variables:
• Variable x_rpa, where p is a template in TPlans(q) and a is an index in S ∪ {SCAN_1} ∪ ··· ∪ {SCAN_n}, is equal to 1 if and only if p = p′ and a ∈ A′.
• Variable y_rp is equal to 1 if and only if p = p′.

The BIP specifies several constraints that govern the valid value assignments to the aforementioned variables:
• Constraint (2) specifies that query q must be routed to exactly m replicas.
• Constraint (4) specifies that there must be exactly one variable y_rp set to 1 if t_rq = 1, i.e., exactly one template p chosen for computing cost(q, I_r) if q is routed to r. Conversely, y_rp = 0 for every p if t_rq = 0.
• Constraint (5) specifies that an index a can be used in instantiating a template p at replica r only if it appears in the corresponding design I_r.
• Constraint (6) specifies that if y_rp = 1, i.e., p is used to compute cost(q, I_r), then there must be exactly one access method a per slot such that x_rpa = 1. Essentially, the choices of a for which x_rpa = 1 encode the atomic configuration that instantiates p. Conversely, x_rpa = 0 for every a if y_rp = 0.

These constraints allow us to express cost(q, I_r) as in Equation 1 in Figure 2. The equation is a restatement of linear composability (Definition 1) that translates the minimization to a guarded summation using the binary variables y_rp and x_rpa. Specifically, if t_rq = 1, then the constraints force the selection of exactly one template p such that y_rp = 1, and constraint (6) forces setting x_rpa = 1 for exactly one access method per slot of p, corresponding to an atomic configuration. Hence, minimizing the expression in Equation 1 corresponds to computing cost(q, I_r). Otherwise, if t_rq = 0, then the same constraints force ˆcost(q, r) = 0. In turn, it follows that the objective function of the BIP corresponds to ˆTotalCost(I, h).

Handling update statements.
The total cost to execute update statements, ˆUpdateCost(I, h), includes two terms, as shown in the second boxed expression in Figure 2. Here, Q_upd denotes the set of all the query shells, each of which corresponds to one update statement in U. The first component of ˆUpdateCost() is the total cost to evaluate every query shell in Q_upd at every replica. This component is expressed in our BIP as the summation of ˆcost(q, r) over all q ∈ Q_upd and r ∈ [1, N]. Since each query shell needs to be routed to all replicas, we impose constraint (3).

The second component of ˆUpdateCost() is the total cost to update the affected indexes. Using the variable s_ra that tracks the selection of an index at replica r in the recommended configuration, the cost of updating an index a at replica r given the presence of an update statement u is computed as the product of s_ra and ucost(u, a).

Correctness.
Up to this point, we have argued informally about the correctness of the BIP. The following theorem formally states this property. The proof is given in Appendix B.

THEOREM. A solution to the BIP in Figure 2 corresponds to the optimal divergent design for DDT when α = 0 and C = ∅.

As stated repeatedly, the key property of the BIP is that it contains a relatively small number of variables and constraints, which means that a BIP solver is likely to find a good solution efficiently. Formally:

COROLLARY. The number of variables and constraints in the BIP shown in Figure 2 is in the order of O(N·|W|·|S|).

In fact, it is possible to eliminate some variables and constraints from the BIP while maintaining its correctness. We do not show this extension, since it does not change the order of magnitude of the variable count but it makes the BIP less readable and harder to explain.
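As a sanity check on what the BIP computes, the optimum of a tiny DDT instance can also be found by exhaustive enumeration. The sketch below uses hypothetical numbers and a deliberately crude additive benefit model in place of the INUM-based cost(q, I_r): two replicas, a query-only workload, and a space budget of at most one index per replica.

```python
from itertools import combinations, product

# Toy instance (hypothetical): 2 replicas, routing multiplicity 1.
N, m = 2, 1
S = ["ia", "ib"]                   # candidate indexes
BASE = {"q1": 100.0, "q2": 80.0}   # query cost with no indexes
BENEFIT = {("q1", "ia"): 60.0, ("q2", "ib"): 50.0, ("q2", "ia"): 10.0}
FREQ = {"q1": 1.0, "q2": 1.0}

def cost(q, I):
    # Crude stand-in for cost(q, I_r): base cost minus per-index benefits.
    return BASE[q] - sum(BENEFIT.get((q, a), 0.0) for a in I)

def total_cost(design, routing):
    # QueryCost of Figure 2: each query contributes f(q)/m of its cost at
    # every replica it is routed to.
    return sum(FREQ[q] / m * cost(q, design[r])
               for q in BASE for r in routing[q])

index_sets = [frozenset()] + [frozenset([a]) for a in S]   # budget: <= 1 index
best = min(((total_cost(d, dict(zip(BASE, hs))), d, hs)
            for d in product(index_sets, repeat=N)
            for hs in product(combinations(range(N), m), repeat=len(BASE))),
           key=lambda t: t[0])
print(best[0])   # optimal TotalCost of the toy instance
```

Enumeration is exponential in N and |S|, which is exactly why the compact BIP encoding matters: the solver can search the same space with pruning instead of brute force.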
To extend the BIP to the case when α > 0, we introduce additional binary variables t_rq^j, y_rp^j and x_rpa^j, for j ∈ [1, N]. These variables have the same meaning as their counterparts in Figure 2, except that they refer to the case where replica j fails. For instance, t_rq^j = 1 if and only if q is routed to replica r when j fails, i.e., h_j(q) = { r | t_rq^j = 1 }. We augment the BIP with the corresponding constraints as well. For instance, we add the constraint Σ_{r ≠ j} t_rq^j = min{N − 1, m}, ∀ q ∈ Q, j ∈ [1, N], to express the fact that function h_j() must respect the routing-multiplicity factor m. Finally, we change the objective function to ExpTotalCost(), which is already linear, and express each term FTotalCost(I, h, j) as a summation that involves the new variables. The complete details for this extension, including the proof of correctness, can be found in Appendix C. We should mention that this extension increases the number of variables and constraints by a factor of N, to O(N²·|W|·|S|), since it becomes necessary to reason about the failure of every replica j ∈ [1, N].

In this subsection, we discuss how to extend the BIP when C ≠ ∅, i.e., when the DBA specifies constraints for the divergent design. Obviously, we can attach to the BIP any type of linear constraint. As it turns out, linear constraints can capture a surprisingly large class of practical constraints. In what follows, we present three examples of how to translate common constraints to linear expressions that can be directly added to the BIP.

Space budget.
Let size(a) denote the estimated size of an index a, and b the storage budget at each replica. Using the variable s_ra that tracks the selection of an index at replica r in the recommended configuration, the storage constraint can be encoded as: Σ_{a ∈ S} s_ra · size(a) ≤ b, ∀ r ∈ [1, N]. In general, the variables s_ra can be used to express several types of intra-replica constraints that involve the selected indexes, e.g., bound the total number of multi-key indexes per replica, or bound the total update cost for the indexes in each replica.

Bounding load-skew.
Recall that load(I, h, j) captures the total load of replica j under a divergent design (I, h). The load-skew constraint specifies that load(I, h, j) ≤ (1 + τ) · load(I, h, r), for any r ≠ j, where τ is the load-skew factor provided by the DBA.

It is straightforward to translate the constraint between two specific replicas j and r into a linear inequality, by using the variables x_rpa and y_rp to rewrite the corresponding load() terms as linear sums. Specifically, load(I, h, j) can be expressed as a linear sum similarly to ˆTotalCost() in Figure 2, except that we only consider replica j and the queries for which j ∈ h(q), and the same goes for expressing load(I, h, r). Based on this translation, we can add N(N − 1) constraints to the BIP, one for each possible choice of j and r. We can actually do better, by observing that we can sort replicas in ascending order of their load, and then impose a single load-skew constraint between the first and last replica. By virtue of the sorted order, the constraint will be satisfied by any other pair of replicas. Specifically, we add the following two constraints to the BIP:

  load(I, h, i) ≤ load(I, h, i + 1),  ∀ i ∈ [1, N − 1]   (7)
  load(I, h, N) ≤ (1 + τ) · load(I, h, 1)   (8)

This approach requires only N constraints and is thus far more efficient.

The final step requires adding another set of constraints on ˆcost(q, r). This is a subtle technical point that concerns the correctness of the reduction when the constraints are infeasible. More concretely, the solver may assign the variables y_rp and x_rpa for some query q so that constraints (7)–(8) are satisfied even though this assignment does not correspond to the optimal cost cost(q, I_r). To avoid this situation, we introduce another set of variables that are isomorphic to x_rpa and are used to force a cost-optimal selection for y_rp and x_rpa. The details are given in Appendix D.1, but the upshot is that we need to add O(N·|W|·|S|) additional constraints.

We have also developed an approximate scheme to handle load-skew constraints in the BIP. The approximate scheme allows the BIP to be solved considerably faster, but the compromise is that the resulting divergent design may not be optimal. However, our experimental results (see Section 6) suggest that the loss in quality is not substantial. The details of the approximate scheme can be found in Appendix D.2.

Materialization cost constraint.
This constraint specifies that the total cost to materialize (I, h) must be below some threshold C_m. The materialization cost is computed with respect to the current design (I^c, h^c) and takes into account the cost to scale up or down the current number of replicas, and the cost to create additional indexes or drop redundant indexes in each replica.

We first consider the case when the number of replicas remains unchanged between (I, h) and (I^c, h^c). Consider a specific replica r and the new design I_r ∈ I, and let I_r^c ∈ I^c denote the previous design. Clearly, we need to create every index in I_r − I_r^c and to delete every index in I_r^c − I_r. Assuming that ccost(a) and dcost(a) denote the cost to create and drop index a respectively, we can express the reconfiguration cost for replica r as Σ_{a ∉ I_r^c} s_ra · ccost(a) + Σ_{a ∈ I_r^c} (1 − s_ra) · dcost(a). If each replica can install indexes in parallel, then the materialization cost constraint can be expressed as:

  Σ_{a ∈ S ∧ a ∉ I_r^c} s_ra · ccost(a) + Σ_{a ∈ S ∧ a ∈ I_r^c} (1 − s_ra) · dcost(a) ≤ C_m,  ∀ r ∈ [1, N]

We can also express a single constraint on the aggregate materialization cost by summing the per-replica costs.

We next consider the case when the DBA wants to shrink the number of replicas to N_d < N. In this case, the BIP solver should try to find which replicas to maintain and how to adjust their index configurations so that the total materialization cost remains below the threshold. For this purpose, we introduce N new binary variables z_r, r ∈ [1, N], where z_r = 1 if and only if replica r is kept in the new divergent design. We add the following constraints:

  t_rq ≤ z_r,  ∀ q ∈ Q ∪ Q_upd, r ∈ [1, N]   (9a)
  Σ_{r ∈ [1, N]} z_r = N_d   (9b)

The first constraint ensures that we can route queries only to live replicas. The second simply restricts the number of live replicas to the desired number.

Lastly, we consider the case when the DBA wants to expand the number of replicas to N_d > N. The set of constraints in the BIP can be re-used, except that all the variables are defined according to N_d replicas (instead of N replicas as before). The materialization cost can also be computed in a similar way. In addition, we also take into account the cost to deploy the database in the new replicas, which appears as a constant in the total cost to materialize a design on a new replica.

Recall that a divergent design (I, h) includes both the index-sets for the different replicas and the routing functions h_1(), h_2(), ..., h_N(). These functions are used at runtime, after the divergent design has been materialized, to route queries to the different specialized replicas. A solution to the BIP determines how to compute these functions for a training query q in Q, based on the variables t_rq and t_rq^j. Here, we describe how to compute these functions for any query q′ that is not part of the training workload. We focus on the computation of h(q′), but our techniques readily extend to the other functions.

Our first approach is inspired by the original problem statement of the tuning problem [5] and computes h(q′) as the m replicas with the lowest evaluation cost for q′. Normally this requires N what-if optimizations for q′, but we can leverage again fast what-if optimization in order to achieve the same result more efficiently. Specifically, we first compute TPlans(q′) (which requires a few calls to the what-if optimizer) and then formulate a BIP that computes the top m replicas for q′.

Our second approach tries to match more closely the revised problem statement, where a query is not necessarily routed to its top m replicas.
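Once the per-replica cost estimates for q′ are available, the first approach reduces to a top-m selection. The sketch below is a minimal illustration with a hypothetical cost table; RITA itself obtains these estimates via fast what-if optimization rather than taking them as input.

```python
import heapq

def route_top_m(costs, m):
    """h(q'): the m replicas with the lowest estimated cost for q'.

    costs maps replica id -> estimated cost(q', I_r)."""
    return set(heapq.nsmallest(m, costs, key=costs.get))

# Hypothetical estimates of cost(q', I_r) for replicas 0..3.
replica_costs = {0: 42.0, 1: 17.0, 2: 25.0, 3: 30.0}
print(route_top_m(replica_costs, m=2))
```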
Our approach is to match q′ to its most "similar" query q in the training workload Q, and then to set h(q′) = h(q). The intuition is that the two queries would affect the divergent design similarly if they were both included in the training workload. We can use several ways to assess similarity, but we found that fast what-if optimization again provides a nice solution.

[Figure 3: The architecture of RITA. A DBMS ships the workload to RITA's online monitor, which produces monitoring statistics; the DBA invokes the recommender with a training workload and constraints and receives back a recommendation; both modules formulate BIPs (a monitoring BIP and a tuning BIP) that are handed to an off-the-shelf BIP solver.]

Specifically, we compute again
TPlans(q′) and then quickly find the optimal plan for q′ in each replica. We then form a vector v_q′ where the i-th element is the set of indexes in the optimal plan of q′ at replica i. We can compute a similar vector v_q and then compute the similarity between q′ and q as the similarity between the corresponding vectors. The intuition is that q′ is similar to q if in each replica they use similar sets of indexes. We can refine this approach further by taking into account the top-2 plans for each query, but our empirical results suggest that the simple approach works quite well.

5. RITA: ARCHITECTURE AND FUNCTIONALITY
In this section we describe the architecture and the functionality of RITA, our proposed index-tuning advisor. RITA builds on the reduction presented in the previous section in order to offer a rich set of features.

Figure 3 shows the architecture of RITA. It comprises two main modules: the online monitor, which continuously analyzes the workload in order to detect changes and opportunities for retuning; and the recommender, which is invoked by the DBA in order to run a tuning session. As we will see later, both modules solve a variant of the
DDT problem in order to perform their function. Also, both modules make use of the reduction we presented in the previous section in order to solve the respective tuning problems. For this purpose, they employ an off-the-shelf BIP solver. The remaining sections discuss the two modules in more detail.
The online monitor maintains a divergent design (I_slide, h_slide) that is continuously re-computed based on the latest queries in the workload. Concretely, the monitor maintains a sliding window over the current workload (the length of the window is a parameter defined by the DBA) and then solves DDT using the sliding window as the training workload. Each new statement in the running workload causes an update of the window and a re-computation of (I_slide, h_slide).

Once computed, the up-to-date design (I_slide, h_slide) is compared against the current design (I_curr, h_curr) of the system, using the ExpTotalCost() metric of each design on the workload in the sliding window. The module outputs the difference between the two as the performance improvement if (I_slide, h_slide) were materialized. This output, which is essentially a time series since (I_slide, h_slide) is being continuously updated, can inform the DBA about the need to retune the system.

[Footnote: Any vector-similarity metric will do. We first convert v_q′ and v_q to binary vectors indicating which indexes are used at each replica, and then use a cosine-similarity metric.]

Clearly, it is important for the online monitor to maintain (I_slide, h_slide) up-to-date with the latest statements in the workload. For this purpose, the online monitor solves a bare-bones variant of DDT that assumes α = 0.

The DBA invokes the recommender module to run a tuning session, for the purpose of tuning the initial divergent design or re-tuning the current design when the workload changes. The DBA provides an instance of the
DDT problem, e.g., a training workload, the parameter α and several constraints, and the recommender returns the corresponding (near-)optimal divergent design. The recommender leverages the BIP-based formulation of DDT in order to compute its output efficiently.

If desired by the DBA, the recommender can also return a set of possible designs that represent trade-off points within a multi-dimensional space. For example, suppose that the DBA specifies the workload-evaluation cost and the materialization cost of each design as the two dimensions of this space. We expect that a design with a higher materialization cost will have more indexes, and hence a lower workload-evaluation cost. The recommender first formulates a BIP to compute an optimal divergent design that does not bound the materialization cost. This solution provides an upper bound on the materialization cost, henceforth denoted as C_m. Subsequently, the recommender formulates several tuning BIPs, where each BIP puts a different threshold on the materialization cost, expressed as a fraction of C_m. The thresholds for these Pareto-optimal designs can be predefined or chosen based on more involved strategies such as the Chord algorithm [7]. An important point is that the successive BIPs are essentially identical except for the modified constraint on the materialization cost, which enables the BIP solver to work fast by reusing previous computations.

The DBA can also add other parameters to this exploration. For example, adding the number of replicas as another parameter will cause the recommender to use the same process to generate designs for the hypothetical scenarios of expanding/shrinking the set of replicas. The final output can inform the DBA about the trade-off between workload-evaluation cost and design-materialization cost, and how it is affected by the number of replicas.

Besides being able to perform tuning sessions efficiently, RITA's recommender module gains two important features through its reliance on a BIP solver.

• Fast refinement.
As mentioned earlier, the BIP solver can reuse computation if the current BIP is sufficiently similar to previously solved BIPs. RITA takes advantage of this feature to offer fast refinement of the solution for small changes to the input. E.g., the optimal divergent design can be updated very efficiently if the DBA wishes to change the set of candidate indexes or impose additional inter-replica constraints.

• Early termination.
In the course of solving a BIP, the solver maintains the currently-best solution along with a bound on its suboptimality. This information can be leveraged by RITA to support early termination based on time or quality. For instance, the DBA may instruct the recommender to return the first solution that is within 5% of the optimal, which can reduce substantially the total running time without compromising performance for the output divergent design. Or, the DBA may ask for the best solution that can be computed within a specific time interval.

Table 1: Experimental parameters (default in bold).

  Parameter                  | Values
  Number of replicas (N)     | 2, 3, 4, 5
  Routing multiplicity (m)   | 1, 2, 3
  Space budget (b)           | 0.25×, 0.5×, 1.0×, INF
  Prob. of failure (α)       | 0, 0.1, 0.2, 0.3, 0.4
  Load-skew factor (τ)       | 1.3, 1.5, 1.7, 1.9, 2.1, INF
  Percentage-update (p_upd)  | 10^-1, 10^-2, 10^-3, 10^-4
  Sliding window (w)         | 40, 60, 80, 100
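The early-termination policy can be sketched as a simple stopping rule over the solver's progress. The loop below is an illustration only: the solver "trace" is simulated, and the function names are hypothetical rather than CPLEX's actual callback API.

```python
def first_within_gap(trace, gap_target=0.05):
    """Return the first incumbent whose relative optimality gap, computed
    from the solver's lower bound, is within gap_target (minimization)."""
    incumbent = gap = None
    for incumbent, bound in trace:
        gap = (incumbent - bound) / incumbent
        if gap <= gap_target:
            break   # good enough: stop the solver early
    return incumbent, gap

# Simulated solver progress: incumbents improve while lower bounds tighten.
trace = [(200.0, 120.0), (160.0, 140.0), (150.0, 144.0)]
incumbent, gap = first_within_gap(trace)
print(incumbent, gap)
```

A time-based variant simply replaces the gap test with a deadline check, returning the incumbent held when the time budget expires.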
6. EXPERIMENTAL STUDY
This section presents the results of the experimental study that we conducted in order to evaluate the effectiveness of RITA. In what follows, we first discuss the experimental methodology and then present the findings of the experiments.
Advisors.
Our experiments use a prototype implementation of RITA written in Java. The prototype employs CPLEX v12.3 as the off-the-shelf BIP solver, and a custom implementation of INUM for fast what-if optimization. The database system in our experiments is the freely available IBM DB2 Express-C. The CPLEX solver is tuned to return the first solution that is within 5% of the optimal. In all experiments, we use p_RITA to denote the divergent design computed by RITA.

We compare RITA against the heuristic advisor DIVGDESIGN that was introduced in the original study of divergent designs [5]. DIVGDESIGN employs IBM's physical design advisor internally. Similar to [5], we run DIVGDESIGN five times and output the lowest-cost design out of all the independent runs. We denote this final design as p_DD. We note that the comparison against DIVGDESIGN concerns only a restricted definition of the general tuning problem, since DIVGDESIGN supports only a space-budget constraint and does not take into account replica failures.

We also include in the comparison the common practice of using the same index configuration in each replica. The identical configuration is computed by invoking the DB2 index-tuning advisor on the whole workload. We use p_UNIF to refer to the resulting design.
Data Sets and Workloads.
We use a 100GB TPC-DS database [15] for our experiments, along with three different workloads, namely TPCDS-query, TPCDS-mix and TPCDS-dyn. TPCDS-query comprises 40 complex TPC-DS benchmark queries that are currently supported by our INUM implementation [16]. TPCDS-mix adds INSERT statements that model updates to the base data. TPCDS-dyn models a workload of 600 queries that goes through three phases, each phase corresponding to a specific distribution of the queries that appear in TPCDS-query. The first phase corresponds mostly to queries of low execution cost, then the distribution is inverted for the second phase, and reverts back to the distribution of the first phase.

In all cases, the weight of each query is set to one, whereas the weight of each INSERT statement is determined as the product of the cardinality of the corresponding relation and a percentage-update parameter (p_upd). This parameter allows us to simulate different volumes of updates when we test the advisors.

[Footnote: The execution cost is measured with respect to the optimal index-set for each query returned by the DB2 advisor.]
Candidate Index Generation.
Recall from Section 3 that the
DDT problem assumes that a set of candidate indexes S is provided as input. There are many methods for generating S based on the database and a representative workload. In our setting, we use DB2's service to select the optimal indexes per query (without any space constraints) and then perform a union of the returned index-sets. The resulting index-set, which is optimal for the workload in the absence of constraints and update statements, contains 103 candidate indexes and has a total size of 265GB.

Experimental Parameters.
Our experiments vary the following parameters: the number of replicas N, the per-replica space budget b, the probability of failure α, the load-skew factor τ, the percentage of updates in the workload p_upd (for TPCDS-mix), and the size of the sliding window w for online monitoring. The routing-multiplicity factor (m) is set to ⌈N/2⌉. We report additional experimental results when varying m in Appendix D.3. Table 1 shows the parameter values tested in our experiments. Note that the storage space budget is measured as a multiple of the base data size, i.e., given the 100GB TPC-DS base data size, a space budget of 0.5× indicates a 50GB storage space budget.

Metrics.
We use
ExpTotalCost() to measure the performance of a divergent design. To allow meaningful comparisons among the designs generated by different advisors, we compute this metric for a specific design by invoking DB2's what-if optimizer for all the required cost factors. This methodology, which is consistent with previous studies on physical design tuning, allows us to gauge the effectiveness of the divergent design in isolation from any estimation errors in the optimizer's cost models. In some cases, we also report the performance improvement of p_RITA over p_DD and p_UNIF, where the performance improvement of a design X over a design Y is computed as 1 − ExpTotalCost(X)/ExpTotalCost(Y). We also report the time taken to execute each index advisor for the corresponding divergent design.

Testing Platform.
All measurements are taken on a machine running 64-bit Ubuntu with a four-core Intel Core i7 Q820 CPU @ 1.73GHz and 4GB RAM.
Basic Tuning Problem.
We first consider a basic case of
DDT when α = 0 and τ = +∞, i.e., no failures occur and there is no constraint on load skew. There is a single constraint on the divergent design, which is the per-replica space budget. This setting corresponds essentially to the original problem statement in [5].

We begin with a set of experiments that evaluates the performance of RITA and the competitor advisors on the query-only workload TPCDS-query. In this case indexes can only bring benefit to queries, and hence the only restraint in materializing indexes comes from the space-budget constraint. Figure 4 shows the performance of the divergent designs computed by RITA, DIVGDESIGN, and UNIF, as we vary the space-budget parameter. (All other parameters are set to their default values according to Table 1.) The results show that RITA consistently outperforms the other two competitors for a wide range of space budgets. The improvement is up to 75% over p_UNIF and up to 67% over p_DD, i.e., the performance of p_RITA is 4× better than p_UNIF and 3× better than p_DD. Another way to view these results is that RITA can make much more effective usage of the aggregate disk space for indexes. For instance, p_RITA at b = 0.25× matches the performance of p_DD at b = 1.0×, i.e., with four times as much space for indexes. In all cases, RITA's better performance can be attributed to the fact that it searches a considerably larger space of possible designs, through the reduction to a BIP. As the space budget increases, the performance of p_RITA, p_DD and p_UNIF converges, as all beneficial indexes can be materialized in every design.

[Figure 4: Varying space budget on TPCDS-query, α = 0, τ = +∞.]
[Figure 5: Varying number of replicas on TPCDS-mix, α = 0, τ = +∞.]
[Figure 6: Constraining the update cost on TPCDS-mix, α = 0, τ = +∞.]

We next examine the performance of RITA and the competitor advisors on a workload of queries and updates. Figure 5 reports the performance of p_RITA, p_DD and p_UNIF for the workload
TPCDS - mix ,as we vary the number of replicas in the system. We chose thisparameter as updates have to be routed to all replicas and henceit controls directly the total cost of updates. We observe that theimprovement of RITA over U NIF is in the order of 50% and theimprovement of RITA over D
IVG D ESIGN is 38%. Not surpris-ingly, the improvements increase with the number of replicas. Thereason is that RITA is able to find designs with much fewer in-dexes per replica compared to p UNIF and p DD , which contributes toa lower update cost. For instance for N = b = . × , thenumber of indexes per replica of p RITA is ( , , ) compared to ( , , ) for p UNIF and ( , , ) for p DD . We conducted sim-ilar experiments with different weights for the update statementsand observed similar trends.The next experiment examines how RITA’s advanced function-ality can control even further the cost of updates. Instead of hav-ing RITA minimize the combined cost of queries and updates, weinstruct the advisor to perform the following constrained optimiza-tion: minimize query cost such that update cost is at most x % of theupdate cost of a uniform design. Essentially, the desire is to makeupdates much faster compared to the uniform design, and also tryto get some benefits for query processing. This changed optimiza-tion requires minimal changes to the underlying BIP: the objectivefunction includes only the cost of evaluating queries, and the con-straints include an additional linear constraint on the total updatecost based on the update cost of the uniform design (which can betreated as a constant). The ease by which we can support this ad-vanced functionality reflects the power of expressing DDT as a BIP.Figure 6 depicts the cost of the query workload under p RITA aswe vary the factor that bounds the update cost relative to p UNIF .For comparison we also show the cost of the query workload for p UNIF . The results show clearly that the designs computed by RITAcan improve performance dramatically even in this scenario. As aconcrete data point, when the bounding factor is set to 0 . 
p RITA makes query evaluation more than 2 × cheaper compared to p UNIF and incurs an update cost that is less than half the update cost of p UNIF .Overall, our results demonstrate that RITA clearly outperformsits competitors on the basic definition of the divergent-design tun-ing problem. From this point onward, we will evaluate RITA’seffectiveness with respect to the generalized version of the prob-lem (i.e., including failures and a richer set of constraints). In theinterest of space, we present results with query-only workloads, as the trends were very similar when we experimented with mixedworkloads.
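The constrained variant (minimize query cost subject to a cap on update cost) is easy to prototype by enumeration on a toy instance. The numbers below are hypothetical and the additive model is a deliberately crude stand-in for the BIP's cost expressions; in the BIP itself the cap simply becomes one extra linear constraint.

```python
from itertools import chain, combinations

QBASE = 100.0                                    # query cost with no indexes
QBENEFIT = {"ia": 30.0, "ib": 25.0, "ic": 5.0}   # query-cost reduction per index
UCOST = {"ia": 10.0, "ib": 12.0, "ic": 8.0}      # update cost per index

def powerset(items):
    return chain.from_iterable(combinations(items, k)
                               for k in range(len(items) + 1))

def best_design(x):
    """Minimize query cost s.t. update cost <= x * (uniform update cost)."""
    uniform_update = sum(UCOST.values())         # uniform design keeps every index
    feasible = (I for I in powerset(UCOST)
                if sum(UCOST[a] for a in I) <= x * uniform_update)
    return min(feasible, key=lambda I: QBASE - sum(QBENEFIT[a] for a in I))

print(sorted(best_design(x=0.8)))   # cap update cost at 80% of uniform
```

Tightening x trades query performance for cheaper updates, which is exactly the trade-off Figure 6 explores.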
Factoring Failures.
We first evaluate how well RITA can tailor the divergent design in order to account for possible failures, as captured by the failure probability α.

Figure 7 shows the ExpTotalCost() metric for p_RITA, p_DD and p_UNIF as we vary the probability of failure α. There are two interesting take-away points from the results. The first is that p_RITA has a relatively stable performance as we vary α. Essentially, we can reap the benefits of divergent designs even when there is an increased probability of failure in the system, as long as there is a judicious specialization for each replica and a controlled strategy to redistribute the workload (two things that RITA clearly achieves). The second interesting point is that the gap between p_RITA and p_DD increases with α. Basically, p_DD ignores the possibility of failures (i.e., it always assumes that α = 0) and hence the computed design p_DD cannot handle effectively a redistribution of the workload when a replica becomes unavailable. As a side note, the cost of p_UNIF is unchanged for different values of α, since each query has the same cost under p_UNIF on all replicas, and hence a redistribution of the workload does not change the total cost.
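The failure model evaluated here can be sketched as an expectation over single-replica failure scenarios. The weighting below is one plausible reading of ExpTotalCost() under that model, not a formula quoted from this excerpt; the function name and the cost figures are hypothetical.

```python
def exp_total_cost(normal_cost, failure_costs, alpha):
    """Expected workload cost when, with probability alpha, any one replica j
    is down and the workload runs under the fail-over routing h_j.

    failure_costs[j] stands in for FTotalCost(I, h, j)."""
    n = len(failure_costs)
    assert n * alpha <= 1.0, "single-failure scenario probabilities exceed 1"
    return (1 - n * alpha) * normal_cost + alpha * sum(failure_costs)

# Hypothetical numbers: 3 replicas; losing a replica makes the workload
# costlier because its queries fall back to less specialized replicas.
print(exp_total_cost(100.0, [130.0, 120.0, 150.0], alpha=0.1))
```

Under this reading, a design whose failure costs stay close to its normal cost (as RITA targets) keeps the expected cost flat as α grows, matching the stability observed for p_RITA above.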
Bounding Load Skew.
We next study how RITA handles an (inter-replica) constraint on load skew. Recall that the constraint has the following form: for any two replicas, their load should not differ by a factor of more than 1 + τ, where τ ≥ 0. The ability to satisfy such constraints is part of RITA's novel functionality.

Figure 8 shows the performance of p_RITA, p_DD and p_UNIF as we vary the parameter τ that bounds the load skew, for both α = 0 and α = 0.1. Note that we report the results for the greedy version of RITA, which are identical to the exact solution of the constraint. The chart shows a single point corresponding to p_DD, given that it is not possible to constrain load skew within DIVGDESIGN. As shown, p_DD has a significant load skew of up to a 2× difference between replicas. This magnitude of skew severely limits the ability of the system to maintain a balanced load and to route queries effectively. In contrast, RITA is able to compute designs that maintain a low expected cost (up to 4× better than UNIF) and also satisfy the bound on load skew. These savings are not affected by the value of α: RITA is again able to make a judicious choice for the divergent design in order to satisfy all constraints and handle failures. Note that the uniform design trivially satisfies the load-skew constraint for all values of τ, as every replica has the same design and hence the system can be perfectly balanced.

Figure 7: Varying probability of failure on TPCDS-query (τ = +∞). Figure 8: Varying load skew on TPCDS-query (τ < +∞). Figure 9: Routing queries.

Table 2: The average running time of RITA (in seconds) for combinations of α (= 0 or > 0) and τ (= +∞ or < +∞), for (a) workload TPCDS-query and (b) workload TPCDS-dyn.

Running Time.
Given an instance of the basic DDT problem (α = 0, τ = +∞), RITA spends 180 seconds to initialize INUM, a step that depends solely on the input workload, and then requires only four seconds to formulate and solve the resulting BIP. An important point is that the initialization step can be reused for free if the workload remains unchanged, e.g., if the DBA runs several tuning sessions using the same workload but different constraints each time. Each subsequent tuning session can thus be executed in the order of a few seconds, offering an almost interactive response to the DBA.

Table 2(a) shows the running time of RITA on the TPCDS-query workload as we vary the load-skew factor and the probability of failure, two parameters that correspond to novel features of our generalized tuning problem. Note that the time to initialize INUM remains the same as before and is excluded from all the cells of the table. Clearly, the new features complicate the tuning problem and hence have an impact on running time. Still, even for the most complex combination (τ < +∞, α > 0) RITA has a reasonable running time of at most 84 seconds. Moreover, as noted in Section 5, RITA can always be invoked with a time threshold and return the best design that has been identified within the allotted time.

Table 2(b) shows the same details about the running time of RITA on the TPCDS-dyn workload, consisting of 600 queries. RITA also runs efficiently for this large workload.

Routing.
The next set of experiments examines the effectiveness of the routing scheme we introduced in Section 4.5, which determines how to route unseen queries (i.e., queries not in W for which the routing functions h_j cannot be applied) to "good" specialized replicas.

Our test methodology splits TPCDS-query into two (sub)workloads: (1) a training workload that plays the role of W and consists of 30 randomly chosen queries of TPCDS-query, and (2) a testing workload that plays the role of the unseen queries and consists of the remaining 10 queries. We compute a divergent design p_RITA for the training workload, and route the queries in TPCDS-query (including both seen and unseen queries) assuming p_RITA is deployed. For comparison, we apply the same methodology to the uniform design: we first derive p_UNIF for the training workload and then route the queries in the TPCDS-query workload in round-robin fashion. We repeat this experiment for ten independent runs, where each run involves a different random split of the workload.

Figure 9 shows the expected cost of the workload for p_RITA and p_UNIF for each run. The results show that RITA outperforms UNIF consistently, even though replica specialization does not take into account the unseen queries. The improvements vary across runs depending on the choice of the workload split, but overall we can reap the benefits of divergent designs even with incomplete knowledge of the workload.
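The details of the routing scheme of Section 4.5 are not reproduced in this section, but its effect can be approximated with a simple what-if rule: estimate the cost of the unseen query on each replica's current design and send it to the cheapest one, with round-robin as the UNIF baseline. The sketch below is a hypothetical stand-in; in particular, `est_cost` plays the role that the fast what-if optimizer plays in RITA, and the numeric estimates are made up.

```python
import itertools

def route_cheapest(est_cost, replicas):
    """Route an unseen query to the replica with the lowest estimated cost.
    est_cost: callable mapping a replica id to an estimated evaluation cost."""
    return min(replicas, key=est_cost)

def route_round_robin(counter, n_replicas):
    """UNIF baseline: ignore costs and cycle through replicas."""
    return next(counter) % n_replicas

# Hypothetical estimates for one unseen query on a 3-replica divergent design:
# replica 1 happens to carry indexes that suit this query.
estimates = {0: 7.5, 1: 2.0, 2: 4.0}
rr = itertools.count()

chosen = route_cheapest(estimates.get, [0, 1, 2])
first_rr = route_round_robin(rr, 3)
```

Under a divergent design the cost-aware rule exploits specialization; under a uniform design both rules are equivalent, since every replica prices the query identically.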
Online Monitoring.
The aforementioned routing scheme can help the system cope with unseen queries, but at some point it may become necessary to retune the divergent design if the actual workload differs substantially from the training workload. The next experiment evaluates the online-monitoring module inside RITA, which is designed for the task of detecting workload changes.

We assume that the system receives the dynamic workload TPCDS-dyn, which shifts to a different query distribution after query 200 and then shifts back to the original distribution at query 400. Initially, the system is equipped with a divergent design p_currRITA that is tuned with a training workload from the first query distribution. The monitoring module continuously computes a divergent design p_slideRITA based on a sliding window of the last 60 queries in the workload, and outputs the improvement on ExpTotalCost() if p_slideRITA were used instead of p_currRITA.

Figure 10 shows the monitoring statistics produced by the online-monitoring module of RITA for the TPCDS-dyn workload. Matching our intuition, the output shows that p_slideRITA has small improvements for the first 200 queries (around 30%), since the current design p_currRITA is already tuned for the particular phase of the workload. However, as soon as the workload shifts to a different distribution, the output shows a considerable improvement of more than 60%. This can be viewed as a strong indication that retuning the system can yield significant performance improvements. The spike tapers off close to query 450, since in this experiment the workload shifts back to its previous distribution and hence there is no benefit to changing the current design. RITA requires 1.… seconds for this computation.

Elastic Retuning.
After observing the monitoring output, the DBA can invoke the recommender module to examine different recommendations for retuning the system in an elastic fashion. The next set of experiments evaluates how fast RITA can generate these recommendations and also their quality.

Figure 10: Online monitoring. Figure 11: Elasticity retuning, varying number of replicas and materialization costs. Figure 12: Elasticity retuning, varying routing multiplicity factor and materialization costs.

We employ a scenario that builds on the previous experiment on online monitoring. Specifically, we assume that the DBA invokes the recommender using the sliding window of 60 queries that corresponds to the spike in Figure 10. Moreover, the DBA specifies two dimensions of interest with respect to a new divergent design: the workload-evaluation cost and the cost of materializing the design. Also, the DBA wants to study the effect of shrinking and expanding the number of replicas. We assume that the DBA sets the probability of failure (α) to 0 in order to allow RITA to execute fast and generate the output in a timely fashion. After inspecting the output, the DBA may invoke another (more expensive) tuning session for a specific choice of replicas (or routing multiplicity factor) and reconfiguration cost, and a non-zero α.

Our results are shown in Figure 11. Each point (x, y) on the chart corresponds to a divergent design that requires x cost units to materialize and whose ExpTotalCost() is equal to y. The three curves labeled N = z, z ∈ {2, 3, 4}, represent divergent designs that employ z replicas; N = 2 (resp. N = 4) represents dropping (resp. adding) a replica. The chart also shows the ExpTotalCost() metric of the current design, for comparison. As shown, there are several options to significantly improve (by up to 7×) the performance of the current design. Moreover, the DBA obtains the following valuable information: there is a minimum materialization cost required in order to get some improvement; designs that require more than 160 units of materialization cost offer diminishing returns; and there is not much benefit to increasing the number of replicas.

Figure 12 shows similar results as we vary the routing multiplicity factor. The three curves labeled m = z, z ∈ {1, 2, 3}, represent divergent designs that have routing multiplicity factor z. This result indicates that we can obtain designs with some flexibility in routing queries (i.e., m = 2) without sacrificing much in terms of performance compared to designs that have the most specialization (i.e., m = 1).
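The recommender's output in Figures 11 and 12 is essentially a cost/benefit frontier: a design is only worth presenting to the DBA if no other candidate is both cheaper to materialize and better on ExpTotalCost(). A minimal skyline filter over candidate (materialization cost, expected cost) pairs might look as follows; the function name and the candidate list are made up for illustration and are not RITA's recommender.

```python
def pareto_frontier(candidates):
    """Keep candidates (mat_cost, exp_cost) not dominated by any other:
    c is dominated if some other candidate is <= on both dimensions and
    strictly < on at least one."""
    frontier = []
    for c in candidates:
        dominated = any(
            o != c and o[0] <= c[0] and o[1] <= c[1] and (o[0] < c[0] or o[1] < c[1])
            for o in candidates
        )
        if not dominated:
            frontier.append(c)
    return sorted(frontier)

# Hypothetical recommender output: (materialization cost, ExpTotalCost).
designs = [(0, 40.0), (80, 22.0), (120, 15.0), (160, 9.0), (200, 8.8), (120, 30.0)]
frontier = pareto_frontier(designs)
```

A flat tail on such a frontier is exactly the "diminishing returns" pattern the DBA reads off Figure 11 past 160 units of materialization cost.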
7. CONCLUSION
In this paper, we introduced RITA, a novel index-tuning advisor for replicated databases that provides DBAs with a powerful tool for divergent index tuning. The key technical contribution of RITA is a reduction of the problem to a compact binary integer program, which enables the efficient computation of a (near-)optimal divergent design using mature, off-the-shelf software for linear optimization. Our experimental studies demonstrate that, compared to state-of-the-art solutions, RITA offers richer tuning functionality and computes divergent designs that result in significantly better performance.
8. REFERENCES
[1] Amazon Relational Database Service (Amazon RDS), http://aws.amazon.com/rds.
[2] P. Bernstein, I. Cseri, N. Dani, N. Ellis, A. Kalhan, G. Kakivaya, D. Lomet, R. Manne, L. Novik, and T. Talius. Adapting Microsoft SQL Server for cloud computing. In ICDE, pages 1255–1263, 2011.
[3] N. Bruno and R. V. Nehme. Configuration-parametric query optimization for physical design tuning. In SIGMOD, pages 941–952, 2008.
[4] S. Chaudhuri and V. R. Narasayya. AutoAdmin 'What-if' Index Analysis Utility. In SIGMOD, pages 367–378, 1998.
[5] M. P. Consens, K. Ioannidou, J. LeFevre, and N. Polyzotis. Divergent physical design tuning for replicated databases. In SIGMOD, 2012.
[6] D. Dash, N. Polyzotis, and A. Ailamaki. CoPhy: A scalable, portable, and interactive index advisor for large workloads. PVLDB, 4(6):362–372, 2011.
[7] C. Daskalakis, I. Diakonikolas, and M. Yannakakis. How good is the chord algorithm? In SODA, 2010.
[8] J. Dittrich, J.-A. Quiané-Ruiz, S. Richter, S. Schuh, A. Jindal, and J. Schad. Only aggressive elephants are fast elephants. PVLDB, 5(11):1591–1602, 2012.
[9] A. Jindal, J.-A. Quiané-Ruiz, and J. Dittrich. Trojan data layouts: right shoes for a running elephant. In SOCC, pages 1–14, 2011.
[10] H. Kimura, G. Huo, A. Rasin, S. Madden, and S. B. Zdonik. CORADD: Correlation aware database designer for materialized views and indexes. PVLDB, 3(1):1103–1113, 2010.
[11] S. Papadomanolakis, D. Dash, and A. Ailamaki. Efficient use of the query optimizer for automated database design. In VLDB, pages 1093–1104, 2007.
[12] R. Ramamurthy, D. J. DeWitt, and Q. Su. A case for fractured mirrors. The VLDB Journal, 12(2):89–101, 2003.
[13] K. Schnaitter, N. Polyzotis, and L. Getoor. Index interactions in physical design tuning: Modeling, analysis, and applications. PVLDB, 2(1):1234–1245, 2009.
[14] J. A. Solworth and C. U. Orji. Distorted mirrors. In IEEE PDIS, pages 10–17, 1991.
[15] Transaction Performance Council. TPC-DS Benchmark.
[16] R. Wang, Q. T. Tran, I. Jimenez, and N. Polyzotis. INUM+: A leaner, more accurate and more efficient fast what-if optimizer. In SMDB, 2013.
[17] E. Wu and S. Madden. Partitioning techniques for fine-grained indexing. In ICDE, pages 1127–1138, 2011.
[18] D. C. Zilio, J. Rao, S. Lightstone, G. Lohman, A. Storm, C. Garcia-Arellano, and S. Fadden. DB2 Design Advisor: Integrated Automatic Physical Database Design. In VLDB, pages 1087–1097, 2004.
APPENDIX
A. PROVING THEOREM 1
We reduce the original problem studied in [5] to DDT by proving their equivalence when α = 0 and C contains solely a space-budget constraint per replica. Since the original problem is NP-Hard, the same follows for DDT. The result in Lemma 1 (see below) is the key to proving their equivalence. It is important to note from Section 3.3 that in the general setting of DDT, h(q) might not correspond to the m replicas with the least evaluation cost for q.

LEMMA 1. In the problem setting of DDT, when α = 0 and C contains solely a space-budget constraint per replica, h(q) corresponds to the m replicas with the least evaluation cost for q.

We prove Lemma 1 by contradiction. Assume that for some query q, there exist two replicas r1 and r2 such that r1 ∈ h(q), r2 ∉ h(q) and cost(q, I_r1) > cost(q, I_r2). We then derive another routing function h′ that is identical to h except that h′(q) = h(q) ∪ {r2} − {r1}. Clearly, TotalCost(I, h) > TotalCost(I, h′). This contradicts the requirement to minimize TotalCost(I, h) in the problem setting of DDT.

B. PROVING THEOREM 2
We prove the theorem in two steps. First, we show that every divergent design (I, h) corresponds to a value assignment v for the variables in the BIP such that v satisfies the constraints (Lemma 2). This property guarantees that the solution space of the BIP contains all possible solutions for the divergent design tuning problem. Subsequently, we prove that the optimal assignment v∗ corresponds to a divergent design (Lemma 3). Combining these two results, we conclude the correctness of the theorem.

To simplify the presentation and without loss of generality, we prove the theorem for the basic DDT, when α = 0, C = ∅, and the workload comprises solely queries, i.e., W = Q. Given a valid assignment v, we use BIPcost(v) to denote the value of the objective function of the BIP under the assignment v.

LEMMA 2. For any divergent physical design (I, h), there is an assignment v s.t. TotalCost(I, h) = BIPcost(v).

LEMMA 3. Let v∗ denote the solution to the BIP problem. Then, TotalCost(I, h) = BIPcost(v∗), where (I, h) is the divergent design derived from v∗.

B.1 Proof of Lemma 2
Given a divergent design (I, h) and for every query q ∈ Q, using the linear decomposability property, we can express the cost of q at replica r ∈ h(q) as:

cost(q, I_r) = β_p + Σ_{i ∈ [1,n], a = Y[i]} γ_pa

for some choice of p = p_r ∈ TPlans(q) and Y = Y_{p,r} ∈ Atom(I_r). We assign values to the variables as follows:
• v(t_q^r) = 1 iff r ∈ h(q),
• v(y_p^r) = 1 iff p = p_r and r ∈ h(q),
• v(x_pa^r) = 1 iff p = p_r, r ∈ h(q) and a = Y_{p,r}[i] for some i ∈ [1,n],
• v(s_a^r) = 1 iff a ∈ I_r, r ∈ [1,N], and
• all other variables are assigned value 0.

We observe that under this assignment, all constraints in the BIP are satisfied. For instance, since v(t_q^r) = 1 iff r ∈ h(q), and h(q) contains m values, it can be immediately derived that Σ_{r ∈ [1,N]} t_q^r = m, i.e., constraint (2) is satisfied. By eliminating terms with value 0, we obtain the following:

BIPcost(v) = Σ_{q ∈ Q} Σ_{r ∈ h(q)} (f(q)/m) · ĉost(q, r), where
ĉost(q, r) = β_p + Σ_{i ∈ [1,n], a = Y[i]} γ_pa, for r ∈ h(q), p = p_r, Y = Y_{p,r} ∈ Atom(I_r).

Thus, BIPcost(v) = TotalCost(I, h).

B.2 Proof of Lemma 3
The following arguments are derived based on the assumption that v∗ satisfies the BIP formulation. First, based on (2), we derive that for every query q, there exists a set S_q ⊆ [1, N] with |S_q| = m such that v∗(t_q^r) = 1 iff r ∈ S_q. Second, based on (4), we derive that for every query q and every r ∈ S_q, there exists exactly one plan p = p_r ∈ TPlans(q) such that v∗(y_p^r) = 1, together with an atomic configuration Y_{p,r} (r ∈ S_q, p = p_r) that corresponds to the assignments for v(x_pa^r). Finally, we prove by contradiction that p_r and Y_{p,r}, r ∈ S_q, correspond to the choice of plan p and atomic configuration Y that yields the minimum value of cost(q, I_r). Combining these results, we conclude that BIPcost(v∗) = TotalCost(I, h).

Suppose that there exists a different choice p_c ∈ TPlans(q) and Y_c ∈ Atom(I_r), r ∈ S_q, such that cost(q, p_c, Y_c) < cost(q, p_r, Y_{p,r}). Here, we use cost(q, p, Y) to denote the cost of q using the template plan p and the atomic configuration Y. We can now derive an alternative assignment v_c that is identical to v∗ except for the following:
• variables corresponding to p_r and Y_{p,r} are assigned value 0,
• v(y_p^r) = 1 if p = p_c and r ∈ S_q, and
• v(x_pa^r) = 1 if p = p_c, r ∈ S_q and a = Y_c[i], i ∈ [1, n].

We observe that v_c is a valid assignment for the formulated BIP. However, BIPcost(v_c) < BIPcost(v∗), which contradicts our assumption about the optimality of v∗.

C. FACTORING FAILURES
In this section, we present the full details of how RITA integrates failures into the BIP. Under our assumption of using fast what-if optimization, the cost of a query q on some replica r can be expressed as cost(q, I_r) = cost(p′, A′) for some choice of p′ ∈ TPlans(q) and an atomic configuration A′ ∈ Atom(I_r). We introduce the following additional variables:
• t_q^{r,j} = 1 iff q is routed to replica r when replica j fails, i.e., h_j(q) = {r | t_q^{r,j} = 1},
• y_p^{r,j} = 1 iff q is routed to replica r when j fails and p = p′, and
• x_pa^{r,j} = 1 iff q is routed to replica r when j fails, p = p′ and a ∈ A′.

We also need to add a new set of constraints, given in Figure 13. These constraints are very similar to their counterparts in Figure 2. The correctness of the BIP is proven in the same way as presented in Appendix B.

D. BOUNDING LOAD-SKEW
FTotalCost(I, h, j) = Σ_{q ∈ Q} Σ_{r ∈ [1,N] ∧ r ≠ j} (f(q) / max{m, N−1}) · ĉost(q, r, j)

ĉost(q, r, j) = Σ_{p ∈ TPlans(q)} β_p y_p^{r,j} + Σ_{p ∈ TPlans(q), a ∈ S ∪ {SCAN_1} ∪ ··· ∪ {SCAN_n}} γ_pa x_pa^{r,j}, ∀r ∈ [1,N], ∀q ∈ Q ∪ Q_upd (10)

such that:

Σ_{r ∈ [1,N]} t_q^{r,j} = max{N−1, m}, ∀q ∈ Q (11)

Σ_{p ∈ TPlans(q)} y_p^{r,j} = t_q^{r,j}, ∀q ∈ Q ∪ Q_upd (12)

s_a^r ≥ x_pa^{r,j}, ∀q ∈ Q ∪ Q_upd, p ∈ TPlans(q), a ∈ S (13)

Σ_{a ∈ S_i ∪ {SCAN_i}} x_pa^{r,j} = y_p^{r,j}, ∀q ∈ Q ∪ Q_upd, p ∈ TPlans(q), i ∈ [1,n], T_i is referenced in q (14)

Figure 13: Augmented BIP to handle failures.

cost_opt(q, r) = Σ_{p ∈ TPlans(q)} β_p yo_p^r + Σ_{p ∈ TPlans(q), a ∈ S ∪ {SCAN_1} ∪ ··· ∪ {SCAN_n}} γ_pa xo_pa^r, ∀q ∈ Q ∪ Q_upd, ∀r ∈ [1,N] (15)

Σ_{p ∈ TPlans(q)} yo_p^r = 1 (16a)

Σ_{a ∈ S_i ∪ {SCAN_i}} xo_pa^r = yo_p^r, ∀p ∈ TPlans(q), ∀i ∈ [1,n] ∧ T_i is referenced in q (16b)

cost_opt(q, r) ≤ β_p + Σ_{i ∈ [1,n], a ∈ S ∪ {SCAN_1} ∪ ··· ∪ {SCAN_n}} γ_pa u_pa^r, ∀p ∈ TPlans(q) (17)

Σ_{a ∈ S_i ∪ I_∅} u_pa^r = 1, ∀p ∈ TPlans(q), ∀i ∈ [1,n] ∧ T_i is referenced in q (18a)

u_pa^r ≤ s_a^r, ∀p ∈ TPlans(q) ∧ a ∈ S (18b)

Σ_{b ∈ S_i ∪ I_∅ ∧ γ_pa ≥ γ_pb} u_pb^r ≥ s_a^r, ∀p ∈ TPlans(q), i ∈ [1,n], a ∈ S_i (19)

Figure 14: Query-Optimal Constraints.
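The structure of the routing constraints in Figure 13 can be sanity-checked outside a solver: for a fixed failed replica j, a 0/1 routing matrix must avoid j and route each query to the required number of replicas. The checker below is a hypothetical illustration (not part of RITA) that validates only constraint (11) on made-up dimensions; it uses max{N−1, m} exactly as the figure states it.

```python
def valid_failure_routing(t, j, m, n_replicas):
    """t[q][r] in {0,1}: query q is routed to replica r when replica j fails.
    Checks constraint (11): the failed replica receives no queries, and each
    query is routed to max(n_replicas - 1, m) replicas."""
    required = max(n_replicas - 1, m)
    for row in t:
        if row[j] != 0:           # the failed replica must receive nothing
            return False
        if sum(row) != required:  # constraint (11)
            return False
    return True

# Two queries, three replicas, replica 1 failed, multiplicity m = 1:
ok = [[1, 0, 1], [1, 0, 1]]   # each query on both survivors (max(2, 1) = 2)
bad = [[1, 1, 0], [1, 0, 1]]  # first query uses the failed replica
```

In the full BIP, constraints (12)–(14) additionally tie each routed query to exactly one template plan and one atomic configuration per replica, mirroring the failure-free formulation.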
D.1 Additional Constraints for Exact Solution
This section presents the set of constraints that RITA formulates in order to ensure the optimality of ĉost(q, r) in the presence of load-skew constraints.

RITA introduces a new cost formula cost_opt(q, r) = cost(q, I_r) for r ∈ [1, N]. The formula of cost_opt(q, r) is very similar to ĉost(q, r); the variables yo_p^r (resp. xo_pa^r) have the same meaning as y_p^r (resp. x_pa^r). The main difference is that for r ∉ h(q), we have ĉost(q, r) = 0, whereas cost_opt(q, r) = cost(q, I_r) > 0. The atomic constraints in (16) are similar to the atomic constraints on ĉost(q, r). Note that the constraint in (16a) requires exactly one template plan to be chosen to compute cost_opt(q, r), in order for this value to correspond to the query execution cost of q on replica r.

To establish the optimal-cost constraints, we use the following alternative way to compute cost(q, X). For each internal plan cost β_p, p ∈ TPlans(q), we first derive a "local" optimal cost, referred to as C_local_p, which is the smallest cost that can be obtained by "plugging" all possible atomic configurations A ∈ Atom(X) into the slots of the template plan of β_p. Essentially, C_local_p = β_p + I_local_p, where I_local_p is the smallest value of the total access cost over atomic configurations A ∈ Atom(X). To obtain I_local_p, for each slot in the internal plan of β_p, we enumerate all indexes in X that can be plugged into that slot, and pick the one that yields the smallest access cost to sum into I_local_p. Finally, cost(q, X) is obtained as the smallest value among the derived C_local_p with p ∈ TPlans(q).

The right-hand side of (17) is the formula of C_local_p. Here, we introduce variables u_pa^r, where u_pa^r = 1 iff index a is used at some slot in the template plan of β_p to compute C_local_p. For C_local_p to correspond to some atomic configuration, we impose the constraint in (18a). Furthermore, an index a can be used in C_local_p only if a is recommended at replica r (constraint (18b)). Constraint (19) ensures that the candidate index with the smallest access cost is selected to plug into each slot of β_p when computing I_local_p.

D.2 Greedy Approach
This section presents a greedy scheme that trades off the quality of the design for efficiency.

First, we derive an optimal design (I_opt, h_opt) assuming that there is no load-imbalance constraint and the probability of failure is 0. We then compute an approximation factor β = τ / (N + (N−1)τ) and add the following constraint to the BIP:

load(I, h, r) ≤ (1 + β) · TotalCost(I_opt, h_opt) / N, ∀r ∈ [1, N] (20)

This is an easy constraint, as its right-hand side is a constant. If the BIP solver can find a solution for the modified BIP, the returned solution is valid and has bounded TotalCost(I, h), as the following theorem shows.

THEOREM. The divergent design returned by the greedy solution satisfies all constraints in the DDT problem and has TotalCost(I, h) ≤ (1 + β) · TotalCost(I_opt, h_opt). ✷

PROOF. We overload I_opt (resp. I) to refer to the total cost of the design I_opt (resp. I) as well. The maximum load of a replica in I is (1+β) I_opt / N (due to constraint (20)). By summing up the load of all replicas in I, we obtain I ≤ (1+β) I_opt. Therefore, I differs from I_opt by an approximation ratio of (1+β). The remaining issue is to prove that I satisfies the load-imbalance constraint. Without loss of generality, assume that load(1, I) ≤ load(j, I), ∀j ∈ [1, N]. We can derive the following:

I = Σ_{j ∈ [2,N]} load(j, I) + load(1, I) (21a)
(1+β) I_opt / N ≥ load(j, I) (21b)
((N−1)/N)(1+β) I_opt + load(1, I) ≥ I (21c)
I ≥ I_opt (21d)
load(1, I) ≥ (1 − ((N−1)/N)(1+β)) I_opt (21e)

The maximum load in I is ((1+β)/N) I_opt and the minimum load is (1 − ((N−1)/N)(1+β)) I_opt. Therefore, the load-imbalance factor of I is (1+β)/(1 − (N−1)β). By substituting the value of β, we obtain the load-imbalance factor 1 + τ.

Note that this greedy scheme does not encounter the aforementioned problem of ĉost(q, r) not being equal to cost(q, I_r). Informally, the reason is that the right-hand side of the inequality constraint in (20) is a constant.

Figure 15: Varying the routing multiplicity factor on the TPCDS-query workload.

D.3 Additional Experimental Results
This section presents a comparison between RITA, DIVGDESIGN and UNIF as we vary the routing multiplicity factor. Figure 15 presents one representative result on the TPCDS-query workload with b = … and N = …. As m increases, the total cost of p_RITA and p_DD increases, since queries need to be sent to more places. Note that the cost of p_UNIF remains the same, as all replicas have the same index configuration under the UNIF design. Also, when m = N, the total costs of p_DD and p_UNIF are the same, since DIVGDESIGN needs to send every query to every replica, and it uses the same black-box design advisor as UNIF to compute the recommended index set at each replica. We observe that in all cases, RITA significantly outperforms DIVGDESIGN and UNIF