Updatable Materialization of Approximate Constraints
Steffen Kläbe [email protected] TU Ilmenau, Germany
Kai-Uwe Sattler [email protected] TU Ilmenau, Germany
Stephan Baumann [email protected] Actian Germany GmbH
ABSTRACT
Modern big data applications integrate data from various sources. As a result, these datasets may not satisfy perfect constraints, leading to sparse schema information and non-optimal query performance. The existing approach of PatchIndexes enables the definition of approximate constraints and improves query performance by exploiting the materialized constraint information. As real world data warehouse workloads are often not limited to read-only queries, we enhance the PatchIndex structure towards an update-conscious design in this paper. Therefore, we present a sharded bitmap as the underlying data structure, which offers efficient update operations, and describe approaches to maintain approximate constraints under updates, avoiding index recomputations and full table scans. In our evaluation, we prove that PatchIndexes significantly improve query performance while achieving lightweight update support.
1 INTRODUCTION

In classical database theory, database schemas are well-defined and contain a set of database constraints, like primary and foreign keys, uniqueness definitions or sort keys, which offer two major advantages. First, defining constraints ensures data integrity, which is handled automatically by the database management system (DBMS) whenever updates on the data occur. Second, constraint definitions are used in query optimization to accelerate queries containing joins, aggregations or sort operations. Here, query optimizers can benefit from the constraint information by choosing optimal physical operators or optimal query plans.

Real world cloud applications often do not follow this classical theory of a well-defined database schema and contain sparse schema information. This is reinforced by the lack of a database administrator who tunes the database continuously. In general, big data applications often contain unclean data for different reasons, like integrating different sources with heterogeneous schemas or real world anomalies like duplicate names, shared addresses or telephone numbers. Thus, perfect constraints may not exist in these datasets. Nevertheless, they may contain approximate constraints, which are constraints that hold for all tuples except a small set of exceptions (captured as the "exception rate"). These exceptions prohibit the definition of a constraint, resulting in the loss of valuable information. Examples for approximate constraints are "nearly unique columns" (NUC) and "nearly sorted columns" (NSC), whose definition we introduced in [18].
Figure 1: Histogram over approximate constraint columns in PublicBI datasets (x-axis: amount of tuples that match the constraint [%]; y-axis: number of columns; datasets: USCensus_1 (NSC), IGlocations2_1 (NUC), IUBlibrary_1 (NUC))
Investigations on real world datasets prove the existence of approximate constraints and encourage the need to handle them by the DBMS. The PublicBI benchmark [28] is a collection of Tableau workbooks and allows experimental evaluation against real user datasets with their common properties, e.g. many string columns, many NULL values or the absence of constraint definitions. Some of these datasets showed a large number of approximate constraints, like the USCensus_1, the IGlocations2_1 and the IUBlibrary_1 workbooks. The histogram shown in Figure 1 represents the distribution of approximate constraint columns for these datasets. The USCensus_1 workbook contains over 500 columns, of which 15 columns match an approximate sorting constraint. Nine columns match the sorting constraint with over 60% of their tuples. The IGlocations2_1 and the IUBlibrary_1 workbooks contain a small number of columns, of which a relatively large amount follow an approximate uniqueness constraint. Many of these columns are nearly perfectly unique.

In [18] we proposed the concept of PatchIndexes, which allows the definition of approximate constraints and enables database systems to benefit from the constraint information in query execution by materializing exceptions and handling them separately. We provided constraint discovery mechanisms for NUC and NSC and compared different design approaches. The evaluation showed the positive impact of the concept on query performance even for high exception rates. Real world OLAP applications often differ from the theoretical assumption of a read-only data warehouse with periodical bulk updates, as data freshness becomes more important in decision-making processes and HTAP workloads. Consequently, update support is of significant importance in these applications. As the main issue with the materialization of information is the update support, we further enhance the PatchIndex feature in this paper in the direction of a generic and updatable materialization approach for approximate constraints. We explicitly exclude the discovery of these approximate constraints, as already discussed in [18], and focus on an update-conscious index design and query support. The main contributions of this paper are:

• We design a sharded bitmap data structure as the underlying concept of the PatchIndex, offering efficient insert, modify and delete operations.
• As delete operations are the major challenge for the underlying sharded bitmap data structure, we present a parallel and vectorized approach for delete support.
• We provide lightweight mechanisms to discover deltas of the materialized PatchIndex information in case of update queries. Consequently, constraints are allowed to become approximate over time, even if they were perfect at the time of definition. This reduces the abort frequency of update operations compared to perfect constraints.

The remainder of the paper is organized as follows: We discuss related work in Section 2 and recap the background of PatchIndexes in Section 3. In Section 4, we present the sharded bitmap structure as an update-conscious index design, before describing update handling mechanisms in Section 5. We evaluate our approach in Section 6 before concluding in Section 7.
2 RELATED WORK

The materialization of information is a widely investigated topic in the database research field, particularly in the context of materialized views. While significantly impacting query performance [7], the major drawback of materialized views is their update support. For the subset of select-project-join (SPJ) views there are efficient ways to support table updates and keep views consistent [5, 6, 14, 26]. They typically try to avoid base table access by determining update sets and performing differential update operations. Besides that, materialized information should be chosen based on the expected benefit, so intermediate results of frequent queries are a good candidate for materialization. A benefit-based approach for choosing intermediate results is given in [23], while [15] also includes the probability of a materialized result being usable in future queries in the cost model. Nevertheless, these approaches do not take updatability into account.

Besides materialized views, there are several specialized materialization approaches like SortKeys, materializing a certain tuple order, or JoinIndexes [27] and Bitwise Dimensional Co-Clustering (BDCC) [4] for join materialization. While JoinIndexes materialize foreign key joins by maintaining an index to the join partner as an additional table column, BDCC physically co-locates join partners of different tables in distributed databases. Our approach materializes aggregate, join and sort results directly at the table level, which makes it usable for a wide range of queries while still providing efficient update support. Particularly updating uniqueness constraints, which are the results of aggregations, differs from the general approaches for SPJ views.

Research on approximate constraints evolved from the field of constraint discovery. For the uniqueness constraint, approaches for unique column combinations (UCCs) [2] and respective discovery algorithms [16, 24] were introduced to handle perfect constraints. In order to also handle approximate uniqueness constraints, the concepts of "possible" and "certain" keys are presented in [19] to enforce constraints by replacing violating tuples. Besides that, embedded uniqueness constraints (eUC) [30] separate uniqueness from completeness to enforce the uniqueness constraint only on a subset of tuples. Recent publications [21, 25] also cover discovery approaches for approximate denial constraints, a more general class of data-specific constraints. While these approaches mainly focus on the discovery of approximate constraints, in this paper we integrate approximate constraints into query execution in an updatable way. We thereby combine the concept of approximate constraints with the concept of materialization.

We base our approach on the concept of patch processing, which is commonly used in compression. The PFOR, PFOR-DELTA and PDICT compression schemes [31] show robustness against outliers by handling exceptions to certain distributions separately. Furthermore, white-box compression [13] learns distributions and properties of data in order to choose appropriate compression schemes. Here, tuples that do not follow a certain behaviour can be compressed using another compression scheme than the remaining tuples, significantly improving compression rates as a result. These approaches modify the way data is physically stored to handle exceptions, which significantly differs from our approach.
By not changing the physical order of data, our concept of PatchIndexes allows multiple approximate constraint definitions per table, for example the definition of multiple approximate SortKeys.
3 PATCHINDEXES

In [18] we introduced the concept of PatchIndexes, enabling database systems to define and materialize approximate constraints. In this section, we briefly recap the basic problem definition, the approaches for the PatchIndex design and its integration into query optimization.
An approximate constraint is a constraint that is satisfied by all tuples except a set of exceptions. We identify these exceptions using their rowIDs and refer to a tuple violating a certain constraint as a patch. Hence, the set of patches P_c is the set of all rowIDs violating a certain constraint in a column c. The ratio between the number of exceptions and the total number of tuples is defined as the exception rate e. As examples, we introduced "nearly unique columns" (NUC) and "nearly sorted columns" (NSC) and provided discovery mechanisms determining a minimal set of patches for the respective constraints.

The main task of the PatchIndex structure is the efficient representation of the set of patches for a given constraint. We investigated two basic approaches for the design. While the identifier-based approach maintains a list of the 64 bit tuple identifiers of the patches and is therefore the sparse way of storing, the bitmap-based approach is the dense way of storing and contains a single bit for each tuple of the indexed column, indicating whether it is an exception or not. The choice between both approaches is a trade-off in terms of memory consumption, with the bitmap-based approach consuming less memory for cases with e ≥ 1/64 ≈ 1.56%. Data partitioning is transparent for PatchIndexes, as a separate index is created for each partition. Constraint discovery, index creation and query processing are performed partition-locally and in parallel.
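This break-even point follows directly from the two representations (a short derivation under the stated sizes, with t denoting the total number of tuples): the identifier-based approach stores one 64 bit rowID per patch, i.e. 64 · e · t bits in total, while the bitmap-based approach always stores t bits plus small sharding metadata (Section 4). The bitmap is therefore the smaller representation whenever

    64 · e · t ≥ t  ⟺  e ≥ 1/64 ≈ 1.56%.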
The main idea to integrate PatchIndexes into query execution is the PatchIndex scan, which merges the PatchIndex information on-the-fly with the dataflow, splits it into a flow of tuples satisfying the constraint and a flow of exceptions, and works on both dataflows separately. The PatchIndex scan is realized using an additional selection operator on top of a scan operator, merging the PatchIndex information on-the-fly into the dataflow according to the selection modes exclude_patches and use_patches. The key to query optimization using PatchIndexes is cloning query subtrees for data and patches and optimizing both subtrees separately. Typically, we can exploit the PatchIndex information in the dataflow that excluded the patches and achieve a speedup in this subtree, as a constraint is fulfilled here. Finally, both subtrees are combined again to ensure transparency for the remaining query execution tree.

PatchIndexes can be integrated into the optimization of distinct, join and sort queries. For distinct queries, we can exploit the information that tuples are unique in a NUC after excluding the patches, so the most expensive aggregation operator to compute the distinct values can be dropped from this subtree, as shown on the left side of Figure 2. Here "X" is an arbitrary subtree that does not contain any joins or aggregations. In case of a query that contains a grouping, this approach can also be applied by decoupling the aggregation for grouping from the distinct aggregation.

Figure 2: Query plans for distinct (left) and join (right) queries before (black operators) and after PatchIndex optimization, with the newly inserted operators highlighted in blue.
Knowing about the sorting of a NSC in join queries, we can replace the generic HashJoin operator with the faster MergeJoin operator in the subtree that excluded the patches, as shown in the right part of Figure 2. This optimization requires the subtree "X" to be sorted on the join key, which is a frequent case in joins between fact tables and dimension tables in data warehouse applications. While subtree "Y" must preserve the tuple order and is therefore not allowed to contain aggregations, both "X" and "Y" may contain join operators that are order preserving, e.g. being the probe side of a HashJoin. Under these requirements the PatchIndex information can be propagated through a query tree. In order to further optimize this approach, the result of subtree "X" is buffered instead of being computed twice. Additionally, join sides can be swapped to further accelerate the HashJoin. As the number of patches is known at optimization time, we can choose the build side of the HashJoin as the one with the lowest cardinality to improve the time and space for hash table building. Building the hash table on the patches is often the best decision as the number of patches is typically small. The join sides of the MergeJoin are chosen similarly to ensure the combinability of both subtrees. As the MergeJoin operator is not sensitive to join sides, this has no impact on performance.

Besides the join operator, the information about a NSC can also be used to accelerate sort queries, as we already know that a major part of the data is already sorted. Here we follow an approach similar to distinct queries, so the query plan is equal to the left plan of Figure 2, exchanging the aggregation with the sort operator. If there is a PatchIndex defined on the sort column using the same ordering as the sort operator, the sort operator becomes obsolete in the subtree that excludes patches, as these tuples are already known to be sorted. The sort operation only needs to be performed on the patches, which is intended to be the minor part of the data. In order to preserve the sort order, the results of both subtrees are combined using a Merge operator instead of a Union operator. Our evaluation in [18] showed that PatchIndexes significantly increase query performance for different queries and even high exception rates, resulting in the choice of the bitmap-based design approach due to its constant memory consumption.
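To illustrate the design choice of the combine step (a standalone sketch, not the engine's actual Merge operator): because the subtree excluding the patches is already sorted, combining it with the separately sorted patches needs only one linear merge pass, whereas a Union of both flows would have to be re-sorted.

#include <algorithm>
#include <iterator>
#include <vector>

int main() {
    // Non-patch tuples: known to be sorted via the NSC, so the sort
    // operator was dropped in this subtree.
    std::vector<int> non_patches = {1, 3, 7, 9, 12};
    // Patches: the small exception set, sorted explicitly at query time.
    std::vector<int> patches = {2, 8, 11};
    std::sort(patches.begin(), patches.end());

    // Merge operator semantics: one linear pass preserves the global order.
    std::vector<int> result;
    std::merge(non_patches.begin(), non_patches.end(),
               patches.begin(), patches.end(),
               std::back_inserter(result));
    // result = {1, 2, 3, 7, 8, 9, 11, 12}
}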
PatchIndexes are currently designed as main memory data structures. In order to keep the database log as slim as possible, we avoid logging the actual PatchIndex information, so PatchIndexes are recreated after a system shutdown or failure. Alternatively, the PatchIndex information can be persisted to disk as a checkpoint, in combination with the logging of succeeding update operations after persisting the index state.
The use cases for query optimization presented in the preceding subsections can be easily integrated into the cost models of arbitrary query optimizers, as cardinalities and operator output estimates are known at optimization time and we use ordinary query operators for the optimization, except for the newly introduced selection modes of the PatchIndex scan. The costs for query trees (build and execution costs) produced by the PatchIndex optimizations can therefore be estimated by query optimizers, and if the estimated costs are smaller than the costs of the subtree before the optimization, the PatchIndex should be used for the query. The selection operators with modes exclude_patches and use_patches merge the PatchIndex information on-the-fly into the output dataflow of the scan operator. This is particularly independent of the data types of the input data, as the decision of passing or dropping a tuple is based on a tuple's rowID. As a consequence, the operator's overhead is fixed for every tuple. In our experiments, the selection operators for both selection modes took a minor part of query runtimes (typically below 1%).
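Expressed as a simple decision rule (a sketch in the paper's terms, not the engine's actual cost formulas): let C(·) denote the optimizer's cost estimate, P the patch dataflow and ¬P the dataflow excluding the patches. The rewrite of Figure 2 is applied iff

    C(scan_¬P) + C(optimized subtree) + C(scan_P) + C(cloned subtree) + C(combine) < C(original plan),

where the combine operator is e.g. a Union for distinct queries and a Merge for sort queries. All quantities involved, in particular the number of patches, are known at optimization time, so both sides can be estimated with the ordinary cost model.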
4 SHARDED BITMAP

In database systems, there are three types of table updates, namely insert, modify and delete operations. With the bitmap-based approach described in Section 3.2, inserts and modifies can be handled efficiently. For inserts, reallocating/resizing the bitmap and setting single bits is sufficient, while for modifies single bits need to be changed. Therefore we mainly focus on delete operations, which are the main challenge for the bitmap structure, as it needs to be ensured that read access to specific bits remains efficient after update operations. Although this can be realized by shifting the bitmap towards the deleted position, this potentially shifts large amounts of memory and therefore results in poor performance. In this section, we present our approach of a sharded bitmap that efficiently supports delete operations while keeping the additional memory consumption low. We base our design approach on existing update-conscious bitmap approaches and extend them with parallel and vectorized delete and bulk delete support.
Update-conscious bitmap indices [11] faced the problem of degrading read performance under updates [3]. HICAMP [29] and UpBit [3] proposed different solutions to this problem for multi-dimensional bitmap indexes. While HICAMP divides bitmaps into slices and maintains access to them over a directed, acyclic graph, UpBit uses delta structures for update maintenance and "fence pointers" for fast access to arbitrary bit positions. Both approaches share the concept of slicing, either explicitly or implicitly over pointers, to keep updates local and to allow parallel operations on the bitmap.

Similar to UpBit, our sharded bitmap design implicitly divides bitmaps into virtual shards in order to keep delete operations local and to enable parallelism in update operations. The upper part of Figure 3 shows the design concept of the data structure. The bitmap is realized using an array of addressable elements. While these elements are 64 bit types in our implementation, we reduced their size to 8 bits in Figure 3 for clarity. A virtual shard then consists of multiple addressable elements and a single additional integer value indicating the index of the first bit in the shard, similar to UpBit's fence pointers. Besides keeping delete operations local, this approach also facilitates fine-grained locking and logging for efficient concurrency control, as described in Section 5.4.
Figure 3: Sharded bitmap design before (A) and after (B) deleting the bit at position 5.
In an ordinary bitmap, the single bit access methods set(), get() and unset() are performed in two steps. First, the position of the bit is calculated by getting the addressable element using a division of the position by the size of an addressable element (realized using a bit shift) and getting the position within an element using a bitwise AND operation with a bitmask. In a second step, the bit is changed or returned using another bitwise operation. In order to access a single bit in the sharded bitmap structure, we first compute the shard containing the bit. This can be efficiently realized using a bit shift and additional comparisons with the start values of the upcoming shards. The additional checks are necessary as a bit may be contained in a subsequent shard due to previous delete operations. Then the bit can be accessed using the position relative to the start value of the shard in a similar way as in the ordinary bitmap. This way, accessing a single bit in the sharded bitmap is only slightly slower than in ordinary bitmaps.

Deleting a single bit in the sharded bitmap is divided into three steps:
(a) Determine the position of the bit as described in Section 4.2.1.
(b) Shift all subsequent bits within the shard by one position towards the deleted bit.
(c) Decrement the start values of all subsequent shards.

As an example, the result of deleting the bit at position 5 is shown in the lower part of Figure 3 with the changed elements highlighted in red color. Note that after deleting a bit, the values of subsequent elements change, which is the desired semantic of the delete operation. In the example, the bit at position 25 after the delete operation is the bit at position 26 before the delete. Compared to ordinary bitmaps, steps (b) and (c) are trade-offs, as we limit the impact of a delete operation to a single shard in step (b), but need additional effort afterwards in step (c) to adjust metadata. Therefore, the choice of the shard size is crucial for the data structure.

Step (b) is the main challenge of this approach as it involves cross-element bit shifts, which can be realized by a sequence of bit masking and shifting operations. In order to further accelerate this step, we designed a vectorized cross-element shifting algorithm that uses Advanced Vector Extensions Version 2 (AVX2) intrinsics and is based on shifting, masking and permutation intrinsics to enable data exchange between AVX lanes. The algorithm that is used inside the loop over the data of a shard is shown in Listing 1 (assuming pre-defined constant bit masks for shifting and masking).

#include <immintrin.h>
// bit_mask, shift_mask63, shift_mask1 are constant vectors of 1, 63 and 1;
// bits is a loop-carried vector; data points into the shard's elements.

// Load data to an AVX vector
__m256i x = _mm256_loadu_si256((__m256i*) data);
// Get last bit of each element
__m256i y = _mm256_and_si256(x, bit_mask);
// First bit will be last bit of prev. element
y = _mm256_sllv_epi64(x, shift_mask63);
// Save element at pos 0 for next iteration
bits = _mm256_blend_epi32(bits, y, 0x03);
// Rotation mask: (3, 3, 2, 1) is 0b11111001
__m256i rotated = _mm256_permute4x64_epi64(y, 0xF9);
// Copy element from last iteration to pos 3
rotated = _mm256_blend_epi32(rotated, bits, 0xC0);
// Saved element to pos 3 for next iteration
bits = _mm256_permute4x64_epi64(bits, 0x24);
// Shift data
x = _mm256_srlv_epi64(x, shift_mask1);
// Insert bit from next element
x = _mm256_or_si256(x, rotated);
// Store back to data
_mm256_storeu_si256((__m256i*) data, x);

Listing 1: Cross-element bit shift using AVX2
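A scalar sketch of the access path described above (a minimal sketch with hypothetical member names, assuming 64 bit words and the shard size of 2^14 bits used in Section 6; the real implementation vectorizes shard-local operations as in Listing 1):

#include <cstdint>
#include <vector>

// Sketch of a sharded bitmap: words holds the bits, shard_start[s] holds
// the global index of the first bit in shard s (a fence pointer in UpBit
// terms).
struct ShardedBitmap {
    static constexpr uint64_t kShardBits = 1ull << 14;  // bits per shard
    std::vector<uint64_t> words;
    std::vector<uint64_t> shard_start;

    bool get(uint64_t pos) const {
        // Initial shard guess via bit shift; deletes only decrease start
        // values, so the bit can only lie in this or a subsequent shard.
        uint64_t s = pos >> 14;
        while (s + 1 < shard_start.size() && shard_start[s + 1] <= pos)
            ++s;
        // Physical bit index: shard base plus the position relative to the
        // shard's start value, then ordinary bitmap addressing.
        uint64_t phys = s * kShardBits + (pos - shard_start[s]);
        return (words[phys >> 6] >> (phys & 63)) & 1ull;
    }
};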
In order to reduce the effort of changing the start value array and to exploit the opportunities of the sharding approach for parallelism, we introduce a bulk delete operation to the sharded bitmap structure. The basic concept of the bulk delete operation is shown in Figure 4. After a preprocessing step to determine the shards that the elements belong to, step (b) of shifting within the single shards can be performed in parallel using threads, as bit shifts remain local to the shards due to the design of the data structure. A thread is hereby created for each shard that contains indexes to be deleted, so the total number of threads depends on the number of shards and the location of deleted positions. The actual bit shift is again realized using the vectorized algorithm shown in Listing 1. At the end of the operation, all start values are adapted in a single array traversal by holding a running sum over the deleted bits of all preceding shards. This way, the vectorized and parallelized bulk delete operation minimizes the additional effort of step (c) while further accelerating step (b).
Figure 4: Parallel & vectorized bulk delete operation
The bulk delete operation is order sensitive, as deleting a bit within a shard shifts subsequent bits, changing their position. Consequently, positions of bits that are intended to be deleted are not correct anymore if bits at smaller positions were deleted before. To overcome this, delete operations are performed in descending order, starting from the largest position. In our implementation, the PatchIndex structure buffers the rowIDs of tuples that should be deleted and performs a bulk delete operation at the end of the query. It thereby ensures the order of the rowIDs to enable an efficient bulk delete.
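Step (c) in isolation, as a minimal sketch (hypothetical names; steps (a) and (b) run before this, per shard and possibly in parallel, after sorting the positions in descending order):

#include <cstdint>
#include <vector>

// deleted_in_shard[s] = number of bits removed from shard s in this bulk
// delete. Each start value shrinks by the bits deleted in all preceding
// shards; one traversal with a running sum replaces per-bit decrement passes.
void adjust_start_values(std::vector<uint64_t>& shard_start,
                         const std::vector<uint64_t>& deleted_in_shard) {
    uint64_t running = 0;
    for (size_t s = 0; s < shard_start.size(); ++s) {
        shard_start[s] -= running;       // deletes in shards 0 .. s-1
        running += deleted_in_shard[s];  // add this shard's deletes
    }
}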
With each single delete operation, a bit at the end of a shard gets lost, as the subsequent bit from the following shard is not shifted to this position. This is the main drawback of the sharded bitmap design and follows from limiting the impact of a delete operation to a single shard. In order to overcome this drawback, we introduce a condense operation to the sharded bitmap design, which shifts the elements of the bitmap between shards and resets the utilization of the structure as a result. The condense operation is realized using a single traversal over the bitmap. For each shard, data of subsequent shards is shifted to the bits that were lost due to delete operations and the start values are adapted accordingly. Condensing can be triggered manually, or automatically by monitoring the utilization of the bitmap and triggering once a certain threshold is reached.
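A copy-based sketch of the condense operation (hypothetical names; the actual operation shifts in place during one traversal, the copy merely keeps the sketch short): all valid bits are re-packed densely and the start values are reset to multiples of the shard size.

#include <cstdint>
#include <vector>

constexpr uint64_t kShardBits = 1ull << 14;

void condense(std::vector<uint64_t>& words,
              std::vector<uint64_t>& shard_start,
              uint64_t total_valid_bits) {
    std::vector<uint64_t> packed(words.size(), 0);
    uint64_t out = 0;
    for (size_t s = 0; s < shard_start.size(); ++s) {
        // Valid bits in shard s; the gap up to kShardBits was lost to deletes.
        uint64_t next = (s + 1 < shard_start.size()) ? shard_start[s + 1]
                                                     : total_valid_bits;
        for (uint64_t i = 0; i < next - shard_start[s]; ++i, ++out) {
            uint64_t in = s * kShardBits + i;  // physical source position
            uint64_t bit = (words[in >> 6] >> (in & 63)) & 1ull;
            packed[out >> 6] |= bit << (out & 63);
        }
        shard_start[s] = s * kShardBits;  // reset utilization
    }
    words.swap(packed);
}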
5 UPDATE HANDLING

Handling table update operations like inserts, deletes or modifies is the most significant problem of materialization. As table updates were not present at the time a materialization was computed, it reaches an inconsistent state whenever an update occurs. The common way to handle this inconsistency is refreshing, often leading to an expensive recomputation. For some use cases, minor inconsistencies might be acceptable and refresh cycles can be chosen quite loosely, while other applications rely on consistent and up-to-date data and therefore demand very tight or just-in-time refresh cycles.

On the contrary, PatchIndexes are designed to efficiently support table update operations. We avoid inconsistent states by handling updates immediately after they occur. In order to perform these updates efficiently, the design of the update handling mechanisms is driven by the goal of avoiding an index recomputation and a full table scan while preserving the invariant of holding all exceptions to a given constraint. The basic ideas for handling inserts, deletes and modifies for the specific constraints are summarized in Table 1. For the design of the update handling mechanisms we utilized the following technologies that are commonly used in analytical, column-store based DBMS (a minimal sketch of the minmax-based pruning behind the second and third technique follows after this list):

• Delta structures: Delta structures are used to maintain table updates in-memory instead of writing them to disk, which is a typical approach for read-optimized column-stores. We used the concept of Positional Delta Trees [17], which was already integrated in our test system. PDTs are not coupled with but queried by PatchIndexes.
• Summary tables: Summary tables are widely used in existing DBMS to collect certain information about columns. As an example, we used small materialized aggregates [22], also called Minmax indexes, which materialize minimum and maximum values for buckets of tuples, enabling data pruning during scans by the evaluation of e.g. selection predicates.
• Range propagation: Range propagation (or selection propagation in [4]) extends the use of minmax indexes to push scan ranges across join operators. While static range propagation enables pushing scan ranges across joins during the query build phase, dynamic range propagation dynamically generates scan ranges during query execution, e.g. during the build phase of HashJoins.
• Intermediate result caching: Caching can be used within queries to avoid expensive recomputations. In our further considerations, we use the Reuse operator to encapsulate this behaviour. Intermediate results are materialized in main memory by the ReuseCache operator and read again by the ReuseLoad operator.
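The pruning check behind summary tables and (dynamic) range propagation can be sketched as follows (hypothetical names, not the system's actual structures): a scan only has to touch buckets whose [min, max] range overlaps the requested range, e.g. a range derived from a HashJoin build side at run time.

#include <cstdint>
#include <vector>

struct MinmaxEntry { int64_t min, max; };  // small materialized aggregate

// Return the bucket numbers a scan for values in [lo, hi] must read;
// all other buckets are pruned without touching their data.
std::vector<size_t> buckets_to_scan(const std::vector<MinmaxEntry>& minmax,
                                    int64_t lo, int64_t hi) {
    std::vector<size_t> relevant;
    for (size_t b = 0; b < minmax.size(); ++b)
        if (minmax[b].max >= lo && minmax[b].min <= hi)  // ranges overlap
            relevant.push_back(b);
    return relevant;
}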
Handling insert operations in the PatchIndex translates to answering the question which tuples have to be added to the existing patches. For the uniqueness constraint, this is a difficult question, as this constraint relies on a global view of the table. Inserting a single tuple might produce a collision with another tuple that was already in the table and had a unique value in the indexed column before. As we need to keep track of all occurrences of non-unique values to ensure correctness, we perform a join query after the insert operation, joining the inserted tuples with the actual table (including inserted values, as duplicates might also occur in the inserts), as shown in Figure 5. Afterwards we project the rowIDs of both join sides using intermediate result caching. The rowIDs are then merged into the existing patches. This way, we preserve the capability of holding the minimum set of patches that make the table unique when excluding them, while also avoiding a recomputation of the full index. In order to avoid the full table scan we utilize dynamic range propagation. After the build phase of the join operator is finished, scanning the full table is reduced to only the blocks that contain potential join partners. This significantly reduces the I/O overhead and is therefore a major improvement of the insert handling mechanism.
| | Nearly unique column | Nearly sorted column |
| Insert | Scan inserted tuples, join them with the table, merge the results with the existing patches. | Determine a new sorted subsequence extending the already existing one. |
| Delete | Drop tracking information about deleted tuples. | Drop tracking information about deleted tuples. |
| Modify | Scan modified tuples, join them with the table, merge the results with the existing patches. | Merge all modified tuples with the existing patches. |

Table 1: Design principles for update operations

Figure 5: Insert handling query on table T with dynamic range propagation (DRP) and intermediate result caching
For the sorting constraint, we focus on a local view of the inserted tuples instead of a global view of the whole table by extending the already existing sorted subsequence with inserted values instead of recomputing a globally longest sorted subsequence. The PatchIndex keeps track of the last value of the subsequence (i.e. the largest value for an ascending sort order or the smallest value for a descending sort order). During execution of the insert query, the PatchIndex temporarily stores inserted values and starts computing a longest sorted subsequence with all values being larger/smaller than the last value of the existing sorted sequence. We hereby utilize the same longest sorted subsequence algorithm [12] as used in the discovery of a NSC in [18]. The rowIDs of tuples that are not included in this extending sorted subsequence are added to the set of patches. With this mechanism, we possibly lose the optimality of the index, namely keeping track of the longest sorted subsequence. The reason for that is the fact that combining two sorted sequences that have a maximum length for two parts of the data might not form a longest sorted sequence for the combination of the two parts. An example for this can be easily constructed, e.g. the table holding the values (1, 2, 10) and inserted values (3, 4). Nevertheless, this does not lead to wrong query results when applying the PatchIndex information as described in Section 3.3 and should be negligible for the typical use cases. However, monitoring the exception rate and triggering a global recomputation once a certain threshold is reached is a possible solution for that.

From an implementation point of view, scanning the inserted values is realized by scanning the PDTs of the current query. Furthermore, merging the determined results with the existing patches translates to merging the two lists for the identifier-based approach, or reallocating the bitmap and setting the respective bits for the new patches in the bitmap-based approach.
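A sketch of the constraint-specific part for an ascending NSC (hypothetical signature; rowIDs of the inserted block are assumed to be consecutive starting at first_rowid, and the descending case flips the comparisons): a longest non-decreasing subsequence is computed over the inserted values that can extend the existing sequence, patience-sorting style as in the LIS algorithm of [12], and all other inserted rowIDs become new patches.

#include <algorithm>
#include <cstdint>
#include <vector>

std::vector<uint64_t> nsc_insert_patches(const std::vector<int64_t>& inserted,
                                         uint64_t first_rowid,
                                         int64_t last_sorted_value) {
    // tails[k]: smallest tail value of a non-decreasing subsequence of
    // length k+1; tail_idx[k]: index of the element holding that tail;
    // parent[] allows reconstructing the chosen subsequence.
    std::vector<int64_t> tails;
    std::vector<size_t> tail_idx;
    std::vector<size_t> parent(inserted.size(), SIZE_MAX);
    for (size_t i = 0; i < inserted.size(); ++i) {
        if (inserted[i] < last_sorted_value) continue;  // cannot extend
        size_t k = std::upper_bound(tails.begin(), tails.end(), inserted[i])
                   - tails.begin();
        if (k > 0) parent[i] = tail_idx[k - 1];
        if (k == tails.size()) { tails.push_back(inserted[i]); tail_idx.push_back(i); }
        else { tails[k] = inserted[i]; tail_idx[k] = i; }
    }
    // Everything outside the longest extending subsequence is a patch.
    std::vector<bool> in_seq(inserted.size(), false);
    for (size_t i = tail_idx.empty() ? SIZE_MAX : tail_idx.back();
         i != SIZE_MAX; i = parent[i])
        in_seq[i] = true;
    std::vector<uint64_t> patches;
    for (size_t i = 0; i < inserted.size(); ++i)
        if (!in_seq[i]) patches.push_back(first_rowid + i);
    return patches;
}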
For the uniqueness constraint, handling modify operations is done similarly to the insert handling described in Section 5.1. For the sorting constraint, all modified tuples have to be added to the set of patches, as modifying values might destroy the sorting of the computed subsequence. The only difference in the actual realization is that reallocating the bitmap for the bitmap-based approach becomes obsolete, as the table cardinality does not change during modify operations.
Deletions are handled by the PatchIndex by dropping the information about deleted tuples from the patch information, without taking a global picture of the table into account. For both the uniqueness and the sorting constraint, dropping values from the table does not violate the constraint. For the uniqueness constraint, we might lose optimality this way, as a value that was not unique before the delete operation might become unique afterwards if the other occurrences of the value were deleted. Nevertheless, it would remain in the patches to keep the delete handling mechanism simple and would obviously not lead to wrong query results when applying the PatchIndex to queries. The same holds for the sorting constraint, possibly losing optimality when deleting values from the existing longest subsequence. As described in Section 5.1, monitoring and recomputing the index is a possible solution if this becomes a problem. From an implementation point of view, deleting values raises the problem that the rowIDs of subsequent tuples decrease for each deleted tuple. This is considered by the bulk delete operation of the sharded bitmap data structure as described in Section 4.2.3. For the identifier-based approach we keep track of the number of deleted tuples with smaller rowIDs and decrement each identifier while going through the list of patches.
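For the identifier-based approach, this pass can be sketched as follows (a minimal sketch with hypothetical names; both lists are sorted in ascending order, so one merge-style traversal removes deleted patches and shifts the surviving rowIDs):

#include <cstdint>
#include <vector>

void apply_deletes(std::vector<uint64_t>& patches,
                   const std::vector<uint64_t>& deleted_rowids) {
    std::vector<uint64_t> out;
    out.reserve(patches.size());
    size_t d = 0;  // deleted rowIDs smaller than the current patch
    for (uint64_t p : patches) {
        while (d < deleted_rowids.size() && deleted_rowids[d] < p) ++d;
        if (d < deleted_rowids.size() && deleted_rowids[d] == p)
            continue;          // the patch tuple itself was deleted
        out.push_back(p - d);  // decrement by preceding deletes
    }
    patches.swap(out);
}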
In terms of concurrency control, PatchIndexes seamlessly integrate into a system's snapshot isolation mechanism [10]. Although snapshot isolation is a very coarse-grained way to achieve serializability of transactions, it is often sufficient for read-optimized DBMS. Nevertheless, PatchIndexes offer opportunities for more fine-grained concurrency control due to the underlying sharded bitmap data structure. As shards are independent from each other, fine-grained locking can be used to avoid concurrent access without locking the whole data structure. Adapting the start values of a sharded bitmap produces no conflicts, as the only operation that adapts the start values is the delete operation, which uses decrement operations. Concurrent decrements are not conflicts, as different orders of executing a series of decrements produce the same result.
In the preceding discussions we presented the PatchIndex approach for NUC and NSC. However, PatchIndexes are not limited to these constraints due to the generic design of the data structure. Different constraints could be easily integrated by (1) implementing the PatchIndex interface for constraint-specific initial filling, insert, modify and delete support and (2) adding an optimizer rule to automatically rewrite query plans using the existing PatchIndex scan in order to exploit the approximate constraint information. We leave the investigation of different approximate constraints open for future work.
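The resulting extension points can be pictured as an interface of the following shape (purely illustrative names, not the system's actual API):

#include <cstdint>
#include <vector>

struct PatchIndexConstraint {
    virtual ~PatchIndexConstraint() = default;
    // (1a) initial filling: discover the minimal patch set for the column
    virtual std::vector<uint64_t> discoverPatches() = 0;
    // (1b) constraint-specific delta discovery on updates (Section 5)
    virtual std::vector<uint64_t> patchesForInsert(uint64_t firstRowId,
                                                   uint64_t count) = 0;
    virtual std::vector<uint64_t> patchesForModify(
        const std::vector<uint64_t>& rowIds) = 0;
    virtual void handleDelete(const std::vector<uint64_t>& rowIds) = 0;
};
// (2) is an optimizer rule that rewrites plans to use the PatchIndex scan
// with the exclude_patches / use_patches selection modes.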
6 EVALUATION

In this section, we evaluate our solution in different experiments to prove the performance impact of PatchIndexes and their updatability. We therefore show experiments on different integration levels, starting from the lowest level of the underlying data structure. Here, we present a comparison between the sharded bitmap approach and ordinary bitmaps and determine the optimal shard size. Second, we present a set of PatchIndex microbenchmarks for a fine-grained evaluation of aspects like performance impact, creation time or update operations. Afterwards we show the impact of PatchIndexes on query performance using a query subset of the well-known TPC-H benchmark. For the evaluation, we integrated PatchIndexes into the Actian Vector 6.0 commercial DBMS, which is built on the X100/Vectorwise [9] analytical database engine. The system runs on a machine consisting of two Intel(R) Xeon(R) CPU E5-2680 v3 with 2.50 GHz, offering 12 physical cores each, 256 GB DDR4 RAM and 12 TB SSD. For all measured results, we used queries on hot data, which means that data resides in the in-memory buffers of the system. This way, we reduce the I/O impact and focus on the pure query execution time. Additionally, we did not trigger any sharded bitmap condense operations but started every single experiment with a freshly built index structure for comparability reasons.

In the evaluation, we compare the generic PatchIndex approach against different specialized materialization approaches, namely materialized views, SortKeys and JoinIndexes. For distinct queries, we used materialized views as a comparison, which is a widely used technique in database systems to pre-compute partial queries like the distinct query in our example. While leading to a significant performance benefit if matched by a user query, the major drawback of materialized views is their ability to handle updates. Typically, they need to be re-computed when updates occur to keep them consistent with the actual database. Alternatively, these expensive recomputations can be delayed and run regularly, if minor inconsistencies are acceptable for the user application. As materialized views are not offered by Actian Vector, we simulate this approach by storing the materialized information in a separate table and manually rewriting queries. For sort queries, we compared PatchIndexes against SortKeys, which physically sort data on the given SortKey column. This way, sort queries can be translated to simple scan queries. As a drawback, physically reordering data is a very costly operation and maintaining this order in case of updates requires additional effort. Last, we evaluated join queries by comparing the PatchIndex approach against JoinIndexes, which materialize foreign key joins as an additional table column. If a SortKey is defined on the table holding the primary key of the join, the foreign key related table is ordered similarly, so that a MergeJoin becomes possible to join both tables.
Choosing the shard size is crucial for the performance and the memory overhead of the sharded bitmap data structure. A small shard size leads to a large memory overhead due to the additional start values and a deletion performance overhead for their adaption. On the contrary, a large shard size leads to a negligible memory overhead, but results in shifting large amounts of memory, which was the problem we intended to avoid with the design of the sharded bitmap. The runtime of bulk deleting 1M elements (randomly chosen but fixed) from a sharded bitmap of size 100M is shown in Figure 6 for the parallel and the parallel & vectorized bulk delete implementation. First, we can locate a clear minimum runtime at a shard size of 2^14 bits. Below this minimum, the overhead of preprocessing and thread starting is not worth the benefit of multithreading, and above this minimum the shifting effort starts to dominate the runtime again. Second, we can observe that vectorization impacts performance only in a minor way at these relatively small shard sizes, but gets more impactful the larger the shard size is chosen. Regarding memory consumption, a 64 bit start value is stored for each shard, resulting in a memory overhead of 64/shard_size · 100%. Choosing the shard size of 2^14 bits therefore leads to a memory overhead of 0.39% for sharding.

Figure 6: Sharded bitmap bulk delete runtime for 1M elements and memory overhead of sharding depending on shard size (x-axis: shard size in bits, log scale; series: parallel, parallel & vectorized, memory overhead)

Table 2: Bitmap operator runtimes per element (Sequential Set, Sequential Get, Seq. Delete, Seq. Bulk Delete) for an ordinary bitmap and a sharded bitmap with 100M elements and shard size 2^14 bits

Table 2 shows the latency per element of the bitmap operators that are relevant for the PatchIndex. Comparing an ordinary bitmap with the sharded bitmap approach, the virtual sharding leads to a small overhead for bit access operations, which is caused by determining the shard an element belongs to. In terms of delete operations, the sharded bitmap performs three orders of magnitude faster than the ordinary bitmap, which improves by another order of magnitude when using the bulk delete operation to delete 1M elements. For the ordinary bitmap structure the delete runtime is size dependent and increases/decreases linearly when changing the bitmap size. Consequently, the sharded bitmap data structure significantly outperforms the ordinary bitmap in terms of deletion support, which particularly holds for large bitmap sizes, and justifies the small bit access overhead and the small memory overhead for use cases where update support is important.

For our microbenchmarks, we designed a data generator [1] that varies the exception rates to given constraints. The data consists of 1B tuples with two columns, a unique key column and a value column that shows the desired data distribution. With a tuple width of 128 Bytes this results in a dataset size of 128 GB. In order to exploit parallel data processing, we partition the datasets on the key column into 24 partitions. As the key column is unique, this results in partitions of nearly equal size. For the uniqueness constraint, exceptions of the value column are equally distributed over 100K values, while the remaining values are unique and differ from the values of the exceptions. For the sorting constraint, exceptions are randomly chosen and all remaining values form a sorted sequence in ascending order. For both constraints, exceptions are randomly placed in the datasets. As the datasets are generated once, the randomness does not impact the comparability of the evaluation results.
For the evaluation of update operations, we chose the dataset with a fixed exception rate. This choice has no impact on update query performance, as bits of updated patches have to be accessed independently of whether they were patches before the update.

The materialization is realized in different ways for the constraints in the microbenchmarks. For the uniqueness constraint, we materialized a table containing all unique values of the value column using a distinct query. As a result, a distinct query on the value column is replaced by a scan query on the materialized view without the need for an expensive aggregation. For the sorting constraint, we materialized the order information using a SortKey on the value column. This way, the data of the real table is physically re-ordered according to the value column. Queries that include a sort operator therefore just scan the table. As the table is partitioned, an additional merge step of the tuples from each partition is necessary to preserve the global order of the query results.

With this experiment, we want to prove the query performance improvement for different exception rates when using PatchIndexes. We run a distinct query and a sort query respectively while varying the exception rates for the uniqueness and sorting constraint in the dataset. Figure 7 shows the results of this experiment. For the uniqueness constraint, reference runtimes without any constraint definition increase with increasing exception rates, before decreasing starting from an exception rate of 0.3. With increasing exception rates, the number of distinct tuples and therefore the number of aggregation groups decreases. The runtime behaviour is then caused by the interplay of a reduced hash table size and increased communication costs, as the system uses a shared hash table build plan for the aggregation. The use of a materialized view shows a nearly constant runtime as the query only scans the materialized result. Using a PatchIndex shows a significant performance benefit compared to the reference runtimes, with performance comparable to the materialized view. Both PatchIndex design approaches perform similarly and show a slight increase in runtime with increasing exception rates, caused by more tuples being processed in the aggregation. As a result, the actual performance gain compared to
the reference runtime shrinks with increasing exception rates, but using a PatchIndex does not impact runtimes in a negative way for the evaluated cases.

For the sorting constraint, reference runtimes increase with increasing exception rates, which is caused by the pivoting strategy of the internal QuickSort implementation, which behaves better the more sorted the input sequence already is. Using a SortKey as materialization shows a constant runtime, although slightly slower than the scan query for the uniqueness constraint, as merging the sorted partitions is necessary. Additionally, the query still performs a sort operator to ensure the sorting, resulting in a slightly worse performance than using a PatchIndex for small exception rates. Using the PatchIndex shows a significant performance gain compared to the reference runtimes. Runtimes increase as expected with increasing exception rates, as more tuples have to be processed by the sort operator, so the relative performance gain shrinks with increasing exception rate. Again, using a PatchIndex does not impact runtimes in a negative way for the evaluated cases.

The memory consumption of a PatchIndex is independent of the materialized constraint and shown in Table 3. While the bitmap-based approach has a constant total memory consumption (1 bit per tuple plus the sharded bitmap overhead), the memory consumption of the identifier-based approach grows linearly with the number of exceptions. Hence, the bitmap-based approach has a lower memory consumption for cases with exception rate e > 1/64 ≈ 0.0156. In contrast, the materialized view for the uniqueness constraint materializes every unique value (the 100K non-unique values plus all unique values), leading to a significantly higher memory consumption than the PatchIndex for most cases.

| | PI_bitmap | PI_identifier | Mat. view (NUC) |
| General | t/8 · 1.0039 B | e · t · 8 B | (10^5 + (1 − e) · t) · 8 B |
| e = 0.01 | 125.5 MB | 80 MB | 7.9 GB |
| e = 0.1 | 125.5 MB | 0.8 GB | 7.2 GB |

Table 3: Memory consumption for the example dataset of 128 GB / t = 10^9 tuples

Another important factor for the usability of data structures is their creation effort. Both the PatchIndex and the materialization are intended to be used multiple times after being created. Otherwise the creation effort would not be worth it compared to the achieved query speedups. Figure 8 shows the comparison of the runtimes for creating the PatchIndex and the materialization for both constraints. For the uniqueness constraint, the runtimes follow the reference runtimes of Figure 7, as the distinct query is pre-computed and materialized. The effort to create a PatchIndex is slightly higher than for the materialization, as the information about exceptions has to be filled into the index structure. Furthermore, the bitmap-based approach performs better than the identifier-based approach, as here only bits have to be set in an already allocated bitmap instead of maintaining a growing list of identifiers. For the sorting constraint, creating the SortKey takes a huge amount of time, as this physically reorders the table data. In comparison, creating a PatchIndex is more efficient. The increasing exception rate leads to an increasing number of comparisons in the longest sorted subsequence algorithm, while decreasing the length of the sorted sequence and therefore decreasing the effort
to reconstruct it. The interplay of both parts leads to the observed runtimes. Next to the worse creation performance of a SortKey, its definition is also limited to one per table, as it physically reorders the data. In comparison, PatchIndexes can be defined multiple times per table on different columns, as they do not change the way data is physically stored. Again, the bitmap-based approach performs better than the identifier-based approach.

Figure 7: Runtimes of a distinct/sort query with varying exception rate (left: NUC, right: NSC; series: w/o constraint, Materialization, PI_bitmap, PI_identifier)

Figure 8: Runtime for materialization/index creation for varying exception rate (left: NUC, right: NSC)

Figure 9: Update performance for inserting/updating/deleting 1000 tuples and varying update granularities (rows: NUC/NSC runtime [s]; x-axis: update granularity; panels: INSERT, MODIFY, DELETE)
Based on the dataset with the fixed exception rate, we inserted 1000 tuples per run into the database and varied the granularity of the insert operations between 5 tuples per operation (200 queries in total) and 1000 tuples per operation (1 query in total) to capture the impact of trickle and bulk inserts. The left column of Figure 9 shows the total runtimes to insert the 1000 tuples. First, we can observe that recomputing or maintaining the materialization for each insert operation produces a tremendous overhead compared to the reference runtime without any constraint definition, making it especially unusable for trickle updates. In contrast, PatchIndexes support insert handling in a more efficient way. For both constraints, the identifier-based design approach performs worse than the bitmap-based approach, caused by keeping the list of identifiers sorted. For the uniqueness constraint, every insert operation invokes the insert handling query shown in Figure 5, resulting in an overhead for fine-grained inserts. Similarly, the sorting constraint invokes the execution of the longest sorted subsequence algorithm on the inserted tuples for every insert query. Nevertheless, the added overhead is smaller compared to the uniqueness constraint. For insert granularities of 50 tuples per operation or higher, the overhead is negligible for both constraints, and defining a PatchIndex on a column does not impact insert runtimes in a remarkable way. Comparing the plots for the bitmap-based approach and for the materialization, this result also leads to another consequence. Keeping the runtime nearly equal by fixing a value on the y-axis, update cycles can be chosen 50 times more frequently for the uniqueness constraint and 100 times more frequently for the sorting constraint when using a PatchIndex instead of a materialization. This increases the ability of the system to keep the materialized information consistent with the actual dataset.

Similar to the insert experiment, we here updated 1000 tuples of the dataset for different granularities. For the uniqueness constraint, we can observe a similar behaviour as for the insert support, as both perform the same query to handle the updates and keep the PatchIndex accurate. For the sorting constraint, updates must be included into the exceptions in the PatchIndex, as they may destroy the sorted subsequence. This can be done efficiently without the need for an additional query, leading to nearly no overhead compared to the reference runtime.

The right column of Figure 9 shows the total runtime for deleting 1000 tuples from the dataset with varying granularities. As the PatchIndex just drops information about deleted tuples, handling deletes is a very efficient operation and adds nearly no overhead to the operation runtime. Furthermore, we can observe that the identifier-based approach performs worse than the bitmap-based approach, caused by the fact that identifiers have to be decreased when a tuple with a lower identifier is deleted, while handling deletes in the bitmap-based approach is realized using the efficient bulk delete operation of the sharded bitmap.

The TPC-H benchmark [8] is a well-known and well-understood benchmark used for performance evaluation of analytical DBMS for many years. Besides analytical queries, the benchmark also contains update sets for table inserts and table deletes that can be used to evaluate update support. In our experiments, we used the benchmark at scale factor SF 1000. We decided to focus on the largest join in the benchmark, which is the join between the lineitem and orders table. Although the benchmark only contains clean data with perfect constraints, we manually manipulated the data order of the lineitem table in order to introduce exceptions to the sorting constraint, resulting in three datasets with 0%, 5% and 10% exceptions. While we stored the orders table in a sorted way, we evaluated the impact of a PatchIndex and a JoinIndex defined on the lineitem table on query performance and chose a subset of three queries that include this join. We used the bitmap-based PatchIndex approach in our experiments, which was shown in Section 6.2 to perform better than the identifier-based approach. Besides different exception rates, we evaluated two options for the dataset without exceptions. The first option uses the PatchIndex optimizations described in Section 3.3 and is therefore the general case that is comparable against the reference runtime in a fair way. Nevertheless, these queries contain unnecessary overhead, as the constraints in the benchmark do not contain any exceptions. In the second option we therefore enable zero-branch-pruning (ZBP), which is a technique of removing subtrees of a query plan that are ensured to not produce any results. During query optimization, query plans are annotated with cardinality estimates. If these estimates are ensured to be zero, e.g. by evaluating integrity constraints or indexes, the query rewriter can drop the respective subtrees from the query plan. This way, the subtree that would process the patches is pruned from the query plan, resulting in better performance, as it drops all overhead introduced by cloning the query subtrees. Although not being the general use case, this option can therefore be used to compare results against the JoinIndex runtimes.

Figure 10: TPC-H query performance (queries Q3, Q7, Q12 and the Insert/Delete update sets; series: w/o constraint, PI_10%, PI_5%, PI_0%, PI_0%_ZBP, JoinIndex)

Figure 10 shows the results of the experiment for the queries as well as for the update sets.
In general, the exception rate has no impact on reference runtimes without constraint, JoinIndex runtimes or update runtimes with PatchIndexes. We can observe that the PatchIndex benefit on join queries depends on the size of the join and the exception rate. For Q3, which contains the largest join, query runtime decreases with decreasing exception rate, and the PatchIndex impacts query performance in a positive way even for an exception rate of 10%. The same holds for Q7, which however shows worse performance compared to the reference runtime for 10% exceptions. As the PatchIndex impact grows with lower exception rates, PatchIndex runtimes nearly reach the JoinIndex runtimes for an exception rate of 0%. Additionally activating zero-branch-pruning reduces the overhead introduced by the PatchIndex optimization,
leading the queries to run 43% and 40% faster compared to the reference runtime. With zero-branch-pruning, runtimes are also slightly faster than using the JoinIndex, which is a full materialization of the join. This is caused by a small additional scan effort in the JoinIndex query, as here the index is materialized in an additional table column. Query Q12 shows a different behaviour, as this already short-running query is impacted in a negative way when using a PatchIndex. In contrast to queries Q3 and Q7, the join in this query is very small due to prior selections. Therefore, the added overhead of cloning subtrees is larger than the gained benefit of exchanging the HashJoin with a MergeJoin in the PatchIndex optimization. Nevertheless, enabling zero-branch-pruning leads to a performance benefit of 28%, which is again slightly faster than the JoinIndex query. In terms of update support, zero-branch-pruning does not have any impact on performance. For the insert set (inserting 0.5M tuples) and the delete set (deleting 6M tuples) we can observe a slight performance overhead for both materialization approaches. Here the JoinIndex performs slightly better than the PatchIndex, as updates are handled in-memory by the Positional Delta Tree [17] structure. The creation effort for both approaches is excluded from Figure 10 for scaling purposes. Here, creating a PatchIndex takes 100 seconds, which is significantly faster than creating a JoinIndex, which takes around 600 seconds.

In contrast to distinct and sort queries, this experiment shows that using PatchIndexes in join queries is a trade-off, as the overhead of cloned subtrees might negate the benefit of using a MergeJoin for most tuples. As described in Section 3.5, these plans would not be chosen by the optimizer. Although the benchmark (theoretically) only contains perfect constraints, it is an example for another advantage of the PatchIndex approach. Even if a dataset is clean at a point in time, it may become unclean in the future through update operations. While these updates would be aborted with the definition of usual constraints, PatchIndexes would allow the updates and the respective transition from a perfect constraint to an approximate constraint.

Figure 11: Qualitative comparison of PatchIndex against the evaluated approaches (PatchIndex, Mat. View, SortKey, JoinIndex) in terms of Creation effort (C), Memory/Storage overhead (M), Performance impact (P) and Updatability (U). (Higher score means "better")
In our evaluation, we compared the generic PatchIndex approach against materialized views, SortKeys and JoinIndexes as specialized materialization approaches for different queries. Figure 11 shows a qualitative comparison of the discussed approaches, proving that the PatchIndex structure is an impactful compromise between not defining any constraints and fully materializing constraint information. The effort to create a PatchIndex is in the order of that of materialized views, while SortKeys and JoinIndexes require significantly more time to create. As PatchIndexes only add a single bit per tuple, they show a moderate memory overhead, which is only surpassed by the SortKey, which reorders data and avoids storing additional metadata. For distinct and sort queries, the PatchIndex achieves a performance impact comparable to the specialized materialization approaches, which holds even for quite high exception rates. While update support is a drawback of materialized views and SortKeys, the PatchIndex approach offers lightweight support for update operations.
CONCLUSION
In this paper, we designed the sharded bitmap structure, which relies on virtually splitting an ordinary bitmap in order to keep update operations as local as possible to the affected memory regions. The data structure provides efficient support for insert, modify and delete operations while being scalable in size and adding only a small memory overhead of 0.39% to the memory consumption of an ordinary bitmap. On top of the sharded bitmap as the underlying data structure, we designed the PatchIndex structure to maintain exceptions to arbitrary constraints, which allows the definition of approximate constraints in database systems. We described the integration of PatchIndexes for “nearly unique columns” (NUC) and “nearly sorted columns” (NSC) into the optimization of distinct, sort and join queries, and provided mechanisms to efficiently maintain these constraints under table updates that avoid expensive index recreation and full table scans. In our evaluation, we compared the generic PatchIndex approach against materialized views, SortKeys and JoinIndexes as specialized materialization approaches, in microbenchmarks as well as in the well-known TPC-H benchmark, proving that our approach is impactful for general purposes and even for high exception rates.

Future work includes the discussion of additional constraints, like the approximate constancy of column values or arbitrary user-defined constraints that rely on the data semantics, and their impact on query execution. Additionally, the concept of PatchIndexes offers possibilities in the field of approximate query processing [20], as the PatchIndex contains information that holds for the major part of the data and therefore allows generating approximate results over the whole dataset. Although we presented an efficient mechanism to discover PatchIndex deltas in the case of updates while avoiding full table scans, further data structures like Bloom filters or index structures could, if present in the database system, enhance the discovery of exceptions to approximate constraints caused by update operations. Finally, as the sharded bitmap keeps operations local to single shards, it offers the possibility of lightweight compression: bitmaps are typically compressed using run-length encoding, which could reduce the PatchIndex memory consumption especially for low exception rates, as the sketch below illustrates.
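As a minimal illustration of this compression idea, the following sketch run-length encodes a single shard of the exception bitmap; the shard size and the (value, run length) format are our own assumptions for illustration, not a proposed design.

```python
def rle_encode(bits):
    # Collapse a bit sequence into (bit, run_length) pairs. For a low
    # exception rate the bitmap consists of long zero runs, so the
    # encoded form is much smaller than the plain bitmap.
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs

def rle_decode(runs):
    return [b for b, n in runs for _ in range(n)]

# A shard covering 32 tuples with a single exception at position 20:
shard = [0] * 32
shard[20] = 1
assert rle_encode(shard) == [(0, 20), (1, 1), (0, 11)]
assert rle_decode(rle_encode(shard)) == shard
```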
ACKNOWLEDGMENTS
This work was funded by Actian Corp.
REFERENCES
[1] [n. d.]. Approximate Constraint Data Generator. https://github.com/Sklaebe/Approximate-Constraint-Data-Generator.
[2] Ziawasch Abedjan, Lukasz Golab, and Felix Naumann. 2015. Profiling relational data: a survey. The VLDB Journal 24, 4 (Aug. 2015), 557–581.
[3] Manos Athanassoulis, Zheng Yan, and Stratos Idreos. 2016. UpBit: Scalable In-Memory Updatable Bitmap Indexing. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD). 1319–1332.
[4] Stephan Baumann, Peter A. Boncz, and Kai-Uwe Sattler. 2016. Bitwise dimensional co-clustering for analytical workloads. The VLDB Journal 25, 3 (June 2016), 291–316.
[5] José A. Blakeley, Neil Coburn, and Per-Åke Larson. 1989. Updating derived relations: detecting irrelevant and autonomously computable updates. ACM Transactions on Database Systems 14, 3 (Sept. 1989), 369–400.
[6] José A. Blakeley, Per-Åke Larson, and Frank Wm. Tompa. 1986. Efficiently updating materialized views. ACM SIGMOD Record 15, 2 (June 1986), 61–71.
[7] José A. Blakeley and Nancy L. Martin. 1990. Join index, materialized view, and hybrid-hash join: a performance analysis. In Proceedings of the Sixth International Conference on Data Engineering. 256–263.
[8] Peter A. Boncz, Thomas Neumann, and Orri Erling. 2014. TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark. In Performance Characterization and Benchmarking. Vol. 8391. 61–76.
[9] Peter A. Boncz, Marcin Zukowski, and Niels Nes. 2005. MonetDB/X100: Hyper-Pipelining Query Execution. In CIDR 2005, Second Biennial Conference on Innovative Data Systems Research (CIDR). 225–237.
[10] Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008. Serializable isolation for snapshot databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 729–738.
[11] Guadalupe Canahuate, Michael Gibas, and Hakan Ferhatosmanoglu. 2007. Update Conscious Bitmap Indices. In 19th International Conference on Scientific and Statistical Database Management (SSDBM). 15.
[12] Michael L. Fredman. 1975. On computing the length of longest increasing subsequences. Discrete Mathematics 11, 1 (1975), 29–35.
[13] Bogdan Ghita, Diego G. Tomé, and Peter A. Boncz. 2020. White-box Compression: Learning and Exploiting Compact Table Representations. In Conference on Innovative Data Systems Research (CIDR).
[14] Ashish Gupta and José A. Blakeley. 1995. Using partial information to update materialized views. Information Systems 20, 8 (Dec. 1995), 641–662.
[15] Stefan Hagedorn and Kai-Uwe Sattler. 2018. Cost-Based Sharing and Recycling of (Intermediate) Results in Dataflow Programs. In Advances in Databases and Information Systems (ADBIS 2018). 185–199.
[16] Arvid Heise, Jorge-Arnulfo Quiané-Ruiz, Ziawasch Abedjan, Anja Jentzsch, and Felix Naumann. 2013. Scalable discovery of unique column combinations. Proceedings of the VLDB Endowment 7, 4 (Dec. 2013), 301–312.
[17] Sándor Héman, Marcin Zukowski, Niels J. Nes, Lefteris Sidirourgos, and Peter A. Boncz. 2010. Positional update handling in column stores. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 543–554.
[18] Steffen Kläbe, Kai-Uwe Sattler, and Stephan Baumann. 2020. PatchIndex - Exploiting Approximate Constraints in Self-managing Databases. In IEEE 36th International Conference on Data Engineering Workshops (ICDEW). 139–146.
[19] Henning Köhler, Sebastian Link, and Xiaofang Zhou. 2015. Possible and Certain SQL Keys. Proceedings of the VLDB Endowment 8, 11 (July 2015), 1118–1129.
[20] Kaiyu Li and Guoliang Li. 2018. Approximate Query Processing: What is New and Where to Go? Data Science and Engineering 3, 4 (Dec. 2018), 379–397.
[21] Ester Livshits, Alireza Heidari, Ihab F. Ilyas, and Benny Kimelfeld. 2020. Approximate Denial Constraints. Proceedings of the VLDB Endowment 13, 10 (2020), 1682–1695.
[22] Guido Moerkotte. 1998. Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing. In Proceedings of the 24th International Conference on Very Large Data Bases (VLDB). 476–487.
[23] Fabian Nagel, Peter A. Boncz, and Stratos D. Viglas. 2013. Recycling in pipelined query evaluation. In IEEE 29th International Conference on Data Engineering (ICDE). 338–349.
[24] Thorsten Papenbrock and Felix Naumann. 2017. A Hybrid Approach for Efficient Unique Column Combination Discovery. In Datenbanksysteme für Business, Technologie und Web (BTW 2017). 195–204.
[25] Eduardo H. M. Pena, Eduardo C. de Almeida, and Felix Naumann. 2019. Discovery of Approximate (and Exact) Denial Constraints. Proceedings of the VLDB Endowment 13, 3 (2019), 266–278.
[26] Frank Wm. Tompa and José A. Blakeley. 1988. Maintaining materialized views without accessing base data. Information Systems 13, 4 (Jan. 1988), 393–406.
[27] Patrick Valduriez. 1987. Join indices. ACM Transactions on Database Systems 12, 2 (June 1987), 218–246.
[28] Adrian Vogelsgesang, Michael Haubenschild, Jan Finis, Alfons Kemper, Viktor Leis, Tobias Mühlbauer, Thomas Neumann, and Manuel Then. 2018. Get Real: How Benchmarks Fail to Represent the Real World. In Proceedings of the Workshop on Testing Database Systems (DBTest). 1–6.
[29] Bo Wang, Heiner Litz, and David R. Cheriton. 2014. HICAMP bitmap: space-efficient updatable bitmap index for in-memory databases. In Proceedings of the Tenth International Workshop on Data Management on New Hardware (DaMoN). 1–7.
[30] Ziheng Wei, Uwe Leck, and Sebastian Link. 2019. Discovery and Ranking of Embedded Uniqueness Constraints. Proceedings of the VLDB Endowment 12, 13 (2019), 2339–2352.
[31] Marcin Zukowski, Sándor Héman, Niels Nes, and Peter A. Boncz. 2006. Super-Scalar RAM-CPU Cache Compression. In Proceedings of the 22nd International Conference on Data Engineering (ICDE). 59.