Adaptive HTAP through Elastic Resource Scheduling
Aunn Raza, Periklis Chrysogelos, Angelos Christos Anadiotis, Anastasia Ailamaki
Aunn Raza, École Polytechnique Fédérale de Lausanne
Periklis Chrysogelos, École Polytechnique Fédérale de Lausanne
Angelos Christos Anadiotis, École Polytechnique
Anastasia Ailamaki, École Polytechnique Fédérale de Lausanne and RAW Labs SA
ABSTRACT
Modern Hybrid Transactional/Analytical Processing (HTAP) systems use an integrated data processing engine that performs analytics on fresh data, which are ingested from a transactional engine. HTAP systems typically consider data freshness at design time, and are optimized for a fixed range of freshness requirements, addressed at a performance cost for either OLTP or OLAP. The data freshness and the performance requirements of both engines, however, may vary with the workload.
We approach HTAP as a scheduling problem, addressed at runtime through elastic resource management. We model an HTAP system as a set of three individual engines: an OLTP, an OLAP, and a Resource and Data Exchange (RDE) engine. We devise a scheduling algorithm which traverses the HTAP design spectrum through elastic resource management, to meet the data freshness requirements of the workload. We propose an in-memory system design which is non-intrusive to the current state-of-the-art OLTP and OLAP engines, and we use it to evaluate the performance of our approach. Our evaluation shows that the performance benefit of our system for OLAP queries increases over time, reaching up to 50% compared to static schedules for 100-query sequences, while maintaining a small, and controlled, drop in OLTP throughput.
CCS CONCEPTS
• Information systems → DBMS engine architectures; Main memory engines; Database transaction processing; Online analytical processing engines.
© 2020 Association for Computing Machinery.
INTRODUCTION
Modern business analytics use Hybrid Transactional/Analytical Processing (HTAP) systems, where an Online Transaction Processing (OLTP) engine continuously updates the state of the database which serves Online Analytical Processing (OLAP) queries. HTAP introduces data freshness as an additional dimension to analytical data processing. As data freshness depends on transactional throughput, an ideal HTAP system provides analytical processing over fresh data without affecting the performance of the transactional engine. Similarly, analytical processing requires that query response times remain unaffected by data freshness traffic.
Unfortunately, there is no free lunch. No matter which mechanism is used to guarantee query execution over fresh data, the performance of either the OLTP or the OLAP part of the HTAP system deteriorates. Figure 1 shows examples of the two extremes of the HTAP design spectrum which incur different overhead for each of the two engines. To demonstrate the performance tradeoff, we execute the same query 16 times and vary the snapshotting frequency by taking a new snapshot (after OLTP has made updates) at every query, then every two queries, up to sixteen queries, as depicted on the x-axis. We measure average query execution time and transactional throughput as per the values on the left- and right-hand y-axis, respectively.
The experiment denoted "ETL" transfers data from the OLTP to the OLAP engine and then executes the analytical query. Since query execution starts after all updates have been applied to the OLAP engine, the end-to-end query response time is higher. Nevertheless, the cost is amortized across all queries executed on top of the same snapshot. The experiment denoted "CoW" relies on hardware-supported Copy-on-Write to allow analytical query execution while running transactions. When the OLTP engine writes a record while a query is running, the operating system first duplicates the affected page, and the write is applied on the copy, thereby
Figure 1: HTAP with ETL and CoW. The engines run on 2 different sockets of a 4-socket server. OLAP executes a batch of 16 aggregate queries and reports the total execution time. OLTP executes the TPC-C NewOrder transaction, with one warehouse per worker thread.
ensuring consistency while query execution time remains intact. Copying pages, however, makes OLTP performance deteriorate. Without concurrent OLAP queries, OLTP throughput is the same across both experiments (around 2 MTPS).
Current state-of-the-art HTAP systems consider data freshness requirements in their design and can be classified under two categories, based on their storage design. The first category includes a unified storage, where transactional and analytical workloads are executed concurrently using snapshotting mechanisms. This category is optimal for analytical workloads where every query accesses further fresh data than the previous one. The second category includes a decoupled storage, where the OLTP and the OLAP part of the system are executed in isolation and fresh data are transferred from the OLTP to the OLAP part upon request. This category is optimal for workloads with batches of queries that require the same level of data freshness, or when the data that they need to access is not frequently updated.
However, the amount of fresh data that a query needs to access depends on the workload [10, 12], and it can vary at runtime. For instance, reporting workloads that typically run daily and in batches can be efficiently supported through an ETL process. On the other hand, real-time statistics workloads require immediate access to fresh data. Finally, monitoring workloads have queries partly accessing fresh data.
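The amortization argument behind the ETL extreme can be made concrete with a toy cost model (illustrative only; the function names and constants are hypothetical, not measurements from Figure 1):

```python
# Toy model of the ETL-vs-CoW trade-off: ETL pays a one-off transfer cost
# per snapshot, amortized over the queries that reuse it; CoW instead taxes
# every transactional write that triggers a page copy.

def etl_query_time(base_query_s, transfer_s, queries_per_snapshot):
    """Average end-to-end OLAP query time when a snapshot is reused."""
    return base_query_s + transfer_s / queries_per_snapshot

def cow_oltp_throughput(base_tps, write_fraction, page_copy_penalty):
    """OLTP throughput when a fraction of writes incur a page-copy overhead."""
    return base_tps / (1.0 + write_fraction * page_copy_penalty)

# The more queries share one snapshot, the cheaper ETL gets per query.
for q in (1, 2, 4, 8, 16):
    print(q, round(etl_query_time(1.0, 4.0, q), 3))
```

This mirrors the x-axis of the experiment: snapshotting every 16 queries divides the transfer cost by 16, while CoW keeps query time flat and degrades throughput instead.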
Therefore, a generic HTAP system needs to support workloads accessing different parts of the database which are updated with different frequency, while providing freshness and performance guarantees for both OLTP and OLAP. Accordingly, the HTAP system has to adapt its design to the workload requirements at run-time. In this paper, we refer to the amount of fresh data that a query needs to access as its data freshness requirements. Accordingly, we describe the design and implementation of an HTAP system which supports query workloads with variable data freshness requirements. Our system adapts to the freshness requirements of the workload, while controlling interference between the transactional and the analytical part, through a scheduling algorithm which distributes compute and memory resources across the OLTP and the OLAP engines. Our scheduling algorithm is enforced through a system which is non-intrusive to the design of standard in-memory OLTP and OLAP engines and can traverse the HTAP system design spectrum by changing the distribution of resources between the two engines. Accordingly, we treat HTAP primarily as a resource scheduling problem, where the OLTP and the OLAP engine compete for achieving maximum fresh data locality. In summary, we make the following contributions:
• We approach HTAP as a resource scheduling problem, and we specify a set of states that the system adopts based on the workload requirements for fresh data and the availability of the compute and memory resources that can be exchanged between the engines.
• We show that OLAP performance benefits from getting compute resources of the OLTP engine to bring computation closer to the fresh data, but this benefit is limited by the amount of fresh data accessed and the memory bandwidth. In the long run, we demonstrate that the HTAP system converges to a state where all fresh data have to be transferred from the OLTP to the OLAP engine.
This happens when the size of fresh data dominates the size of the database and, therefore, the data transfer cost is amortized quickly.
• We demonstrate that, compared to static designs, adaptive resource scheduling increases the performance of OLAP query execution by up to 50% for 100-query OLAP sequences, while maintaining a small, and controlled, drop in OLTP throughput.
BACKGROUND
In this section, we first define the data freshness metric relevant to the HTAP problem that we study. Then, we give an overview of existing HTAP approaches and classify them based on their storage organization; we also present state-of-the-art approaches on storage layout and on hybrid workload scheduling, which are relevant to our work. Finally, we classify HTAP workloads based on the amount of fresh data which is expected to be accessed during query execution.
In the HTAP design space that we study, we consider two engines, one OLTP and one OLAP, which can be either logically or physically separated. We suppose that each engine stores data in its private storage, while allowing them to access each other's data through predefined access paths. We define as fresh the data resulting from modifications executed by the OLTP engine which are not present in the OLAP private storage when an analytical query arrives. Accordingly, the fresh data can be accessed by the OLAP engine either by first copying them to its private storage, or by accessing the OLTP storage directly through access paths exposed by the OLTP engine. Following the definitions in [8], we measure data freshness with the freshness-rate metric, defined as the rate of tuples that are the same between the private storage of the two engines, over the overall amount of tuples. Accordingly, when the two engines share the same data storage, the freshness-rate metric will always be 1. Instead, when their storage is independent, this metric will generally be less than 1.
In HTAP, analytical queries are executed on top of a data snapshot with freshness-rate equal to 1. This is achieved either by the two engines sharing the same data storage, or by transferring the corresponding delta from the transactional to the analytical storage before the query is executed. In this paper, we study the performance trade-offs of every approach on the transactional and the analytical engine, and we adapt our system design based on the freshness-rate metric, which we measure at query-level.

Table 1: HTAP Design Classification

HTAP System                              | Storage   | Snapshot Mechanism | Freshness-Perf. Trade-off
HyPer-Fork [22], Caldera [5]             | Unified   | CoW                | OLTP (CoW)
HyPer-MVOCC [32], MemSQL, IBM BLU [36]   | Unified   | MVCC               | OLAP (version traversal)
SAP HANA [17]                            | Unified   | Delta-Versioning   | OLAP (version traversal), OLTP (record chains)
BatchDB [27]                             | Decoupled | Batch-ETL          | OLAP (ETL latency)
Microsoft SQL Server [24]                | Decoupled | MVCC-Delta         | OLAP (tail-records scan)
Oracle Dual-format [23]                  | Decoupled | Txn Journal & ETL  | OLAP (tail-records scan)
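The freshness-rate metric above can be sketched as a simple tuple-level comparison (a minimal illustration; a real engine would track this incrementally from update statistics rather than scanning both stores):

```python
def freshness_rate(oltp_store, olap_store):
    """Fraction of tuples identical between the two private storages.

    Both stores are modeled as dicts mapping key -> tuple value. A shared
    storage trivially yields 1.0; independent stores generally yield less.
    """
    keys = oltp_store.keys() | olap_store.keys()
    if not keys:
        return 1.0  # empty database: nothing can be stale
    same = sum(1 for k in keys if oltp_store.get(k) == olap_store.get(k))
    return same / len(keys)

shared = {"a": 1, "b": 2}
print(freshness_rate(shared, shared))            # shared storage -> 1.0
print(freshness_rate({"a": 1, "b": 3}, shared))  # one stale tuple -> 0.5
```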
We classify existing HTAP systems, based on their storage design, in two high-level categories: (i) unified storage, and (ii) decoupled storage. In unified storage, the HTAP system maintains a single consistent snapshot of data for analytical and transactional processing, and isolation between the two engines is achieved through snapshotting. In decoupled storage, the HTAP system maintains a separate storage for analytical and transactional processing, hence replicating and optimizing data formats while extracting data from the transactional engine, transforming them into the appropriate format, and loading them into the analytical engine. Table 1 shows the classification of existing HTAP systems and provides information on the mechanism used for acquiring fresh data snapshots as well as on the trade-offs between performance and data freshness.
Unified Storage.
The first version of HyPer [22] relies on partitioned-serial execution for concurrency control among transactions. Analytical queries are executed on isolated snapshots which are taken lazily upon conflicting access between the transactional and the analytical part of the system. The snapshot isolation mechanism is based on CoW, and HyPer uses the UNIX fork to start a new process when an analytical query arrives, thereby providing the analytical engine immediate access to the fresh data. Caldera [5] is an HTAP prototype system employing GPUs for analytical query execution, which also relies on CoW using page-shadowing.
The most recent version of HyPer [32] relies on optimistic multi-version concurrency control (MVOCC) to mediate access among transactional and analytical queries. Similarly, MemSQL and IBM BLU [36] are commercial HTAP systems that employ MVCC for snapshot isolation and a tunable option for each table to store it either in column-major or row-major format. SAP HANA [17] is another commercial DBMS that provides HTAP capabilities using a variant of MVCC-based storage. SAP HANA maintains a consistent OLAP-optimized main storage and an OLTP-optimized delta storage, which is periodically merged into the main storage.
HTAP systems with unified storage opt for analytical data freshness using either CoW [28] or multi-versioning [37] for snapshot isolation. However, such systems provide data freshness at the cost of performance of either or both analytical and transactional workloads. Specifically, in CoW, when the OLTP engine updates a record, it has to do a full page copy first, thereby trading transactional for analytical performance. On the other hand, by using MVCC alone, the OLAP engine has to traverse the versions kept by the OLTP engine, trading analytical for transactional performance due to random memory accesses.
Delta-versioning is a hybrid technique which maintains one OLTP- and one OLAP-optimized snapshot, with the second being periodically updated by the first one. This approach is fairer since both engines lose performance for accessing each other's snapshot: the OLTP for reading what was recently migrated to OLAP, and the OLAP for reading the recently updated data from OLTP. The trade-offs between data freshness and transactional or analytical performance are further analyzed in [35].
Decoupled Storage.
Traditionally, in data warehousing, the data is extracted from transactional stores, and then transformed and loaded into analytical data stores, during an ETL process. Recently, HTAP systems are challenged with more frequent update-propagation mechanisms to provide high data freshness rates.
BatchDB [27] schedules OLTP and OLAP engines across isolation boundaries, for instance different NUMA nodes, and employs a mini-batching technique to propagate transactional logs to the analytical data store, either periodically or on-demand. Oracle's dual-format [23] maintains OLTP-optimized row-major and OLAP-optimized column-major data in-memory. Microsoft SQL Server [24] also maintains two copies of data and propagates data to OLAP storage through an intermediate delta storage to avoid the overhead of repeatedly merging transactionally-hot records.
Decoupled storage has been the conventional way of linking OLTP with OLAP databases. Batch-ETL provides isolation between the engines, at the cost of OLAP performance for transferring the data from the OLTP snapshot. However, this cost is amortized by the execution of query batches. In the MVCC-Delta and the Txn Journal & ETL approaches, the corresponding systems keep a separate snapshot for OLTP and OLAP, and they either periodically transfer the recent versions maintained by the MVCC protocol to the OLAP side, or use the transactional journal log to do the transfer.
Storage layout.
NSM and DSM have been the de-facto choices for OLTP and OLAP engines, respectively, whereas PAX [3] addresses hybrid workloads by allowing row and columnar representation in the same disk page. For in-memory HTAP, systems like SAP HANA [17, 34] and HyPer [22] have shown that columnar storage [2] is beneficial for OLAP and incurs only a negligible performance drop in OLTP. Further work on data organization for hybrid workloads [6, 16] has shown that optimizations beyond a static data organization can yield orders-of-magnitude speedups in hybrid workloads. As the scope of our work is to study and mitigate the effect of performance interference between the OLTP and the OLAP engine, we keep the storage layout constant across the engines and follow the approach of HANA and HyPer, maintaining a columnar layout.
Hybrid workload scheduling.
HTAP workload scheduling can be considered a subclass of the hybrid workload scheduling problem, where the goal is to achieve fairness while avoiding interference across workloads. Works in data-center and cluster scheduling [15, 20] achieve performance isolation between small and large jobs, reducing tail latencies and head-of-line blocking in a shared cluster. Our approach follows the same principles by elastically trading resources between engines through an elastic HTAP scheduler which guarantees that no engine will starve, while allowing every individual engine to optimize its own schedule internally. However, our focus is on scale-up servers, where the trade-offs are different than in scale-out clusters, and the metric driving the scheduler's decisions is the data freshness rate.
Further, run-time resource scaling [14] and performance isolation [29] have also been well-studied topics in DBMS scheduling. However, existing work focuses on independent workloads, while in HTAP, workloads have data dependencies due to the data freshness requirements, which lead to concurrent data accesses by the OLTP and the OLAP engine, thus resulting in interference due to implicit (memory bandwidth, CPU caches, hyper-threads) and explicit (locking/latching) resource sharing.
Furthermore, works focusing on scheduling in combination with HTAP traditionally consider the location of the data as input to decide the task placement and, then, the data access method. For instance, the work of Dziedzic et al. [16] distinguishes and optimizes hybrid workloads by adapting the access methods, that is, B+-tree, column-store index, or hybrid, at runtime. In our approach, the location of the data is one of the outputs of the scheduling algorithm, which decides whether fresh data should be moved from the OLTP to the OLAP engine, or accessed remotely.
Then, based on resource availability, the scheduler decides how to access the data given the amount of fresh data required by every query. As we consider an independent storage for each engine, we can physically isolate them and then control interference at the CPU and the memory level by changing the distribution of CPUs across the engines and the data access methods.
The amount of fresh data that an OLAP query needs to access affects the performance of the OLAP and the OLTP engine, depending on the design of the HTAP system. This section provides a classification of HTAP workloads based on their data freshness requirements, following prior works [10, 12].
Short and fresh.
This class includes analytical workloads with a high rate of incoming queries. The queries are simple, they require fresh data, and they are mutually independent across the query stream. Examples of short and fresh analytical workloads include dashboard applications where queries access only the latest tail records inserted. Therefore, the respective queries can be efficiently processed by unified storage engines, since queries access only a part of the data, and they require maximum freshness.
Query batches.
This class includes recurring and mostly pre-defined queries which are predictable and arrive as a single batch. The queries require high, and uniform across all queries in the batch, data freshness, since the results reflect the state of the database until a certain point in time. A standard example is the reporting queries generated periodically for the state of the sales of a business unit, or the state of an area which is under electronic access control. Therefore, the respective queries can be efficiently processed by decoupled storage engines, since queries need to access exactly the same data, which have to be as fresh as possible.
Ad-hoc queries.
This class includes dynamic queries requiring access to fresh data, whereas they can combine both fresh and old data. Queries for predictions and forecasting require such a combination of historical and fresh data, and they are typical examples falling into this category. Therefore, the performance of an HTAP system is in this case determined by the amount of fresh data that each query will access. Accordingly, in case a query needs to access mostly fresh data, a unified storage engine is the most appropriate choice. Instead, queries that access mostly historical data would benefit mostly from a decoupled storage design. However, this information is only known at run-time, and therefore the system has to adapt to the size of fresh data accessed by each query.
This section describes the design of our HTAP system. As we consider two independent engines that can work together through a thin scheduling layer, we split our description into four parts. First, we provide an overview of the overall system to give the big picture and link it with the design choices of each individual engine. Then, we provide the technical description of the OLTP and the OLAP engines and show that our design is non-intrusive to the standard design of such in-memory engines. Finally, we explain how data and resources are exchanged between the OLTP and the OLAP engine through the Resource and Data Exchange engine.
Hybrid Transactional/Analytical Processing addresses workloads where analytical queries process fresh data. In the presence of both the OLTP and the OLAP engine, data freshness is guaranteed either by sharing the storage of the OLTP engine or by copying a part of it to the OLAP engine. There are two important challenges associated with achieving query execution over fresh data: (i) explicit resource sharing causes interference both at the software (extensive copy-on-write or non-contiguous scans) and at the hardware level (sharing CPUs and the memory bus), and (ii) data copying imposes an increased latency which may not be tolerable by the workload. We address these challenges through an elastic system design which adapts to the workload requirements.
Figure 2 depicts the architecture of our HTAP system. Our system achieves resource isolation and sharing on demand, with respect to the performance requirements of each engine. The OLTP and OLAP engines execute the workload independently, and have a one-way dependency: only the OLAP engine reads fresh data from the OLTP engine. As sessions do not span across the two engines, each engine maintains its own request queue and executes the requests independently. The RDE engine assigns resources (CPU and memory) and data to the engines. Following the common approach in cloud computing, we assume that CPU and memory resources are split in two sets: the first is exclusively given to each engine, and the second can be traded between them. The distribution of resources between the engines is decided by the RDE engine. By introducing RDE as an integration layer, we achieve an adaptive HTAP system design, which is minimally intrusive to the design of existing OLTP and OLAP engines.
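The split between engine-exclusive and tradable resources can be sketched as follows (class and method names are hypothetical; the paper does not specify the RDE interface at this level of detail):

```python
class RDE:
    """Minimal model of the Resource and Data Exchange engine's CPU pool.

    Each engine keeps an exclusive core set that can never be taken away,
    plus a shared pool of tradable cores that the RDE reassigns on demand.
    """
    def __init__(self, oltp_exclusive, olap_exclusive, tradable):
        self.exclusive = {"oltp": set(oltp_exclusive), "olap": set(olap_exclusive)}
        self.free = set(tradable)           # tradable cores, currently unassigned
        self.leased = {"oltp": set(), "olap": set()}

    def grant(self, engine, n):
        """Lease up to n tradable cores to an engine; returns the cores granted."""
        cores = set(list(self.free)[:n])
        self.free -= cores
        self.leased[engine] |= cores
        return cores

    def reclaim(self, engine):
        """Engine returns its leased cores; they become available to the other one."""
        returned = self.leased[engine]
        self.free |= returned
        self.leased[engine] = set()
        return returned

rde = RDE(oltp_exclusive=range(0, 4), olap_exclusive=range(4, 8), tradable=range(8, 16))
got = rde.grant("olap", 6)   # OLAP scales up for a heavy query
rde.reclaim("olap")          # cores flow back when the query finishes
```

Resources an engine declines (e.g. a NUMA node its internal scheduler will not use) would simply flow back through reclaim and be offered to the other engine.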
The OLTP engine is depicted on the left-hand side of Figure 2. Following the standard in-memory OLTP system design, our engine includes a Storage Manager (SM), a Transaction Manager (TM), and a Worker pool Manager (WM).
Storage Manager.
The SM stores data entirely in main memory in columnar format. Following an approach similar to Twin Blocks [9] and Twin Tuples [28], the SM maintains two instances of the data, in addition to a multi-versioned storage. The difference from existing approaches is that each instance keeps data in a columnar layout, to allow the OLAP engine to perform fast scans over the data without having to traverse specific versions. At every time point, only one of the two instances is active. The OLTP engine also maintains an index, implemented using cuckoo hashing [33]. The index always points to the last updated record in either of the two instances. Therefore, even though the inactive instance will remain out-of-date, upon switch, the data will always come from the newest version. The OLTP engine also maintains a delta storage to allow transactions to traverse older versions of the objects in Newest-to-Oldest ordering [37], following the standard multi-versioned concurrency control (MVCC) process. By maintaining two explicit instances, we split the storage into a part which is contiguous and another one where data are stored in random order. This way, in case there are at least two versions of the data, we do not impose any further requirements on memory size. The active instance of the SM is switched upon request. The SM provides an API to switch the instance, which returns the starting address of the inactive instance when no active OLTP worker thread is using it any more.

Figure 2: System architecture

The SM maintains instance statistics per column, which are the number of records at the time of switch, a flag indicating if the column contains updated tuples, and the epoch number. The SM also maintains an update indication bit for each record, which is set when the record gets updated. Access to the update indication bits is synchronized using atomic operations, as they can be accessed by different parts of the system.
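The twin-instance design and the switch API can be sketched as follows (a simplified single-column model with hypothetical names; the real engine keeps per-column statistics and accesses the update bits atomically):

```python
class TwinColumnStore:
    """Two columnar instances; OLTP writes the active one, OLAP scans the other."""
    def __init__(self, values):
        self.instances = [list(values), list(values)]
        self.active = 0                        # index of the OLTP-active instance
        self.updated = [False] * len(values)   # per-record update indication bit

    def write(self, idx, value):
        """OLTP update: in place on the active instance, mark the record dirty."""
        self.instances[self.active][idx] = value
        self.updated[idx] = True

    def switch(self):
        """Make the inactive instance active and hand the old one to OLAP.

        Returns the (now inactive) instance holding all committed writes, so
        OLAP can scan it contiguously without any version traversal.
        """
        snapshot = self.instances[self.active]
        self.active ^= 1
        return snapshot

store = TwinColumnStore([10, 20, 30])
store.write(1, 99)
olap_view = store.switch()   # OLAP scans [10, 99, 30]
store.write(0, 7)            # OLTP continues on the other instance,
                             # which is stale at index 1 until the RDE syncs it
```

Note that after the switch the new active instance is stale wherever the old one was updated; resolving that inconsistency is exactly the copy-on-switch step described later for the RDE engine.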
Transaction Manager.
Together with the SM, the TM provides transactional access to the database records by relying on a two-phase locking (MV2PL [37]) concurrency control protocol with deadlock avoidance [4] and a transactional isolation level of snapshot isolation [7]. When a transaction tries to access a record, it retrieves its most recent value through the index. This value may either be on the currently active instance or on the inactive one. When a transaction starts, it requests from the SM the starting address of the column on the currently active instance, performs all the operations there, sets the update indication bit for that record, and then updates the index, if needed. Every update is placed directly on the active instance upon transaction commit, and the older version is pushed to the versioned storage. This provides a significant speedup to OLAP query execution since it allows sequential scans. Inserts are pushed to both instances, but they are made available through the inactive instance by the Storage Manager only after a switch.
Worker Manager.
The OLTP engine uses one hardware thread per transaction. The WM keeps a worker pool of active threads. We set each thread to first generate a transaction and then execute it, simulating a full transaction queue. The WM exposes an API to set the number of active worker threads and their CPU affinities, thus enabling the OLTP engine to elastically scale up and down upon request.
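The WM's scaling API can be sketched as follows (hypothetical names; a real implementation would pin threads with an OS facility such as sched_setaffinity rather than just record the mapping):

```python
class WorkerManager:
    """Keeps a pool of OLTP worker threads; the active count is set by the RDE."""
    def __init__(self, max_workers):
        self.max_workers = max_workers
        self.active = 0
        self.affinity = {}   # worker id -> CPU id

    def set_active_workers(self, n, cpu_ids):
        """Resize the active worker set and record each worker's CPU affinity."""
        n = min(n, self.max_workers, len(cpu_ids))
        self.active = n
        self.affinity = {w: cpu_ids[w] for w in range(n)}
        return self.active

wm = WorkerManager(max_workers=8)
wm.set_active_workers(6, list(range(8, 16)))   # scale up onto traded cores
wm.set_active_workers(2, [0, 1])               # scale down, release the rest
```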
The OLAP engine is based on Proteus [11, 21], a parallel, NUMA-aware query execution engine with support for CPU-GPU execution [11]. The engine uses code generation to specialize the executed code to each query [31], compute device [11], and access path [21]. Our design considers the CPU as the only available compute device, leaving hardware accelerators as future work. Following the description of the OLTP engine, we describe how the Storage Manager (SM) and Query Executor (QE) of the OLAP engine work. The OLAP engine also includes a Worker Manager (WM), which works in a similar way to the WM of the OLTP engine; we therefore omit its description to avoid redundancy.
Storage Manager.
The SM considers that data are stored in the main memory of a single server and is agnostic of their format and layout. The data access paths are decided by input plugins which specify how tuples are accessed and interpreted, following the schema of the database and the current data format. The input plugins further encapsulate different data access methods. In our HTAP setting, we use two access methods. The first method considers that data are stored in the same contiguous memory area. The second method considers that data are partitioned in several (contiguous) memory areas, and it is useful when we need to access only the fresh data from the OLTP storage and the rest from the OLAP storage. The SM accepts as input a pointer to the memory areas where the data are stored at execution time, and it does not load any data beforehand.
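The two access methods can be sketched as interchangeable scan plugins (hypothetical function names; in Proteus these are specialized at code-generation time rather than dispatched at runtime):

```python
def contiguous_scan(area):
    """Access method 1: all tuples live in one contiguous memory area."""
    yield from area

def partitioned_scan(areas):
    """Access method 2: tuples span several contiguous areas, e.g. cold data
    in OLAP storage plus fresh data still resident in OLTP storage."""
    for area in areas:
        yield from area

olap_cold = [1, 2, 3]    # stands in for the OLAP private storage
oltp_fresh = [4, 5]      # stands in for fresh tuples in the OLTP storage
assert list(partitioned_scan([olap_cold, oltp_fresh])) == [1, 2, 3, 4, 5]
```

The query operators above the scan see a single tuple stream either way, which is what lets the scheduler change the data location without touching the query plan.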
Query Executor.
The OLAP engine generates just-in-time specialized code for each query, using the appropriate plugins to access the data. The query plan is translated into pipelines which execute sequences of operations on input tuples without materializing intermediate results. Each pipeline is transformed into code which is compiled and optimized for the current hardware architecture. By default, the pipelines process one block of tuples at a time. During code generation, the plugins specialize the different access methods, based on the location of the data.
The OLAP engine parallelizes query execution by routing blocks between different pipelines that execute concurrently. Each pipeline is assigned to a worker which is affinitized to a CPU core. Based on the placement of the data, the OLAP engine balances the load across worker threads using protocols (hash-based, load-aware, locality-aware, and combinations). By default, the OLAP engine uses locality-and-load-aware policies, and schedules blocks to pipelines which are executed locally to the data, if possible. When a data block is remote to its worker thread, the OLAP engine either accesses the data over the interconnect, or it prefetches the block to the local CPU socket, while overlapping the data transfer with the execution over other blocks. The engine chooses the best strategy based on the availability of resources.
The Resource and Data Exchange engine is the integration layer between the OLTP and the OLAP engines and supports the operations required for HTAP. Fundamentally, there are two ways to achieve HTAP and they require two different design choices. The first way is to have the two engines running in isolation, which we assume to be at the socket boundary in a single, scale-up server. HTAP is thus supported by transferring fresh data from the OLTP to the OLAP engine before executing a query.
The second way is to have the two engines sharing their resources, effectively having a single HTAP system with transactional and analytical processing capabilities. Our HTAP system design relies on elastic resource management to traverse the design space between the above two approaches, through the RDE engine.
In the following, we first provide fundamentals of the RDE engine design, then we describe the discrete states of the system, marking them as S1 (Co-located OLTP and OLAP), S2 (Isolated OLTP and OLAP), and S3 (Hybrid), and finally we explain how the RDE engine elastically migrates between states. The decision on which state to move to at each point in time is taken by the system scheduler and enforced by the RDE engine. Our system design is independent of the scheduling algorithm; therefore, we leave the description of the latter to the next section and here we focus only on the way our system can adapt to the different configurations required by different HTAP workloads.
The RDE engine is the owner of memory and CPU resources and distributes them to the OLTP and the OLAP engines. Still, each engine has its own internal scheduler that decides which resources to use based on the workload. For instance, the scheduler of our OLAP engine makes NUMA-aware decisions on the use of multiple nodes for each query operator. Resources that are not accepted by an engine are returned to the RDE and offered to the other engine.
OLTP active instance switching.
The OLTP engine provides consistent and fresh data with snapshot isolation guarantees. To avoid interference with transaction execution every time the OLAP engine needs access to fresh data, the RDE engine instructs the OLTP engine to switch its active instance. However, this creates a freshness-level inconsistency between the two instances, which can increase with time. For instance, if some records get updated every two OLTP instance switches, then they will be fresh in one instance, leaving the other one behind. Upon the instance switch, the RDE engine checks for updates through a hierarchical update-presence flag in the order of schema, relations, and columns. In the presence of updates, the RDE engine traverses the update indication bits and copies the updated records to the other instance, in case they have not been updated there as well by that time. As the number of inserts and updates corresponds to the data freshness rate, the RDE also maintains these statistics, which are provided to the scheduler. With careful engineering, this process has a negligible effect on performance, and it is followed every time we refer in the following to switching the active instance of the OLTP engine. It takes around 10ms to sync around 1 million modified tuples in a database of over 1.8 billion records, while executing the TPC-C NewOrder transaction. The size of the database corresponds to TPC-H scale factor 300.
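The synchronization step above can be sketched as follows. The `Column`/`sync_instances` names and the per-record dirty-bit layout are our illustrative assumptions, not the system's actual data structures; only the logic (hierarchical update-presence check, copy-unless-also-updated, bit clearing) follows the description:

```python
# Sketch of the hierarchical update-presence check and delta copy.
# A column-level flag lets the RDE engine skip clean columns without
# scanning their per-record update-indication bits.

class Column:
    def __init__(self, values):
        self.values = list(values)
        self.dirty = [False] * len(values)   # per-record update-indication bits
        self.has_updates = False             # column-level update-presence flag

    def update(self, idx, value):
        self.values[idx] = value
        self.dirty[idx] = True
        self.has_updates = True

def sync_instances(active, inactive):
    """Copy records updated in `active` into `inactive`, unless the
    same record has been updated in `inactive` as well by that time.
    Clears the update-indication bits of `active` as it goes."""
    copied = 0
    for name, col in active.items():
        if not col.has_updates:              # hierarchical check: skip clean columns
            continue
        dst = inactive[name]
        for i, is_dirty in enumerate(col.dirty):
            if is_dirty and not dst.dirty[i]:
                dst.values[i] = col.values[i]
                copied += 1
            col.dirty[i] = False
        col.has_updates = False
    return copied
```

A real implementation would also maintain the insert/update counters that the RDE engine reports to the scheduler as freshness-rate statistics.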
Co-located OLTP and OLAP [S1]. In this state, the two engines share the memory and the CPUs of all the sockets. The CPUs of each socket are distributed to the engines following the decision made by the scheduler. When an OLAP query arrives, the RDE engine instructs the OLTP engine to switch its active instance. The OLTP engine returns the pointer of its inactive instance, which is then used by the OLAP engine to execute the query. This way, the two engines interfere at the hardware, but not at the software level, since the OLAP query is executed on a part of the memory which is not used by the OLTP engine. The OLTP engine continues transaction execution on its own instance of the data.

An example of the distribution of resources is depicted in Figure 2, on the left-hand socket, where black colored CPUs are used for OLTP and the striped ones are used for OLAP query execution. The black colored CPUs access only the active OLTP instance.
Related systems to [S1]. This state represents the class of systems which employ co-location of compute and storage resources, like SAP HANA [17] and HyPer-MVOCC [32]. Co-location of hybrid workloads also represents CoW-based systems, since the OLAP engine gets fresh snapshots instantly while OLTP still proceeds on a secondary data instance. Our design avoids the CoW overheads for OLTP while providing OLAP access to data stored in columnar layout.
Isolated OLTP and OLAP [S2]. In this case, the two engines run in the highest isolation level with minimal interference. Each engine receives resources at the granularity of a CPU socket, following the decision made by the scheduler. When an OLAP query arrives, the RDE engine instructs the OLTP engine to switch its active instance. After the OLTP engine returns the pointer to the inactive instance, the RDE engine transfers the data that have been inserted and the data that have been updated since the last time that the instance of the OLAP engine was updated. Updated data are recognized by the update indication bit, which is set by the OLTP engine. For each record transferred to the OLAP engine, the RDE engine clears the corresponding bit. Data transfers are overlapped with OLTP instance synchronization to avoid re-reading the same records. Even though CPU-level isolation is achieved throughout the whole query execution time, there is interference at the memory level when reading the data from the socket where the OLTP engine is executed. This interference is limited by the interconnect bandwidth, which is typically several times smaller than the memory bandwidth, whereas the OLTP engine in any case does not fully utilize memory bandwidth due to random memory accesses.

An example of the distribution of resources to the engines is shown in Figure 2, considering that the OLTP engine occupies the full left-hand socket and the OLAP engine the full right-hand socket, with the long dashed line indicating the memory isolation boundary.

Related systems to [S2]. This state represents the class of systems with decoupled storage and full compute isolation, like BatchDB [27] and traditional data-warehouse solutions, where a periodic ETL is performed from the OLTP to the OLAP engine. These systems isolate OLTP and OLAP workloads across hardware boundaries, NUMA or machines, and provide software and hardware level isolation.
Hybrid OLTP and OLAP [S3]. In this case, the two engines share memory and, if requested, CPU resources. The key aspects of the hybrid approach are: (i) the OLAP engine accesses only the fresh data that it needs for a specific query, and, (ii) the OLAP engine accesses fresh data either through the interconnect or directly from the socket where the OLTP engine is executed, by taking some CPUs from the OLTP engine. Similarly to the previous states, when an OLAP query arrives, the RDE engine instructs the OLTP to switch its active instance and passes the pointer to the inactive instance to the OLAP engine. Then, the OLAP engine has two options: it either accesses the fresh data through the interconnect, like in the isolated state, or it gets some CPUs on the socket of the OLTP engine and accesses fresh data at full memory bandwidth from these CPUs. The scheduler decides which of the two options to use. The interference caused by the hybrid approach at the memory level is bounded on the lower side by the interconnect bandwidth, whereas at the CPU level it is bounded on the higher side by the number of CPUs allowed to be passed from the OLTP to the OLAP engine, by the database administrator.

An example of the hybrid approach can be reconstructed from Figure 2, by considering cases A and B referring to the first and the second option, respectively. In case A, the OLAP engine uses its own socket and accesses fresh data over the interconnect, as the long-dashed-line arrow indicates. In case B, the OLAP engine uses the striped CPUs from the socket of the OLTP engine, and after performing some operations on that socket, they send the data back to the main OLAP socket, as the short-dashed-line arrow indicates. Case B is particularly useful for query operators with a big reduction factor (e.g., an aggregation like a count), which would stress the interconnect if case A were followed.
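The choice between the two cases can be framed as a simple bandwidth cost comparison. The function below and its cost formulas are illustrative assumptions of ours, not the system's actual decision logic:

```python
# Illustrative cost model for choosing between the two hybrid access
# paths: (A) pull fresh data over the interconnect, or (B) borrow
# OLTP-socket CPUs, reduce locally, and ship only the results.

def pick_access_path(fresh_bytes, reduction_factor,
                     interconnect_bw, memory_bw, borrowed_cpu_scan_bw):
    """Return 'A' or 'B'. `reduction_factor` is output_bytes/input_bytes
    of the operators run on the borrowed CPUs (tiny for a count)."""
    # Case A: every fresh byte crosses the interconnect.
    cost_a = fresh_bytes / interconnect_bw
    # Case B: scan locally at the borrowed CPUs' rate, then ship only
    # the reduced intermediate results over the interconnect.
    local_scan = fresh_bytes / min(memory_bw, borrowed_cpu_scan_bw)
    shipped = fresh_bytes * reduction_factor
    cost_b = local_scan + shipped / interconnect_bw
    return 'A' if cost_a <= cost_b else 'B'
```

With a strong reduction (e.g., a count aggregate) case B wins, since almost nothing is shipped back; with no reduction, case B pays both the local scan and the full interconnect transfer, so case A wins.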
Related systems to [S3]. This state represents the class of systems employing hybrid data access techniques, like tail OLTP record scan for OLAP, which is equivalent to accessing fresh data from the OLTP inactive instance in our case, in addition to scanning OLAP-local storage. Representative systems include Microsoft SQL Server [24] and Oracle Dual-Format [23]. Further, state S3 follows the design of elastic resource allocation in cloud systems to distribute resources between the two engines at runtime.

The elasticity at the system design level is driven by a scheduler which decides how to distribute resources to the OLTP and the OLAP engine. The main parameter considered by the scheduler is the data freshness rate for each query. Accordingly, the scheduler selects a system state and the OLAP engine adapts its resource allocation and data access methods to provide maximum data freshness for each analytical query. As execution ranges across three states, co-location, isolation, and hybrid, the scheduler requests the system to migrate to a state by changing the size of the worker pool of each engine and the affinity of the worker threads. Accordingly, workload execution adapts to the resources that are made available to the engines every time.
We model the HTAP system as a set of memory and computing resources and we assign them to the engines, which use them internally to optimize their execution. We rely on the scan operators of the analytical query plan to find the data that the query will access and we extract the fraction of fresh data. Further, we assume that data are stored in a columnar layout and that there are no additional overheads to scanning other than any potential NUMA effects that are caused by using CPUs that are remote to the data. Observe that NUMA overheads are fully controllable by the scheduler, since it distributes the resources to the engines. Finally, we assume that the database administrator can set restrictions on the amount of resources that can be revoked from the OLTP and the OLAP engine, to abide by performance guarantees and control interference between the engines.

We do not consider any optimizations through indices or related data structures and we assume that the system will perform a full scan of each column. In the presence of such optimizations, the HTAP system will still need to schedule the maintenance of the associated data structures considering the data updates. Even though we expect that our scheduler will still be applicable, we leave this as future work which can rely on our current findings.

The main overheads that we consider in the execution are the remote memory accesses, since they can significantly affect execution times, especially in OLAP engines that have to process big amounts of data. Given that state-of-the-art analytical query engines can saturate the memory bandwidth while scanning the data, we can quantify the overhead for remote vs local memory access to be equal to the difference in bandwidth between the main memory bus and the CPU interconnect.
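This overhead argument can be made concrete with a back-of-the-envelope model; the bandwidth figures in the example are hypothetical, not measurements from our server:

```python
# Time model for a bandwidth-bound column scan. For such a scan, the
# remote-access penalty reduces to the bandwidth gap between the local
# memory bus and the CPU interconnect, as argued above.

def scan_time(gigabytes, bandwidth_gbps):
    """Seconds to stream `gigabytes` at `bandwidth_gbps` GB/s."""
    return gigabytes / bandwidth_gbps

def remote_scan_overhead(gigabytes, memory_bw, interconnect_bw):
    """Extra seconds a scan pays when the data crosses the
    interconnect instead of the local memory bus."""
    return scan_time(gigabytes, interconnect_bw) - scan_time(gigabytes, memory_bw)

# Hypothetical example: a 64 GB scan with a 100 GB/s memory bus and a
# 35 GB/s interconnect pays roughly 1.2 s of extra latency.
```

For the OLTP engine the same formula overestimates the penalty, since random accesses leave most of the bandwidth unused, which is why the scheduler treats the transactional overhead as a separately learned quantity.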
For OLTP engines, which are characterized by random data accesses and therefore use only a part of the memory bandwidth, we assume that the overhead for remote data access is less than the analytical one. Given that the profile of the transactions typically does not change over time, we assume that the scheduler can easily learn and experimentally quantify this overhead and adjust it at run-time, if the profile of the transactional workload changes. Finally, both engines can scale as they use more CPU cores even from a socket which is remote to the data, despite the interconnect bottleneck, which leads to lower performance.

The scheduler makes decisions at two levels. The first level is the state selection. The scheduler selects the state that optimizes access locality to the fresh data required by the query. The second level is the resource distribution. Memory-wise, in the co-located state, the OLTP and the OLAP engine take one of the OLTP instances each. In the isolated case, the OLTP and the OLAP keep their own instances. In the hybrid case, the OLTP keeps one of its instances active and shares the second with OLAP. Computing-wise, in the co-located state, the OLTP and the OLAP share at maximum all CPU sockets. In the isolated case, there is no change in the CPU distribution. In the hybrid case, the OLAP engine may use a number of CPU cores that primarily belonged to the OLTP engine and interfere with its execution. The thresholds on the number of CPU cores to exchange between the two engines are set by the database administrator, as they affect workload execution in both engines, and therefore resource allocation has to remain compliant with any performance guarantees.

The adaptivity of our HTAP system is achieved through fine-grained, elastic resource scheduling, which allows the system to migrate across different states.
Elasticity allows the RDE engine to provide different data-access paths to the OLAP engine to achieve maximum data freshness, while having controlled interference in OLTP performance. This section first describes how state migration is achieved by setting the CPU and memory resources of the OLTP and the OLAP engine. Then, it describes how data freshness drives the decision for migrating across states.
Algorithm 1: State Migration

Data: OLTPSockThres = Minimum OLTP Sockets
Data: OLTPCpuThres = Minimum OLTP CPUs/Socket

MigrateState_S1()
    for s in Server.CpuSockets do
        while OLTP.cpuCnt < OLTPCpuThres[s] do
            OLTP.addCPU(s.nextCPU)
        OLAP.addCPU(s.nextCPU)
    OLAP.setMem(OLTP.switchInstance())

MigrateState_S2()
    for s in Server.CpuSockets do
        if OLTP.socketCnt < OLTPSockThres then
            OLTP.addSocket(s)
        else
            OLAP.addSocket(s)
    OLAP.etl(OLTP.switchInstance())
    OLAP.setMem(OLAP.localInstance())

MigrateState_S3(mode)
    if mode == ISOLATED then
        for s in Server.CpuSockets do
            if OLTP.socketCnt < OLTPSockThres then
                OLTP.addSocket(s)
            else
                OLAP.addSocket(s)
        OLAP.setMem(OLTP.switchInstance())
    else
        for s in Server.CpuSockets do
            while OLTP.cpuCnt < OLTPCpuThres[s] do
                OLTP.addCPU(s.nextCPU)
            OLAP.addCPU(s.nextCPU)
        OLAP.setMem(OLTP.switchInstance())

Algorithm 1 describes the steps required to migrate the system to one of its states. First, we define two thresholds for the minimum computing resources that have to be given to the OLTP engine, at socket and CPU granularity. The thresholds are useful in two states: (i) the co-located one, where the OLTP and the OLAP share some or all the sockets of the server, and, (ii) the hybrid-elastic one, where the OLAP engine uses a set of CPUs that belonged to the OLTP engine. Then, we provide each state migration as a separate function. For each function, the scheduler only assigns resources; their enforcement is performed by the RDE engine.

The MigrateState_S1 function assigns the number of CPUs defined by the corresponding threshold to the OLTP engine, and the rest to the OLAP engine. Then, it switches the active instance of the OLTP engine, and sets the OLAP engine to read data from the other, now inactive, OLTP instance. The MigrateState_S2 function distributes the system CPU sockets to the OLTP and the OLAP engine according to the policy set by the database administrator. For instance, in the case of a uniform policy, the OLTP and OLAP engines will get half of the available sockets. Then, the scheduler marks the memory for the ETL and requests OLAP to use its local instance.
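The migration logic can also be transcribed into an executable sketch. The `Engine` object and the server representation below are toy stand-ins of ours; only the control flow follows Algorithm 1 (fill OLTP up to its threshold, give the remainder to OLAP):

```python
# Executable transcription of Algorithm 1 over toy engine objects.
# A server is modeled as a list of sockets, each a list of CPU ids.

class Engine:
    def __init__(self):
        self.cpus = []      # CPU ids owned by the engine
        self.sockets = []   # whole sockets owned by the engine

def migrate_s1(server, oltp, olap, oltp_cpu_thres):
    """Co-located: per socket, OLTP keeps up to its CPU threshold,
    the remaining CPUs of the socket go to OLAP."""
    oltp.cpus, olap.cpus = [], []
    for s, socket_cpus in enumerate(server):
        for i, cpu in enumerate(socket_cpus):
            (oltp if i < oltp_cpu_thres[s] else olap).cpus.append(cpu)
    # OLAP then reads the freshly switched (now inactive) OLTP instance.

def migrate_s2(server, oltp, olap, oltp_sock_thres):
    """Isolated: whole sockets per engine; OLAP runs on its local
    instance after an ETL of the delta."""
    oltp.sockets, olap.sockets = [], []
    for s in range(len(server)):
        (oltp if len(oltp.sockets) < oltp_sock_thres else olap).sockets.append(s)

def migrate_s3(server, oltp, olap, mode, sock_thres, cpu_thres):
    """Hybrid: socket-level split (ISOLATED) or CPU-level split
    (NON-ISOLATED); in both cases OLAP reads the switched OLTP
    instance instead of running a full ETL."""
    if mode == "ISOLATED":
        migrate_s2(server, oltp, olap, sock_thres)
    else:
        migrate_s1(server, oltp, olap, cpu_thres)
```

The instance-switch and ETL calls are left as comments, since they belong to the RDE engine rather than the scheduler.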
As the ETL is performed by the RDE engine, the latter uses OLAP compute resources while transferring the data, given that OLAP cannot execute the query before all data have been copied. The triggered ETL process copies only the delta between the two instances, which is calculated by the RDE engine based on the update indication bits set by the OLTP engine. The MigrateState_S3 function is similar to the other two, depending on the thresholds set for OLTP resource allocations. In the ISOLATED mode, we set the compute resources to socket-level isolation, and the OLAP engine accesses records from the OLTP engine over the interconnect. In the NON-ISOLATED mode, the OLAP engine is co-located in some sockets with the OLTP engine by getting some of its CPU cores, thus favoring OLAP over OLTP performance.

Algorithm 2 describes the scheduling strategy for migrating the system across different states. The decisions of the algorithm are based on the freshness-rate metric for every query. Recall that the freshness-rate metric in our HTAP system is the rate of tuples that are the same in the OLAP and the currently active OLTP instance when the query arrives. Algorithm 2 calculates the freshness-rate metric only for the columns which will be accessed by every query. The scheduler retrieves from the RDE engine the amount of fresh data that the OLAP engine needs to fetch from the OLTP instance to satisfy the current query with freshness-rate 1, denoted N_fq, and the amount of fresh data needed to update the whole OLAP instance, denoted N_ft. The parameter α is defined within [0, 1] and, as it decreases, the scheduler prefers to do ETL by migrating to S2. If ETL is not preferred due to the amount of fresh data, the scheduler will check whether elasticity is allowed, denoted with the flag F_el. If it is not allowed, then it will instruct the OLAP engine to read the data needed for the query remotely from the OLTP instance, after migrating to state S3-ISOLATED (S3-IS). If elasticity is allowed, then the system will either migrate to the state S3-NON-ISOLATED (S3-NI) or to S1, depending on the performance requirements of the OLTP engine. Therefore, the decision on the elasticity mode M_el is based on the service level agreement for the OLTP engine. The number of CPU cores to be passed from the OLTP to the OLAP engine is subject to the workload and the OLTP performance requirements. We conduct a sensitivity analysis to explain the numbers that we are considering for our evaluation in Section 5.2.

Algorithm 2: Freshness-driven resource scheduling

Data: F_el = Elasticity availability flag
Data: M_el = Elasticity mode: {Hybrid, Co-location}
Data: N_fq = Amount of fresh data in query
Data: N_ft = Amount of fresh data in database
Data: α = ETL sensitivity

ResourceSchedule()
    if N_fq < α * N_ft AND !QueryBatch then
        if !F_el then
            MigrateState_S3(ISOLATED)
        else if M_el == HYBRID then
            MigrateState_S3(NON-ISOLATED)
        else
            MigrateState_S1()
    else
        MigrateState_S2()

Given that the performance of an HTAP system is determined by the performance of both the OLTP and the OLAP engine, Algorithm 2 is a heuristic which tries to optimize the performance of OLAP given the restrictions of the OLTP engine. For this reason, it first favors for OLAP to take compute resources from OLTP (S3-NI), then to trade them with the OLTP (S1), and finally to just do remote access (S3-IS). In all cases, when there is enough fresh data, as defined by α, the algorithm migrates to S2 to keep the OLAP instance fresh and provide data locality for future queries.

Query Batch.
We consider as a batch a set of queries that are executed over the same data snapshot with the same freshness rate. Thus, the execution of the batch depends only on the OLAP engine and it is orthogonal to the scheduler. As the number of queries increases, so does the probability of them accessing all the fresh data of the OLTP instance, making N_fq approach N_ft and leading the scheduler to migrate to S2. This also applies for individual queries, where the scheduler is expected to initially trigger states S1 and S3, which do not update the OLAP instance, but at some point the rate of fresh data per query to the overall amount of fresh data will approach 1, eventually migrating to S2.

ETL Sensitivity.
ETL sensitivity is given by the parameter α in Algorithm 2 and it represents the threshold for copying the fresh data from the OLTP to the OLAP instance. Small values of α increase the sensitivity of the scheduler into performing an ETL by migrating to state S2. This is beneficial for workloads where every query is expected to touch the same attributes as the previous ones, or workloads where only a small fraction of the data gets updated. Instead, big values of α are beneficial for workloads where every query is expected to access a small subset of the updated data.

Elasticity and Interference.
Elasticity introduces interference between the OLTP and the OLAP engine. Bandwidth-intensive OLAP can starve OLTP with only a few hardware threads by consuming memory bandwidth apart from the compute resources. Limits in the use of CPUs and memory bandwidth can be set by using hardware tools of server-grade CPUs [1] or software-based solutions [25, 30]. In [?], we analyze the effect of elasticity on interference between the workloads. To better utilize the hardware resources, real-time performance monitoring can be employed [18, 19, 26], as it will allow the scheduler to distribute resources between the engines up to a certain performance degradation threshold.

This section includes the results of our experimental evaluation. First, we describe the hardware that we used to execute our experiments, some essential details of our software, and finally the benchmark that we used to derive our workload. Then, we present the results of our sensitivity analysis focusing on specific queries, to show the benefits of every state of our system. Based on the sensitivity analysis, we tune our scheduler and we evaluate the performance of every state of the system under different HTAP queries. Finally, we derive a query mix and we evaluate the adaptivity properties of our scheduler throughout the workload execution.

To the best of our knowledge, there is no single performance metric defined for HTAP workload execution. Accordingly, in our evaluation, we focus on the way that our system selects the optimal configuration, based on the level of interference which is allowed between the engines.
Hardware.
All the experiments were conducted on a server equipped with 2x 14-core Intel Xeon Gold 6132 processors (32-KB L1I + 32-KB L1D cache, 1024-KB L2 cache, and 19.25-MB LLC), clocked at 2.60 GHz, with Hyper-Threading, summing to a total of 56 hardware threads, and 1.5-TB of DRAM.
Software.
Our RDE engine relies on 2-MB huge pages, and it pre-faults the memory before passing it to the OLTP and the OLAP engines to avoid any artifacts of memory allocations in the experimental results. At system bootstrap, the OLTP and the OLAP engines each get one CPU socket of the server, which corresponds to the full isolation state (S2). The OLTP engine uses the memory given by the RDE engine to create the two instances, its delta storage, its index and the rest of the required data structures. Similarly, the OLAP engine uses the memory granted by the RDE engine to create the OLAP instance and initialize the buffers that will be needed for query execution. Before every experiment, we execute a warm-up phase, and then we report steady-state analytical query response time and transactional throughput.

Benchmark.
We performed our experiments with the CH-benchmark [13], which combines two industry standard benchmarks, TPC-C and TPC-H, for transactional and analytical processing systems, respectively. The schema inherits the relations specified in TPC-C and adds three more relations specified in TPC-H, which are Supplier, Nation and Region. To better analyze the effects of different HTAP schedules, we scale the database size following the TPC-H approach, where for a scale factor SF the size of the LineItem table becomes SF * 6,000,000: we fix the number of OrderLines per Order when initializing the database and we scale the number of records in OrderLine to SF * 6,000,000. The NewOrder transaction generates five to fifteen order lines per order. Unless stated otherwise, all experiments are conducted on an initial database with scale factor 300. For the transactional workload, we assign one warehouse to every worker thread, which generates and executes transactions simulating a complete transactional queue. As the CH benchmark does not specify selectivities for conditions on dates, we consider 100% selectivity, which is the worst case for join and group-by operations. Nevertheless, the selectivities do not affect the performance of our scheduling algorithm, since it assumes that the OLAP engine will perform full column scans in any case, as we explain in Section 3.3. Therefore, the amount of fresh data accessed does not depend on the selectivity.
In this section, we analyze the design states of our HTAP system and their impact on the performance of the OLTP and the OLAP engine. We use Q1 and Q6, which perform aggregate operations over the OrderLine table, for two reasons: (i) they are simple queries that perform a scan over a single table, and therefore they can reveal overheads and opportunities, and, (ii) the OrderLine table grows in time, as the OLTP engine executes the NewOrder transaction. Thus, we study the impact of the data freshness-rate on the performance of both engines, at the different states that our system migrates to. We report the OLAP performance with query response time and the OLTP performance with throughput measured in million transactions per second (MTPS).

Figure 3: Sensitivity analysis for states S1, S2, S3-NI. (a) OLTP/OLAP performance at state S1; (b) OLTP/OLAP performance at state S2; (c) OLTP/OLAP performance at state S3-NI.
Co-located OLTP and OLAP [S1]. In this state, we focus on the impact of resource allocation on the performance of each engine, due to hardware-level interference caused by sharing CPU caches as well as memory and interconnect bandwidth. We execute an experiment which considers that the engines are initially in full isolation (state S2), and then they gradually trade CPUs from their sockets until they go half the way to each socket. Both engines access the memory allocated by the OLTP engine, even though a different instance. Therefore, initially the OLAP engine accesses all records from the remote socket, and it gradually gets local access as it trades its CPUs with the OLTP engine. Instead, the OLTP engine initially has local access to its data, and gradually mixes it with remote accesses.

Figure 3(a) shows the performance of the OLTP and the OLAP engine. The x-axis shows the number of CPU cores traded between the engines. The y-axis on the left-hand side of the figure shows the performance of the OLTP engine (higher is better). The y-axis on the right-hand side of the figure shows the query response time (lower is better). We use striped bars to represent the transactional throughput when OLTP runs without any interference from OLAP, and filled bars when OLAP and OLTP are executed concurrently, to depict the effect of analytical query execution on the performance of transaction execution for all configurations. We use a line to represent the query response time. For every configuration that we report, we run a batch of experiments where Q6 is executed 16 times, one after the other, on the freshest snapshot of data, and we report the average performance, thereby considering that the amount of fresh data in the system increases.

As shown in the figure, the OLTP throughput drops up to 37% in the absence of OLAP workload execution, but after the first 4 CPUs, the rate is smaller and almost stabilizes. This happens because the transaction execution is dominated by random memory accesses and, therefore, the interconnect does not impose much overhead. A part of this overhead is also accounted to cross-socket atomics, as has been reported in the OLTP literature [4]. The OLTP throughput drops up to 55% in the presence of OLAP workload execution, whereas the rate of performance degradation is almost proportional for each set of CPUs that we trade. This is accounted to the stress caused to the memory and the interconnect bandwidth by the OLAP query, which is a scan and dominates all memory accesses. As the OLTP moves gradually to the remote socket, it performs more accesses through the CPU interconnect, which is saturated by the OLAP engine, and therefore its performance drops following this pattern. Accordingly, we observe that the OLTP engine throughput drops about 20% due to the interference caused by OLAP workload execution. Similar to the OLTP throughput, we observe that the OLAP response time improves up to the point where we trade 4 CPUs, where it starts stabilizing. The reason is that the engine efficiently load balances across the two different interconnects, memory bus and remote CPU, and pipelines execution in such a way that it does not need further data locality. Therefore, after 4 CPUs, the OLAP performance stabilizes, while the OLTP performance keeps decreasing.

Insight [S1]. The co-location of the OLTP and OLAP engines creates interference which mostly affects the performance of the OLTP engine. Furthermore, passing many data-local CPUs to the OLAP engine does not improve its performance, whereas it continues to hurt the performance of the OLTP engine. Therefore, a bad decision on the topology of each engine may hurt performance with no benefit.
Isolated OLTP and OLAP [S2]. In this state, we focus on the time required to copy the data between the engines, and how fast this cost can be amortized by executing several queries over the same data on the OLAP engine. We execute Q6 on the OLAP engine in different batches over the same snapshot of the data. We vary the batch size from 1 to 16 and we transfer the fresh data from the OLTP engine before the query execution starts. We execute the same number of queries, regardless of the batch size, to make a fair comparison.

Figure 4: OLAP response time with respect to data freshness (S3-IS split-access, S2, and S3-IS full-remote, over the % of fresh data in the database accessed by the query).

Figure 3(b) depicts the performance of the OLTP and the OLAP engine as we increase the batch size. The x-axis shows the number of queries contained in each batch. The y-axis on the left-hand side of the figure shows the cumulative query execution time in seconds (lower is better), and on the right-hand side the throughput of the OLTP engine (higher is better). We use solid bars to represent the response time for 16 query executions and striped bars to represent the time spent on transferring data between the engines.

As shown in the figure, the data transfer time almost equals the query execution time. Approximately 500MB of data are copied for each batch, while 160MB are accessed by each query. However, as we increase the batch size, the cost of copying the data gets amortized. The OLTP throughput remains unaffected due to the physical isolation of the engines at the socket boundary. The interference at the memory bus when the RDE engine copies the data to the OLAP socket does not affect the execution of transactions, since the OLTP engine does not fully utilize the memory bandwidth.
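The amortization can be seen with simple arithmetic; the constants in the example are illustrative, chosen to mimic the observation above that transfer time roughly equals query time:

```python
# Average per-query cost when a fixed number of queries is executed
# in batches, paying the data-transfer cost once per batch.

def per_query_time(batch_size, transfer_time, query_time, total_queries=16):
    """Average cost per query when `total_queries` queries run in
    batches of `batch_size`, with one transfer per batch."""
    batches = total_queries / batch_size
    total = batches * transfer_time + total_queries * query_time
    return total / total_queries
```

With transfer time equal to query time, moving from batches of 1 to batches of 16 cuts the average per-query cost from 2.0 to about 1.06 time units, i.e., the transfer cost is almost fully amortized.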
Insight [S2]. Copying data from one engine to the other gets amortized after some time, provided that several queries access the same data. Moreover, by periodically copying data to the OLAP engine, the system limits the stress on the OLTP memory bus, which is affected both when the engines share their socket and when the OLAP engine reads the fresh data that it needs from the OLTP socket. Therefore, S2 brings the system into a steady state and, for this reason, the scheduler invokes it when the amount of fresh data becomes large enough.

Hybrid OLTP and OLAP [S3]. In this state, we focus on the impact of assigning OLTP CPU cores to the OLAP engine for the non-isolated case, which we refer to as S3-NI, and of remote reads for the isolated case, which we refer to as S3-IS. Note that in S3-IS we cannot store the data that we bring, because this can lead to an inconsistent snapshot on the OLAP side if some records are left behind because they are not needed by the current query.

Upon state migration, the OLTP engine switches instance and synchronizes the two instances. At that point, the scheduler knows from the RDE engine how many fresh records exist in the system. We use this information to optimize the access method of the OLAP engine, enabling it to access from the OLTP engine only the data that are explicitly inserted. We use this split-access optimization only when the query accesses tables where data are inserted and not updated, as the latter would lead to accessing an inconsistent snapshot.

For S3-NI, we vary the number of CPUs that are passed from the OLTP to the OLAP engine. We use Q1 for this case because Q6 introduces overheads that we explain in the following section and does not improve OLAP performance. We use the split-access method when accessing the data, since this improves the execution time. We report the results of our experiment in Figure 3(c), which has a layout similar to Figure 3(a).
As shown in the figure, the OLTP engine suffers from hardware-level interference, similar to the co-located case where we executed Q6. Again, we observe that the performance of OLAP plateaus as we give it more than 6 CPUs, since it can already saturate the memory bandwidth. The improvement that we are getting in the query response time is around 20% in this experiment, because most of the data are accessed from the OLAP instance. Effectively, this threshold corresponds to the maximum number of CPUs by which the database administrator would allow the OLAP engine to elastically expand, while sacrificing OLTP performance.

For S3-IS, we execute an experiment where we vary the amount of fresh data that the OLAP engine needs to access in order to achieve a freshness-rate equal to 1. In Figure 4 we report the average query response time of Q1 when varying the percentage of fresh data accessed by the query. More specifically, we vary the fresh bytes in the columns touched by the query and report on the x-axis the touched fresh bytes as a ratio over the total fresh bytes in the database. The ratio of fresh data increases over time, as transactions insert new data. There are two important observations to be made from this plot: (i) the full-remote method for S3-IS is worse than S2, since it always has to bring the same data over the interconnect. S2 is worse in the beginning because it needs to fetch more data for the first query, but as we execute the query many times, it stabilizes; (ii) the split-access method for S3-IS is better because it accesses only the fresh data that are required by the query. Moreover, we observe that the blue line of the split-access S3-IS approaches the grey line of S2, and at some point they are expected to cross. This point is when the scheduler prefers to do the data transfer before executing the query, because most of the data is fresh and the full ETL cost will be amortized.

Insight [S3].
We have shown that an HTAP system benefits from transferring the data from the OLTP to the OLAP side after some freshness threshold. Moreover, hybrid execution provides benefits to query execution, as long as there are enough data to saturate the memory bandwidth throughout the whole query execution. Finally, we have shown that the amount of resources to be reallocated between the engines depends on the queries and the available hardware.

[Figure 5: HTAP performance under different scheduling states. (a) OLAP sequence execution time (sec) and (b) OLTP throughput (MTPS) over the OLAP query sequence, for S1, S2, S3-IS, Adaptive-S3-IS, S3-NI, and Adaptive-S3-NI.]
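The state-selection logic implied by these insights can be sketched as follows. This is a minimal illustration with hypothetical names, not the paper's implementation; the actual scheduler also weighs resource availability and the per-query costs discussed above.

```python
# Hypothetical sketch of freshness-driven state selection.
# States: S2 (isolated, ETL copy), S3-IS (hybrid, isolated remote
# reads), S3-NI (hybrid, non-isolated, OLTP lends cores).
# alpha is the fresh-data ratio beyond which an ETL is expected
# to amortize (the paper's experiments use alpha = 0.5).

def choose_state(fresh_ratio: float, oltp_strict: bool,
                 alpha: float = 0.5) -> str:
    """Return the scheduling state for the next OLAP query."""
    if fresh_ratio >= alpha:
        # Enough fresh data: copy it over (ETL) and run isolated.
        return "S2"
    if oltp_strict:
        # OLTP cannot spare cores: read fresh data over the interconnect.
        return "S3-IS"
    # OLTP can lend cores: run hybrid, non-isolated.
    return "S3-NI"
```

For example, a workload with 70% fresh data would trigger the copy (`choose_state(0.7, True)` yields `"S2"`), while one with little fresh data and strict OLTP requirements stays in `"S3-IS"`.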
OLTP tail latency.
As the OLAP engine stresses the memory bus, the OLTP engine is expected to experience higher tail latencies. In S3-IS and S2, this effect is expected to be smaller, as the OLAP accesses go through the CPU interconnect. However, it becomes higher as the system migrates to S3-NI, and to S1, which is the worst case.

In Figure 5, we evaluate the adaptive scheduling algorithm in HTAP and compare it with every individual state. We initialize the database at SF 30 before we synchronize the storage of both engines, thereby setting the freshness-rate of the OLAP instance to 1. The OLAP engine executes a set of 3 queries 100 times in a sequence, one after the other, and we report the total execution time of the sequence, including any snapshotting or ETL. We use two scan-heavy queries, Q1 & Q6, and one join-heavy query, Q19. The OLTP engine executes TPC-C NewOrder transactions concurrently to the OLAP queries. We remove the LIKE condition from Q19, as it is not supported by the OLAP engine. In the following, we first describe the characteristics of every query and its interaction with the hardware, and then we evaluate their performance in the query mix.
CH-Q6 (scan-filter-reduce) is memory intensive, and its performance depends on the total bandwidth available for accessing the data. In state S2, CH-Q6 executes at memory-bandwidth speed but pays the upfront cost of the ETL operation. In states S1 and S3-NI, the availability of more data-local bandwidth gives CH-Q6 a performance boost. In state S1, CH-Q6 can starve OLTP of memory bandwidth due to its long sequential memory scans and, in reverse, can suffer interference from the OLTP engine, as both the DRAM and the interconnect bandwidth are shared with concurrent OLTP, stressing the DMA controllers of all NUMA sockets.
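The scan-filter-reduce shape of CH-Q6 can be sketched as a single sequential pass over two columns with a cheap predicate and a running aggregate (a simplified illustration, not the engine's vectorized code). This shape is what makes the query bandwidth-bound: there is no intermediate state beyond one accumulator.

```python
# Shape of a scan-filter-reduce query such as CH-Q6: one sequential
# pass over columnar data, a cheap range predicate, and a running
# sum. Throughput is bound by how fast the columns stream from
# memory, which is why extra data-local bandwidth (S1, S3-NI) helps.

def scan_filter_reduce(quantities, amounts, lo, hi):
    """Sum `amounts` for rows whose quantity falls in [lo, hi]."""
    return sum(a for q, a in zip(quantities, amounts) if lo <= q <= hi)
```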
CH-Q1 (scan-filter-groupby) is similar to Q6, but the final grouping and aggregation stress the CPU caches and, if executed across sockets, may take a performance hit from cache coherency across NUMA nodes in the hash-based group-by operation, depending on the group distribution. For small amounts of fresh data, CH-Q1 performs best in state S3-IS, consuming the local memory bandwidth plus the interconnect bandwidth, followed by a socket-local group-by. As the amount of fresh data increases, state S3-NI dominates in performance, hiding cross-socket load-balancing effects.
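The scan-filter-groupby shape can be sketched as follows (a simplified single-threaded illustration with hypothetical accessors). The hash table of per-group accumulators is the state that, when shared across NUMA sockets, generates the coherency traffic described above.

```python
# Shape of a scan-filter-group-by such as CH-Q1: the aggregate state
# is a hash table keyed by group. Sharing these entries across NUMA
# sockets is what causes the cache-coherency penalty; a socket-local
# group-by keeps them in local caches.
from collections import defaultdict

def scan_filter_groupby(rows, predicate, key, value):
    """Aggregate `value(row)` per `key(row)` over rows passing `predicate`."""
    groups = defaultdict(float)
    for row in rows:
        if predicate(row):
            groups[key(row)] += value(row)
    return dict(groups)
```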
CH-Q19 (fact-dimension join) joins a fact table with a dimension table and is dominated by the memory latency of random accesses during the probing phase of the hash-based join. The OLAP engine opts for a broadcast-based join for CH-Q19, as the build side is relatively small (tpcc_items with 100,000 records), which penalizes the cross-NUMA states S1 and S3-NI. In addition to the broadcast overhead, S1 faces additional interference with OLTP traffic across the socket interconnect. However, as the ratio of fresh data increases, the broadcast cost is amortized by data-local memory accesses in state S3-NI.
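A broadcast hash join of this kind can be sketched as follows (a simplified illustration; the engine's actual operator is parallel and NUMA-aware). The small dimension table is replicated per socket, so every probe is socket-local; the per-socket build/copy cost is the broadcast overhead that must be amortized.

```python
# Sketch of a broadcast hash join as used for CH-Q19: the small
# dimension table (e.g. items) is copied to every socket, so each
# socket probes the hash table with its local fact-table partition.
# The per-partition build cost models the broadcast overhead that
# penalizes the cross-NUMA states until local probing amortizes it.

def broadcast_hash_join(fact_partitions, dimension, dim_key, fact_key):
    """Join each fact partition against a broadcast copy of `dimension`."""
    results = []
    for partition in fact_partitions:                 # one per socket
        built = {dim_key(d): d for d in dimension}    # broadcast copy
        for f in partition:                           # socket-local probe
            d = built.get(fact_key(f))
            if d is not None:
                results.append((f, d))
    return results
```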
OLAP performance.
Figure 5(a) shows the OLAP sequence execution times for the different states. As the OLTP engine inserts data, the execution time increases. The selection of state among S3-IS, S2, and S3-NI depends on the performance requirements for the OLTP engine, from stricter to looser ones, respectively. With the availability of elastic resources, the scheduler opts either for S3-NI or S1, depending on the OLTP engine requirements. Observe that the design of our OLTP engine with two instances, synchronized upon query arrival, inherently corresponds to the decision of our scheduler by setting α = 0, in the S2 case. As this approach for the OLTP engine provides flexibility to our design, we only report the performance for the adaptive S3.

State S2 is the slowest one, as it has to do an ETL for every query. Given that we are not executing batches, the ETL cost only gets amortized with respect to S3-IS after 75 query sets, because at that point, in S2, the OLAP engine has more socket-local data than in S3-IS, where data are read from the remote socket. State S3-NI provides a further performance improvement due to the presence of data-local compute resources, with which the OLAP engine opportunistically reduces fresh data and transfers less data over the interconnect. In all cases, we observe that the adaptive counterpart provides better performance, at the cost of a single ETL, for which one query pays with additional latency. The timing of this ETL depends on the value of α, which we currently set to 0.5. Smaller values of α cause a smaller tail latency, but at the cost of a smaller benefit for the rest of the queries. Finally, in all cases, we observe that the gap between the adaptive and the non-adaptive case widens. Across states, this gap goes up to 50% (Adaptive-S3-NI versus S3-IS), showing the benefits of migrating from a static state to an adaptive one. We have executed further experiments with up to 300 queries of the same mix for state S3-NI, which has the slowest convergence, and we observed that this gap starts from 11% in the given sequence and goes to 22% for 200 queries, 25% for 250, and 30% for 300 queries.
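The one-off ETL trigger described above can be sketched as follows (an illustrative simplification with hypothetical names: it ignores that transactions keep producing fresh data after the sync, and only shows when the single ETL is paid in a query sequence).

```python
# Sketch of the adaptive ETL trigger: one query in the sequence pays
# the ETL latency once the fresh-data ratio crosses alpha; every
# later query then runs on socket-local data. Smaller alpha values
# trigger the ETL earlier (smaller tail latency, smaller benefit).

def run_sequence(fresh_ratios, alpha=0.5):
    """Return, per query, whether the one-off ETL is triggered before it."""
    synced = False
    decisions = []
    for r in fresh_ratios:
        if not synced and r >= alpha:
            synced = True                      # pay the one-off ETL now
            decisions.append("ETL+query")
        elif synced:
            decisions.append("local-query")    # data already socket-local
        else:
            decisions.append("hybrid-query")   # fresh data read remotely
    return decisions
```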
OLTP performance.
Figure 5(b) shows the transactional throughput corresponding to the OLAP query-sequence execution under the different scheduling states. The OLTP throughput degrades slightly due to the increased memory pressure on the OLTP-local DRAM, as the OLAP engine performs hybrid scans, that is, it accesses data over the interconnect in addition to its OLAP-local scans. In all the adaptive schedules, the OLTP throughput increases after every ETL operation, which reduces the pressure that the OLAP engine puts on the OLTP-local memory bus, corroborating the above claim. S3-NI has a lower throughput compared to its isolated counterparts due to the reallocation of CPU cores to the OLAP engine. Finally, S1 exhibits variance due to the co-location of an OLTP and an OLAP instance on the same sockets.

Insights.
Freshness-driven scheduling in HTAP adapts across feasible states, where feasibility is set by the workloads' isolation level. The isolated adaptive mode achieves a 30% speedup over S3-IS with 100 sequences. With 4 elastic cores, the adaptive mode achieves 11%, 22% and 26% performance gains at the 100th, 200th and 250th sequence execution and, with time, the adaptive schedule amortizes the cost of data movement. In general, the adaptive case builds upon the hybrid states, aiming to trigger the ETL based on the freshness ratio in order to balance data accesses between OLAP-local and fresh data.
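The hybrid scan that balances OLAP-local and fresh data can be sketched as follows (hypothetical data layout; as discussed for the split-access optimization, this is only safe for insert-only tables, since updates would make the combined view an inconsistent snapshot).

```python
# Sketch of a hybrid (split-access) scan: the OLAP engine reads its
# socket-local snapshot and fetches from the OLTP side only the rows
# inserted after the last synchronization point. The more data the
# last ETL made local, the less crosses the interconnect.

def split_access_scan(olap_snapshot, oltp_table, last_synced_row):
    """Yield local snapshot rows followed by freshly inserted rows."""
    yield from olap_snapshot                   # socket-local data
    yield from oltp_table[last_synced_row:]    # fresh inserts only
```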
CONCLUSIONS

We look at HTAP as a scheduling problem, where the system balances OLTP and OLAP engine performance depending on the data freshness and the performance requirements of the workload. We define the HTAP design space, ranging from fully co-located engines to fully isolated ones, and we devise an elastic system design which traverses this space by distributing computing and memory resources to the OLTP and the OLAP engine. We provide a scheduling algorithm which drives resource allocation decisions. We perform a sensitivity analysis of our system, showing that exchanging resources between the two engines is beneficial up to a certain point. Finally, we evaluate the performance of our system using the CH-benCHmark, and we show that our system adapts to the data freshness and performance requirements of the workload.
ACKNOWLEDGMENTS

We would like to thank the reviewers and the shepherd for their valuable feedback. Angelos Anadiotis was at EPFL until after the initial submission of the paper, and contributed to the revision while at the Ecole Polytechnique. This work was partially funded by the FNS project "Efficient Real-time Analytics on General-Purpose GPUs", subside no. 200021_178894/1, and the EU H2020 project SmartDataLake (825041).