Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning (Extended Abstract)
Chi Zhang, Ryan Marcus, Anat Kleiman, Olga Papaemmanouil
Brandeis University, MIT CSAIL and Intel Labs
{chizhang, akleiman, opapaemm}@brandeis.edu, [email protected]

ABSTRACT
In this extended abstract, we propose a new technique for query scheduling with the explicit goal of reducing disk reads and thus implicitly increasing query performance. We introduce SmartQueue, a learned scheduler that leverages overlapping data reads among incoming queries and learns a scheduling strategy that improves cache hits. SmartQueue relies on deep reinforcement learning to produce workload-specific scheduling strategies that focus on long-term performance benefits while being adaptive to previously-unseen data access patterns. We present results from a proof-of-concept prototype, demonstrating that learned schedulers can offer significant performance improvements over hand-crafted scheduling heuristics. Ultimately, we make the case that this is a promising research direction in the intersection of machine learning and databases.
1. INTRODUCTION
Query scheduling, the problem of deciding which of a set of queued queries to execute next, is an important and challenging task in modern database systems. Query scheduling can have a significant impact on query performance and resource utilization, while it may need to account for a wide number of considerations, such as cached data sets, available resources (e.g., memory), per-query performance goals, query prioritization, or inter-query dependencies (e.g., correlated data access patterns).

In this work, we attempt to address the query scheduling problem by leveraging overlapping data access requests. Smart query scheduling policies can take advantage of such overlaps, allowing queries to share cached data, whereas naive scheduling policies may induce unnecessary disk reads. For example, consider three queries q1, q2, q3 which need to read disk blocks (b1, b2), (b3, b4), and (b1, b3) respectively. If the DBMS's buffer pool (i.e., the component of the database engine that caches data blocks) can only cache two blocks at once, executing the queries in the order [q1, q2, q3] will result in reading 6 blocks from disk. However, if the queries are executed in the order [q2, q1, q3], then only 5 blocks will be read from disk, as q3 will use the cached b1. Since buffer pool hits can be orders of magnitude faster than cache misses, such savings could be substantial.

In reality, designing a query scheduler that is aware of the current buffer pool is a complex task. First, the exact data block read set of a query is not known ahead of time, and is dependent on data and query plan parameters (e.g., index lookups). Second, a smart scheduler must balance short-term rewards (e.g., executing a query that will take advantage of the current buffer state) against long-term strategy (e.g., selecting queries that keep the most important blocks cached). One could imagine many simple heuristics, such as greedily selecting the next query with the highest expected buffer usage, to solve this problem. However, a hand-designed policy that handles the complexity of the entire problem, including different buffer sizes, shifting query workloads, heterogeneous data types (e.g., index files vs. base relations), and the balance of short-term gains against long-term strategy, is much more difficult to conceive.

Here, we showcase a prototype of SmartQueue, a deep reinforcement learning (DRL) system that automatically learns to maximize buffer hits in an adaptive fashion. Given a set of queued queries, SmartQueue combines a simple representation of the database's buffer state, the expected reads of queries, and a deep Q-learning model to order queued queries in a way that garners long-term increases in buffer hits. SmartQueue is fully learned, and requires minimal tuning. SmartQueue custom-tailors itself to the user's queries and database, and learns policies that are significantly better than naive or simple heuristics. In terms of integrating SmartQueue into an existing DBMS, our prototype only requires access to the execution plan for each incoming query (to assess likely reads) and the current state of the DBMS buffer pool (i.e., its cached data blocks).

We present our system model and formalize our learning task in Section 2. We present preliminary experimental results from a proof-of-concept prototype implementation in Section 3, related work in Section 4, and in Section 5 we highlight directions for future work.
2. THE SMARTQUEUE MODEL
SmartQueue is a learned query scheduler that automatically learns how to order the execution of queries to minimize disk access requests. The core of SmartQueue includes a deep reinforcement learning (DRL) agent [3] that learns a query scheduling policy through continuous interactions with its environment, i.e., the database and the incoming queries. This DRL agent is not a static model; instead, it continuously learns from its past scheduling decisions and adapts to new data access and caching patterns. Furthermore, as we discuss below, using a DRL model allows us to define a reward function and scheduling policy that capture long-term benefits vs. short-term gains in disk access.

Our system model is depicted in Figure 1. Incoming user queries are placed into an execution queue and SmartQueue decides their order of execution. For each query execution, the database collects the required data blocks of each input base relation, where a data block is the smallest data unit used by the database engine. Data block requests are first resolved by the buffer pool. Blocks found in the buffer (buffer hits) are returned for processing, while the rest of the blocks (buffer misses) are read from disk and placed into the buffer pool (after possible block evictions). Higher buffer hit rates (and hence lower disk access rates) can enormously impact query execution times but require strategic query scheduling, as execution ordering affects the data blocks cached in the buffer pool.

One tempting solution to address this challenge could involve a greedy scheduler which executes the query that will re-use the maximum number of cached data blocks. While this simple approach would yield short-term benefits, it ignores the long-term impact of each choice. Specifically, while the next query picked for execution will maximally utilize the buffer pool contents, it will also lead to newly cached data blocks, which will affect future queries. A greedy approach fails to identify whether these new cached blocks could be of any benefit to the yet-unscheduled queries.

SmartQueue addresses this problem by training a deep reinforcement learning agent to make scheduling decisions that maximize long-term benefits. Specifically, it uses a model that simultaneously estimates and tries to improve a weighted average between short-term buffer hits and the long-term impact of query scheduling choices. In the next paragraphs, we discuss the details of our approach: (a) the input feature vectors that capture data access requests (Query Vector) and buffer state (Buffer Bitmap), and (b) the formalized DRL task.
Buffer Bitmap.
One input to the DRL model is the state of the buffer pool, namely which blocks are currently cached in memory. The buffer state B is represented by a bitmap where rows represent base relations and columns represent data blocks. The (i, j) entry is set to 1 if the j-th block of relation i is cached in the buffer pool and is set to zero otherwise. Since the number of blocks of any given relation can be very high and differs across relations, each row vector F_i is downsized by calculating a simple moving average over its block entries. Specifically, if D_i is the downsized row of relation i, F_i is the full-size row, and w_i = \lfloor |F_i| / |D_i| \rfloor is the resulting window size, we have:

B_{ij} = \frac{1}{w_i} \sum_{k = j \cdot w_i}^{(j+1) \cdot w_i} F_{ik}    (1)
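For concreteness, the following is a minimal sketch of this downsizing step, assuming NumPy arrays and, for brevity, that the number of blocks is (roughly) a multiple of the target length; the function name is ours and purely illustrative:

import numpy as np

def downsize_row(full_row, target_len):
    # Downsize a full-size bitmap row to target_len cells by averaging over
    # fixed-size windows of blocks, as in Equation (1).
    full_row = np.asarray(full_row, dtype=float)
    window = len(full_row) // target_len            # blocks averaged per cell
    trimmed = full_row[: window * target_len]       # drop any ragged tail
    return trimmed.reshape(target_len, window).mean(axis=1)

# Example: 8 blocks, of which blocks 0-3 are cached, downsized to 4 cells.
print(downsize_row([1, 1, 1, 1, 0, 0, 0, 0], 4))    # -> [1. 1. 0. 0.]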
[Figure 1: SmartQueue's system model. Queued queries (query state: block requests) and the buffer state (cached blocks) feed a Q-learning DNN (state layer, hidden layers, action layer) that produces a Q-value for each queued query; action selection picks the query to execute on the DB engine, and the resulting buffer hit ratio is returned as the reward.]

Query Vector.

The second input to the DRL model is the data block requests of each query in the queue. Specifically, given a query q, we generate a vector that indicates the data blocks to be accessed by q for each base relation in the database. To implement this, SmartQueue collects the query plan of q and approximates the probability of each table's data block being accessed. Our approach handles requests for index files and base relations similarly, as both types of blocks will be cached in the buffer pool. The query vector is downsized in the same way as the buffer bitmap. A full table scan of a base relation i indicates that all data blocks of that relation will be accessed, and therefore each cell of the i-th row vector has the value of 1. For indexed table scans, we calculate the number of tuples to be accessed based on the selectivity of the index scan. If the index scan is feeding a loop-based operator (i.e., a nested loop join), the selectivity is adapted accordingly to account for any iterations over the relation. We assume the relation is uniformly stored across data blocks and therefore, if x% of the tuples of a base relation are to be selected by an indexed operation, we set the access probability of each data block of the relation to x%. Similarly, we assume that the indexed operation reads x% of the index's blocks. We note that much more sophisticated probabilistic models could be used, but for this preliminary work we use this simple approximation.
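As an illustration only, a rough sketch of how such a per-relation access-probability vector could be assembled from plan information follows; the scans/n_blocks inputs and the function name are assumptions made for exposition, not the prototype's actual interface:

import numpy as np

def query_vector(scans, n_blocks, target_len):
    # scans: {relation: {"type": "seq" or "index", "selectivity": float in [0, 1]}}
    # n_blocks: {relation: number of blocks}; assumes n_blocks[rel] >= target_len.
    rows = {}
    for rel, blocks in n_blocks.items():
        info = scans.get(rel)
        if info is None:
            prob = 0.0                        # relation not touched by this query
        elif info["type"] == "seq":
            prob = 1.0                        # full table scan: every block is read
        else:
            prob = info["selectivity"]        # index scan: uniform-block assumption
        full = np.full(blocks, prob)          # per-block access probabilities
        window = blocks // target_len         # same moving-average downsizing as Eq. (1)
        rows[rel] = full[: window * target_len].reshape(target_len, -1).mean(axis=1)
    return rows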
Deep Q-Learning.

SmartQueue uses deep Q-learning [17] in order to decide which query to execute next. As with any deep reinforcement learning system, SmartQueue is an agent that operates over a set of states S (buffer pool states) and a set of actions A per state (candidate queries to execute next). SmartQueue models the problem of query scheduling as a Markov Decision Process (MDP) [38]: by picking one query from the queue to execute, the agent transitions from the current buffer pool state to a new one (i.e., a new set of cached data blocks). Executing a new query on the current buffer state provides the agent with a reward. In our case, the reward of an action is the buffer hit ratio of the executed query, calculated as buffer hits / total block requests.

The goal of the agent is to learn a scheduling policy that maximizes its total reward. This is a continuous learning process: as more queries arrive and the agent makes more scheduling decisions, it collects more information (i.e., the context of each decision and its reward) and adapts its policy accordingly. The scheduling policy is expressed as a function Q(S_t, A_t) that outputs a Q-value for taking an action A_t (i.e., a query to execute next) on a buffer state S_t. Given a state S_t and an action A_t, the Q-value Q(S_t, A_t) is calculated by adding the maximum reward attainable from future buffer states to the reward for achieving the current buffer state, effectively influencing the current scheduling decision by the potential future reward. This potential reward is a weighted sum of the expected buffer hit ratios of all future scheduling decisions starting from the current buffer state. Formally, after each action A_t on a state S_t the agent learns a new policy Q_new(S_t, A_t) defined as:

Q_{new}(S_t, A_t) = Q(S_t, A_t) + \alpha \left[ R_t + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \right]    (2)

The parameter γ is the discount factor, which weighs the contribution of short-term vs. long-term rewards. Decreasing the value of γ diminishes the contribution of future rewards (favoring queries that make use of the current buffer state), while increasing it emphasizes them (favoring queries that allow long-term increased usage of the buffer). The parameter α is the learning rate, or step size. This simply determines to what extent newly acquired information overrides old information: a low learning rate implies that new information should be treated skeptically, and may be appropriate when a workload is mostly stable but contains some outliers. A high learning rate implies that new information is more fully trusted, and may be appropriate when query workloads change smoothly over time. Since the above is a recursive equation, the agent starts with arbitrary assumptions for all Q-values (and hence arbitrary initial scheduling decisions). However, as more experience is collected through the execution of incoming queries, the network likely converges to the optimal policy [27].
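The following is a minimal sketch of this update in its tabular form (Equation 2); SmartQueue itself approximates Q with a neural network, so the dictionary-based Q here is purely illustrative:

def q_update(Q, state, action, reward, next_state, next_actions, alpha=0.1, gamma=0.9):
    # Best attainable future value from the next buffer state.
    future = max((Q.get((next_state, a), 0.0) for a in next_actions), default=0.0)
    old = Q.get((state, action), 0.0)
    # Temporal-difference update: blend the old estimate with reward + discounted future.
    Q[(state, action)] = old + alpha * (reward + gamma * future - old)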
3. PRELIMINARY RESULTS
Here, we present preliminary experiments demonstrating that SmartQueue can generate query orderings that increase the buffer hit ratio and improve query execution times compared with alternative non-learned schedulers.
Experimental Setup.
Our experimental study used workloads generated using the 99 query templates of the TPC-DS benchmark [29]. We deployed a 49GB database on a single-node server with 4 cores and 32GB of RAM. For our experiments, we generated 1,000 random query instances out of these 99 templates and placed them in a random order in the execution queue. The benchmark includes 165 tables and indexes, and the number of blocks for each of these ranged between 100 and 130,000. For each query, we collect its query plan without executing the query by using the EXPLAIN command.

SmartQueue uses a fully-connected neural network. Our DRL agent was implemented with Keras [12] and uses 2 hidden layers with 128 neurons each. We also use an adaptive learning rate optimization algorithm (Adam [13]) and our loss function is the mean squared error.
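For illustration, a sketch of the kind of network described above follows; the input dimensionality and the exact layer arrangement beyond the stated two 128-unit hidden layers are assumptions, not the prototype's code:

from tensorflow import keras
from tensorflow.keras import layers

def build_q_network(input_dim):
    # Two fully connected hidden layers of 128 units, as described above; the single
    # output is the predicted Q-value for one (buffer state, candidate query) pair.
    model = keras.Sequential([
        layers.Dense(128, activation="relu", input_shape=(input_dim,)),
        layers.Dense(128, activation="relu"),
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")   # Adam optimizer, mean squared error loss
    return model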
In our study, we compare SmartQueue with two alternative scheduling approaches. First-Come-First-Served (FCFS) simply executes queries in the order they appear in the queue. Greedy employs a simple heuristic to identify the query with the best expected hit ratio given the current contents of the buffer pool. Specifically, for each queued query it calculates the dot product of the buffer state bitmap with the data request bitmap, essentially estimating the probability of a buffer hit for each data block request. We then order all queries based on the sum of these probabilities over all blocks and execute the query with the highest sum. Following the execution, the new buffer state is calculated and the heuristic is applied again until the queue is empty. This greedy approach focuses on short-term buffer hit improvements (a minimal sketch of its scoring step appears below). We configured PostgreSQL to bypass the OS filesystem cache; in future work, multiple levels of caching should be considered.
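For clarity, here is a minimal sketch of the Greedy scoring step described above, with bitmaps as NumPy matrices of matching shape; names are illustrative:

import numpy as np

def pick_greedy(buffer_bitmap, query_bitmaps):
    # Score each queued query by the element-wise product of its request bitmap with
    # the buffer bitmap (expected buffer hits) and pick the highest-scoring query.
    scores = {q: float(np.sum(buffer_bitmap * bm)) for q, bm in query_bitmaps.items()}
    return max(scores, key=scores.get)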
Effectiveness. First, we demonstrate that SmartQueue can improve its effectiveness as it collects more experience. In this set of experiments, we placed all 1,000 queries in the queue and started scheduling them using SmartQueue. In the beginning our agent makes arbitrary scheduling decisions, but as it schedules more queries, SmartQueue collects more experience from its past actions and starts improving its policy. To demonstrate that, we evaluated the learned model at different stages of its training. Figures 2a and 2b show how the model performs as we increase the number of training queries. In Figure 2a, we measure the average buffer hit ratio when scheduling our 1,000 queries and compare it with the buffer hit ratio of FCFS and Greedy (which is not affected by the number of training queries). We observe that the DRL agent is able to improve the buffer hit ratio as it schedules more queries. It outperforms the buffer hit ratio of the other two heuristics, eventually converging to a ratio that is 65% higher than FCFS and 35% higher than Greedy.

In addition, Figure 2b shows the number of executed queries over time. The results demonstrate that the DRL-guided scheduling of SmartQueue allows our approach to execute the workload of 1,000 queries around 42% faster than Greedy and 55% faster than FCFS. This indicates that SmartQueue can effectively capture the relationship between buffer pool state and data access patterns, and leverage that to better utilize the buffer pool and improve its query scheduling decisions.
Adaptability to new queries. Next, we studied SmartQueue's ability to adapt to unseen queries. For these experiments, we trained SmartQueue by first scheduling 950 random queries drawn from 79 TPC-DS templates. We then tested the model on 50 random queries drawn from 20 previously unseen TPC-DS templates. Figure 3a demonstrates how the average buffer hit ratio of the testing queries is affected as SmartQueue collects experience from scheduling more training queries. The graph shows that the average buffer hit ratio of the testing queries increases from 0.2 (when SmartQueue is untrained) to 0.64 (when SmartQueue has scheduled all 950 queries). Furthermore, SmartQueue outperforms FCFS and Greedy after having scheduled fewer than 500 queries.

Finally, Figure 3b shows that the query latency of our testing queries keeps decreasing (and eventually outperforms FCFS and Greedy) as SmartQueue is trained on more queries. Our approach enables unseen queries to eventually be executed 11% faster than FCFS and 22% faster than Greedy. These results indicate that the query scheduling policy can adapt to new query templates, leading to significant performance and resource sharing improvements.
Overhead. We also measured the training and inference time. Our proof-of-concept prototype needed 240 mins to incorporate 950 queries into our agent (so on average the training overhead is 3.95 mins per query). This time does not include the execution time of the queries. This training overhead can potentially be optimized by offloading it into another thread, introducing early stopping, or re-using previous network weights to get a good "starting point." There is no training overhead for FCFS and Greedy. The inference time of SmartQueue is 3.12 seconds, while the inference time for Greedy is 2.52 seconds and 0.0012 seconds for FCFS.
[Figure 2: SmartQueue's effectiveness (buffer hit ratio and query completion rate) with increasing training sets. (a) Average buffer hit ratio; (b) query execution time (queries completed over time).]

[Figure 3: Buffer hit ratio and latency improvement on unseen query templates and increasing training queries. (a) Average buffer hit ratio; (b) query latency (minutes).]
4. RELATED WORK
Prior work on query scheduling has focused on query parallelism [40], elastic cloud databases [4, 9, 18, 19, 22, 24, 30], meeting SLAs [6, 7, 15, 25, 33, 34, 43, 44], or cluster scheduling [20, 36, 39]. In terms of buffer pools and caching, most prior work has focused on smart cache management [2, 10] (i.e., assuming the query order is fixed and choosing which blocks to evict or replace), or on (memory) cache-aware algorithms [46]. Here, we take a flipped approach, in which we assume the buffer management policy is fixed and the query order may be modified (e.g., batch processing). More broadly, our work follows recent trends in integrating machine learning components into systems [11], especially database systems. Machine learning techniques have also been applied to query optimization [23, 32, 41], cardinality estimation [14, 31, 45], cost modeling [37], data integration [8, 28], tuning [42], and security [35].
5. CONCLUSION AND FUTURE WORK
We have presented SmartQueue, a deep reinforcement learning query scheduler that seeks to maximize buffer hit rates in database management systems. While simple, SmartQueue was able to provide substantial improvements over naive and simple heuristics, suggesting that cache-aware, deep-learning-powered query schedulers are a promising research direction. SmartQueue is only an early prototype, and in the future we plan to conduct a full experimental study of SmartQueue. In general, we believe the following areas of future work are promising.
Neural network architecture.
While effective in our initial experiments, a fully connected neural network is likely not the correct inductive bias [26] for this problem. A fully connected neural network is not likely to innately carry much useful information for query scheduling [21], nor is there much of an intuitive connection between a fully-connected architecture and the query scheduling problem [5]. The first layer of our network learns one linear combination of the entire input per neuron. These linear combinations would have to be extremely sparse to learn features like "the query reads this block, which is cached." Other network architectures, like locally connected neural networks [16], may provide significant benefit.
SLAs.
Improving raw workload latency is helpful, but often applications have much more complex performance requirements (e.g., some queries are more important than others). Integrating query priorities and customizable Service Level Agreements (SLAs) into SmartQueue by modifying the reward signal could result in a buffer-aware and SLA-compliant scheduler.
Query optimization.
Different query plans may perform differently with different buffer states. Integrating SmartQueue into the query optimizer, so that query plans can be selected to maximize buffer usage, may provide significant performance gains.
Buffer management.
SmartQueue only considers query ordering, and assumes that the buffer management policy is opaque. A larger system could consider both query ordering and buffer management, choosing to evict or hold buffered blocks based on future queries. Such a system could represent an end-to-end query scheduling and buffer management policy.

6. REFERENCES
[1] …
[2] … VLDB '03.
[3] K. Arulkumaran et al. A Brief Survey of Deep Reinforcement Learning. IEEE Signal Processing '17.
[4] Y. Azar et al. Cloud scheduling with setup cost. In SPAA '13.
[5] P. W. Battaglia et al. Relational inductive biases, deep learning, and graph networks. arXiv '18.
[6] Y. Chi et al. iCBS: Incremental Cost-based Scheduling Under Piecewise Linear SLAs. VLDB '11.
[7] Y. Chi et al. SLA-tree: A Framework for Efficiently Supporting SLA-based Decisions in Cloud Computing. In EDBT '11.
[8] R. C. Fernandez et al. Termite: A System for Tunneling Through Heterogeneous Data. In aiDM '19.
[9] S. Genaud et al. Cost-wait trade-offs in client-side resource provisioning with elastic clouds. In CLOUD '11.
[10] S. Ghandeharizadeh et al. Cache augmented database management systems. In DBSocial '13.
[11] J. Gottschlich et al. The three pillars of machine programming. In MAPL '18.
[12] Keras: The Python Deep Learning library. https://keras.io/.
[13] D. P. Kingma et al. Adam: A Method for Stochastic Optimization. In ICLR '15.
[14] A. Kipf et al. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. In CIDR '19.
[15] W. Lang et al. Towards Multi-Tenant Performance SLOs. In ICDE '14.
[16] Y. LeCun. Generalization and network design strategies. In Connectionism '89.
[17] Y. LeCun et al. Deep learning. Nature '15.
[18] P. Leitner et al. Cost-Efficient and Application SLA-Aware Client Side Request Scheduling in an Infrastructure-as-a-Service Cloud. In CLOUD '12.
[19] Z. Liu et al. PMAX: Tenant Placement in Multitenant Databases for Profit Maximization. In EDBT '13.
[20] H. Mao et al. Learning Scheduling Algorithms for Data Processing Clusters. arXiv '18.
[21] G. Marcus. Innateness, AlphaZero, and Artificial Intelligence. arXiv '18.
[22] R. Marcus et al. NashDB: An End-to-End Economic Method for Elastic Database Fragmentation, Replication, and Provisioning. In SIGMOD '18.
[23] R. Marcus et al. Neo: A Learned Query Optimizer. VLDB '19.
[24] R. Marcus et al. Releasing Cloud Databases from the Chains of Performance Prediction Models. In CIDR '17.
[25] R. Marcus et al. WiSeDB: A Learning-based Workload Management Advisor for Cloud Databases. VLDB '16.
[26] T. M. Mitchell. The Need for Biases in Learning Generalizations. PhD thesis.
[27] V. Mnih et al. Human-level control through deep reinforcement learning. Nature '15.
[28] S. Mudgal et al. Deep Learning for Entity Matching: A Design Space Exploration. In SIGMOD '18.
[29] R. O. Nambiar et al. The Making of TPC-DS. In VLDB '06.
[30] V. Narasayya et al. SQLVM: Performance Isolation in Multi-Tenant Relational Database-as-a-Service. In CIDR '13.
[31] P. Negi et al. Cost-Guided Cardinality Estimation: Focus Where it Matters. In SMDB @ ICDE '20.
[32] J. Ortiz et al. Learning State Representations for Query Optimization with Deep Reinforcement Learning. In DEEM '18.
[33] J. Ortiz et al. PerfEnforce Demonstration: Data Analytics with Performance Guarantees. In SIGMOD '16.
[34] J. Ortiz et al. SLAOrchestrator: Reducing the Cost of Performance SLAs for Cloud Data Analytics. In USENIX '18.
[35] S. Jain et al. Database-Agnostic Workload Management. In CIDR '19.
[36] B. Sotomayor et al. Virtual Infrastructure Management in Private and Hybrid Clouds. IEEE IC '09.
[37] J. Sun et al. An end-to-end learning-based cost estimator. VLDB '19.
[38] R. S. Sutton et al. Introduction to Reinforcement Learning. 1998.
[39] R. Taft et al. STeP: Scalable Tenant Placement for Managing Database-as-a-Service Deployments. In SoCC '16.
[40] S. Tozer et al. Q-Cop: Avoiding bad query mixes to minimize client timeouts under heavy loads. In ICDE '10.
[41] I. Trummer et al. SkinnerDB: Regret-bounded Query Evaluation via Reinforcement Learning. VLDB '18.
[42] D. Van Aken et al. Automatic Database Management System Tuning Through Large-scale Machine Learning. In SIGMOD '17.
[43] P. Xiong et al. ActiveSLA: A Profit-oriented Admission Control Framework for Database-as-a-Service Providers. In SoCC '11.
[44] P. Xiong et al. Intelligent management of virtualized resources for database systems in cloud environment. In ICDE '11.
[45] Z. Yang et al. Deep unsupervised cardinality estimation. VLDB '19.
[46] J. Zhou et al. Buffering database operations for enhanced instruction cache performance. In SIGMOD '04.