Archive | 2019

Workload-Driven and Robust Selection of Compression Schemes for Column Stores

 
 

Abstract


Modern main memory-optimized column stores employ a variety of compression techniques. Choosing one compression technique over others for a given memory budget can be challenging since each technique has different trade-offs whose impact on large workloads is not obvious. We present an automated selection framework for compression configurations. Most database systems provide means to automatically choose a compression configuration but lack two crucial properties: the compression selection cannot be constrained (e.g., by a given storage budget) and the robustness of the compression configuration is not considered. Our approach uses workload information to determine robust configurations under the given constraints. The runtime performance of the various compression techniques is estimated using adapted regression models.

1 COLUMN COMPRESSION IN HYRISE

Two of the main driving forces of current database development (both industrial and research) are autonomous database systems and cloud-based installations. Both topics are strongly connected as database vendors are increasingly interested in optimizing their operational costs for large self-hosted database installations. One way to lower the costs, especially for main memory-optimized database systems, is to reduce the memory consumption of large databases. Such a reduction allows storing databases on smaller and thus less expensive server machines or adding more instances to a shared server. But the sheer size of large cloud installations hampers manual optimization of compression configurations by database administrators. This development has recently sparked research on autonomous database systems. The work presented in this paper is an intermediate step toward optimizing memory consumption while still retaining the performance advantages of main memory-optimized databases.
As cost considerations gain importance, the optimization objective for compression configurations shifts from maximizing runtime performance to retaining the current runtime performance while minimizing the storage requirements. With the goal of automatically finding a compression configuration for a given memory budget, this project intends to provide the building blocks for autonomous systems.

The area of data compression has been thoroughly studied for decades in database research. Virtually all modern database systems implement various techniques to compress data and most commercial systems further provide means to adjust the compression level (e.g., Oracle's declarative policies for Automatic Data Optimization (ADO), cf. [12], or SQL Server's Database Engine Tuning Advisor (DTA), cf. [11]). However, we see two distinct issues that remain open from a research perspective: (i) workload- and constraint-based compression configurations and (ii) determination of configurations whose runtime performance is robust to changing workloads.

We present and discuss the three main components in our research database Hyrise [10] with which we approach workload-driven and robust compression configurations:

• We introduce Hyrise's compression framework, which implements an efficient and maintainable interface for various column compression techniques (Section 2).
• We present our runtime estimation, which predicts the performance of compression techniques (Section 3).
• We discuss the applicability of existing approaches for the optimization of physical database designs and how they perform for the task of compression selection (Section 4).

© 2019 Copyright held by the owner/author(s). Published in Proceedings of the 22nd International Conference on Extending Database Technology (EDBT), March 26-29, 2019, ISBN 978-3-89318-081-3 on OpenProceedings.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.
2 COLUMN COMPRESSION FRAMEWORK

Virtually every database management system for hybrid transactional and analytical processing (HTAP) employs a variety of compression schemes. Besides the advantage of reducing the main memory footprint, light-weight compression can even improve runtime performance, e.g., by reducing the memory traffic (cf. [2, 4]) or broadening the applicability of vectorization (cf. [17]). But supporting a variety of compression schemes is challenging as it requires balancing maintainability and efficiency. Most existing approaches either (i) optimize for performance while hampering maintainability and increasing complexity or (ii) provide unified interfaces for improved maintainability, which potentially introduces runtime issues.

2.1 Hyrise's Storage Concept

Hyrise is a main memory-optimized database with a column-major storage format [10]. Each table in Hyrise is horizontally partitioned into n chunks with a predefined maximum size. Each attribute of a table is hence distributed over all chunks, whereby a column in a chunk is referred to as a segment. Modifications (i.e., insertions or MVCC-enabled updates) are appended to the most recent mutable chunk. When this chunk reaches its size limit, the chunk is considered immutable and a new mutable chunk is created. Immutable chunks might be compressed asynchronously. Hyrise encodes and compresses segments independently.

2.2 Balancing Performance and
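
The chunk and segment layout described in Section 2.1 can be illustrated with a small sketch. This is not Hyrise's implementation (Hyrise is not written in Python); the class names `Segment`, `Chunk`, and `Table`, the method `encode_segment`, and the encoding labels `"unencoded"` and `"dictionary"` are hypothetical names chosen for illustration. The sketch only mirrors the behavior stated in the text: rows are appended to the most recent mutable chunk, a full chunk becomes immutable, and segments of immutable chunks are encoded independently.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Segment:
    """The values of one column within a single chunk."""
    values: List[Any] = field(default_factory=list)
    encoding: str = "unencoded"  # hypothetical label, chosen per segment


@dataclass
class Chunk:
    """A horizontal partition holding one segment per column."""
    segments: Dict[str, Segment]
    mutable: bool = True

    def row_count(self) -> int:
        # All segments of a chunk hold the same number of rows.
        return len(next(iter(self.segments.values())).values)


class Table:
    def __init__(self, columns: List[str], max_chunk_size: int) -> None:
        self.columns = columns
        self.max_chunk_size = max_chunk_size
        self.chunks: List[Chunk] = [self._new_chunk()]

    def _new_chunk(self) -> Chunk:
        return Chunk({name: Segment() for name in self.columns})

    def append(self, row: Dict[str, Any]) -> None:
        # Modifications always go to the most recent (mutable) chunk.
        chunk = self.chunks[-1]
        for name in self.columns:
            chunk.segments[name].values.append(row[name])
        # A full chunk becomes immutable and a new mutable chunk is created.
        if chunk.row_count() >= self.max_chunk_size:
            chunk.mutable = False
            self.chunks.append(self._new_chunk())

    def encode_segment(self, chunk_index: int, column: str, encoding: str) -> None:
        # Only immutable chunks are eligible; each segment is handled on its own.
        chunk = self.chunks[chunk_index]
        if chunk.mutable:
            raise ValueError("only immutable chunks can be encoded")
        chunk.segments[column].encoding = encoding


# Usage: three rows with a chunk size of two yield one full, immutable chunk
# and one mutable chunk that receives further appends.
table = Table(["id", "city"], max_chunk_size=2)
for i, city in enumerate(["Berlin", "Potsdam", "Berlin"]):
    table.append({"id": i, "city": city})
table.encode_segment(0, "city", "dictionary")
```

Because the encoding is recorded per segment, two segments of the same column in different chunks may end up with different encodings, which is the granularity at which a compression configuration would be chosen in this simplified model.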

Pages 674-677
DOI 10.5441/002/edbt.2019.84
Language English
