Max Heimel
Technical University of Berlin
Publications
Featured research published by Max Heimel.
Very Large Data Bases | 2013
Max Heimel; Michael Saecker; Holger Pirk; Stefan Manegold; Volker Markl
The multi-core architectures of today's computer systems make parallelism a necessity for performance-critical applications. Writing such applications in a generic, hardware-oblivious manner is a challenging problem: current database systems thus rely on labor-intensive and error-prone manual tuning to exploit the full potential of modern parallel hardware architectures like multi-core CPUs and graphics cards. We propose an alternative design for a parallel database engine, based on a single set of hardware-oblivious operators that are compiled down to the actual hardware at runtime. This design reduces the development overhead for parallel database engines while achieving performance competitive with hand-tuned systems. We provide a proof of concept for this design by integrating operators written using the parallel programming framework OpenCL into the open-source database MonetDB. Following this approach, we achieve efficient yet highly portable parallel code without the need for optimization by hand. We evaluated our implementation against MonetDB using TPC-H-derived queries and observed performance that rivals MonetDB's query execution on the CPU and surpasses it on the GPU. In addition, we show that the same set of operators runs nearly unchanged on a GPU, demonstrating the feasibility of our approach.
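To illustrate the operator design, here is a minimal sketch of a hardware-oblivious selection operator: the operator is written once as an OpenCL kernel and compiled at runtime for whichever device the context selects (CPU or GPU). This assumes the pyopencl bindings; the kernel and all names are illustrative, not Ocelot's actual operator code.

```python
import numpy as np
import pyopencl as cl

# A hardware-oblivious selection operator: written once, JIT-compiled
# at runtime for the device behind the OpenCL context (CPU or GPU).
KERNEL_SRC = """
__kernel void select_lt(__global const float *vals,
                        __global int *flags,
                        const float threshold) {
    int i = get_global_id(0);
    flags[i] = vals[i] < threshold ? 1 : 0;
}
"""

ctx = cl.create_some_context()              # picks an available device
queue = cl.CommandQueue(ctx)
prog = cl.Program(ctx, KERNEL_SRC).build()  # compiled for that device

data = np.random.rand(1024).astype(np.float32)
flags = np.empty_like(data, dtype=np.int32)
mf = cl.mem_flags
d_in = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=data)
d_out = cl.Buffer(ctx, mf.WRITE_ONLY, flags.nbytes)

prog.select_lt(queue, data.shape, None, d_in, d_out, np.float32(0.5))
cl.enqueue_copy(queue, flags, d_out)
print(flags.sum(), "of", len(data), "tuples qualify")
```

The same source runs unchanged whether the context wraps a multi-core CPU or a GPU, which is the essence of the hardware-oblivious approach.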
Trans. Large-Scale Data- and Knowledge-Centered Systems | 2014
Sebastian Breß; Max Heimel; Norbert Siegmund; Ladjel Bellatreche; Gunter Saake
The vast amount of processing power and memory bandwidth provided by modern graphics cards makes them an interesting platform for data-intensive applications. Unsurprisingly, the database research community identified GPUs as effective co-processors for data processing several years ago. Since then, there have been many approaches to making use of GPUs at different levels of a database system. In this paper, we explore the design space of GPU-accelerated database management systems. Based on this survey, we present key properties, important trade-offs, and typical challenges of GPU-aware database architectures, and identify major open challenges. Additionally, we survey existing GPU-accelerated DBMSs and classify their architectural properties. Then, we summarize typical optimizations implemented in GPU-accelerated DBMSs. Finally, we propose a reference architecture, indicating how GPU acceleration can be integrated into existing DBMSs.
Data and Knowledge Engineering | 2014
Sebastian Breß; Norbert Siegmund; Max Heimel; Michael Saecker; Tobias Lauer; Ladjel Bellatreche; Gunter Saake
For a decade, the database community has been exploring graphics processing units and other co-processors to accelerate query processing. While the developed algorithms often outperform their CPU counterparts, it is not beneficial to keep some processing devices idle while overutilizing others. Therefore, an approach is needed that efficiently distributes a workload across the available (co-)processors while providing accurate performance estimates for the query optimizer. In this paper, we contribute heuristics that optimize query processing for response time and throughput simultaneously via inter-device parallelism. Our empirical evaluation reveals that the new approach achieves speedups of up to 1.85 compared to state-of-the-art approaches while preserving accurate performance estimations. In a further series of experiments, we evaluate our approach on two new use cases: joining and sorting. Furthermore, we use a simulation to assess the performance of our approach for systems with multiple co-processors and derive some general rules that impact performance in those systems. Highlights:
- Contribute heuristics to enhance performance by exploiting inter-device parallelism.
- Heuristics consider load and speed on (co-)processors.
- Extensive evaluation on four use cases: aggregation, selection, sort, and join.
- Assess the performance of the best heuristic for systems with multiple co-processors.
- Discuss how operator-stream-based scheduling can be used in a query processor.
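As a rough illustration of this kind of load-aware heuristic, the sketch below assigns each incoming operator to the (co-)processor with the earliest estimated finish time, taking both device speed and already-queued work into account. Device names and cost numbers are hypothetical.

```python
# Hypothetical per-operator runtime estimates in milliseconds.
est_runtime = {
    ("scan", "CPU"): 40.0, ("scan", "GPU"): 15.0,
    ("sort", "CPU"): 90.0, ("sort", "GPU"): 30.0,
}
queued_work = {"CPU": 0.0, "GPU": 0.0}  # estimated ms of pending work per device

def place(operator: str) -> str:
    # Response-time heuristic: minimize waiting time + estimated runtime.
    device = min(queued_work,
                 key=lambda d: queued_work[d] + est_runtime[(operator, d)])
    queued_work[device] += est_runtime[(operator, device)]
    return device

# The GPU is faster per operator, but once its queue fills up,
# the heuristic starts routing work to the CPU as well.
for op in ["scan", "scan", "sort", "scan", "sort"]:
    print(op, "->", place(op))
```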
Very Large Data Bases | 2014
Sebastian Breß; Bastian Köcher; Max Heimel; Volker Markl; Michael Saecker; Gunter Saake
The past years saw the emergence of highly heterogeneous server architectures that feature multiple accelerators in addition to the main processor. Efficiently exploiting these systems for data processing is a challenging research problem that comprises many facets, including how to find an optimal operator placement strategy, how to estimate runtime costs across different hardware architectures, and how to manage the code and maintenance blowup caused by having to support multiple architectures. In prior work, we already discussed solutions to some of these problems: First, we showed that specifying operators in a hardware-oblivious way can prevent code blowup while still maintaining competitive performance when supporting multiple architectures. Second, we presented learning cost functions and several heuristics to efficiently place operators across all available devices. In this demonstration, we provide further insights into this line of work by presenting our combined system Ocelot/HyPE. Our system integrates a hardware-oblivious data processing engine with a learning query optimizer for placement decisions, resulting in a highly adaptive DBMS that is specifically tailored towards heterogeneous hardware environments.
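A toy sketch of the learning-cost-function idea: observe (input size, runtime) pairs per device, fit a simple model, and place each operator where the predicted cost is lowest. The device names and measurements below are invented for illustration; HyPE's actual models and training procedure are more elaborate.

```python
import numpy as np

# Invented training observations: (input size in tuples, runtime in ms).
observations = {
    "CPU": [(1e5, 12.0), (5e5, 55.0), (1e6, 110.0)],
    "GPU": [(1e5, 20.0), (5e5, 35.0), (1e6, 50.0)],  # transfer overhead, flatter slope
}

# Fit a least-squares line (runtime ~ slope * size + intercept) per device.
models = {dev: np.polyfit(*zip(*obs), deg=1) for dev, obs in observations.items()}

def predicted_cost(device: str, n_tuples: float) -> float:
    slope, intercept = models[device]
    return slope * n_tuples + intercept

n = 3e5
costs = {dev: predicted_cost(dev, n) for dev in models}
print(costs, "-> place on", min(costs, key=costs.get))
```

For small inputs the CPU wins (no transfer overhead); for large inputs the GPU's flatter cost curve takes over, which is exactly the trade-off a learning optimizer picks up from observed executions.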
International Conference on Management of Data | 2014
Tomas Karnagel; Matthias Hille; Mario Ludwig; Dirk Habich; Wolfgang Lehner; Max Heimel; Volker Markl
The increasing heterogeneity of hardware systems gives developers many opportunities to add functionality and computational power to a system. As a consequence, modern database systems will need to be able to adapt to a wide variety of heterogeneous architectures. While porting single operators to accelerator architectures is well understood, a more generic approach is needed for the whole database system. In prior work, we presented a generic hardware-oblivious database system whose operators can be executed on the main processor as well as on a large number of accelerator architectures. However, to achieve fully heterogeneous query processing, placement decisions are needed for the database operators. We enhance the presented system with heterogeneity-aware operator placement (HOP) to take a major step towards a database system that can efficiently exploit highly heterogeneous hardware environments. In this demonstration, we focus on the placement-integration aspect and present the resulting database system.
International Conference on Management of Data | 2015
Max Heimel; Martin Kiefer; Volker Markl
Quickly and accurately estimating the selectivity of multidimensional predicates is a vital part of a modern relational query optimizer. The state of the art in this field are multidimensional histograms, which offer good estimation quality but are complex to construct and hard to maintain. Kernel Density Estimation (KDE) is an interesting alternative that does not suffer from these problems. However, existing KDE-based selectivity estimators can hardly compete with the estimation quality of state-of-the-art methods. In this paper, we substantially expand the state of the art in KDE-based selectivity estimation by improving along three dimensions: First, we demonstrate how to numerically optimize a KDE model, leading to substantially improved estimates. Second, we develop methods to continuously adapt the estimator to changes in both the database and the query workload. Finally, we show how to drastically improve performance by pushing computations onto a GPU. We provide an implementation of our estimator and experimentally evaluate it on a variety of datasets and workloads, demonstrating that it efficiently scales up to very large model sizes, adapts itself to database changes, and typically outperforms the estimation quality of both existing kernel density estimators and state-of-the-art multidimensional histograms.
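The core estimator is compact enough to sketch directly: integrate each Gaussian kernel over the query box and average over the sample. The sketch below uses NumPy/SciPy with a rule-of-thumb bandwidth; the paper's contribution is, among other things, to replace that rule of thumb with numerical optimization. All data and names are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
table = rng.normal(size=(100_000, 2))            # the "table": 100k rows, 2 columns
sample = table[rng.choice(len(table), 1_000)]    # the model is built on a small sample

n, d = sample.shape
h = sample.std(axis=0) * n ** (-1.0 / (d + 4))   # Scott's rule-of-thumb bandwidth

def estimate_selectivity(lo, hi):
    # sel ~ (1/n) * sum_i prod_d [Phi((hi_d - x_id)/h_d) - Phi((lo_d - x_id)/h_d)]
    per_dim = norm.cdf((hi - sample) / h) - norm.cdf((lo - sample) / h)
    return per_dim.prod(axis=1).mean()

lo, hi = np.array([-1.0, -1.0]), np.array([0.5, 0.5])
est = estimate_selectivity(lo, hi)
true = np.mean(np.all((table >= lo) & (table <= hi), axis=1))
print(f"estimated selectivity {est:.4f} vs. true {true:.4f}")
```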
Advances in Databases and Information Systems | 2014
Sebastian Breß; Max Heimel; Norbert Siegmund; Ladjel Bellatreche; Gunter Saake
The vast amount of processing power and memory bandwidth provided by modern graphics cards makes them an interesting platform for data-intensive applications. Unsurprisingly, the database research community identified GPUs as effective co-processors for data processing several years ago. Since then, there have been many approaches to making use of GPUs at different levels of a database system. In this paper, we summarize the major findings of the literature on GPU-accelerated data processing. Based on this survey, we present key properties, important trade-offs, and typical challenges of GPU-aware database architectures, and identify major open research questions.
Very Large Data Bases | 2017
Martin Kiefer; Max Heimel; Sebastian Breß; Volker Markl
Accurately predicting the cardinality of intermediate plan operations is an essential part of any modern relational query optimizer. The accuracy of said estimates has a strong and direct impact on the quality of the generated plans, and incorrect estimates can have a negative impact on query performance. One of the biggest challenges in this field is to predict the result size of join operations. Kernel Density Estimation (KDE) is a statistical method to estimate multivariate probability distributions from a data sample. Previously, we introduced a modern, self-tuning selectivity estimator for range scans based on KDE that outperforms state-of-the-art multidimensional histograms and is efficient to evaluate on graphics cards. In this paper, we extend these bandwidth-optimized KDE models to estimate the result size of single and multiple joins. In particular, we propose two approaches: (1) building a KDE model from a sample drawn from the join result, and (2) efficiently combining the information from base-table KDE models. We evaluated our KDE-based join estimators on a variety of synthetic and real-world datasets, demonstrating that they are superior to state-of-the-art join estimators based on sketching or sampling.
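The second approach can be sketched for the simplest case, a single equi-join on an integer key: evaluate each base-table KDE model over the key domain and sum the product of the per-value probabilities. This is a strong simplification of the paper's estimator, with made-up data and a rule-of-thumb bandwidth in place of the optimized one.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
r_keys = rng.poisson(50, size=20_000)   # join column of table R
s_keys = rng.poisson(55, size=30_000)   # join column of table S

def kde_pmf(keys, domain, n_sample=500):
    # Build a 1-D Gaussian KDE on a sample and evaluate it at each key value.
    sample = rng.choice(keys, n_sample).astype(float)
    h = max(sample.std() * n_sample ** -0.2, 1e-3)  # Scott's rule, 1-D
    return norm.pdf((domain[:, None] - sample) / h).mean(axis=1) / h

domain = np.arange(0, 151)
p_r, p_s = kde_pmf(r_keys, domain), kde_pmf(s_keys, domain)

# |R JOIN S| ~ |R| * |S| * sum_v p_R(v) * p_S(v)
est = len(r_keys) * len(s_keys) * np.sum(p_r * p_s)
true = sum(np.sum(r_keys == v) * np.sum(s_keys == v) for v in domain)
print(f"estimated join size {est:,.0f} vs. true {true:,}")
```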
International Conference on Management of Data | 2013
Max Heimel
The hardware landscape is getting increasingly diverse. A modern desktop computer can contain multiple different processing architectures like multi-core CPUs or GPUs. This diversity is expected to grow significantly in the next ten years, with micro-architectures themselves diverging towards highly parallel and heterogeneous designs. We believe that preparing database systems to exploit this diverse landscape of processing architectures will be one of the major challenges for the coming decade in database research. In this paper, we present our thoughts and results on modifying the components of a database system to efficiently use modern processing architectures. In particular, we discuss our work on offloading parts of the Query Optimizer to highly parallel processors such as graphics cards, and present our work on designing a hardware-oblivious Execution Engine that can run unchanged on a multitude of different processing architectures.
Very Large Data Bases | 2010
Alexander Alexandrov; Max Heimel; Volker Markl; Dominic Battré; Fabian Hueske; Erik Nijkamp; Stephan Ewen; Odej Kao; Daniel Warneke