Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Holger Pirk is active.

Publication


Featured research published by Holger Pirk.


Very Large Data Bases | 2013

Hardware-oblivious parallelism for in-memory column-stores

Max Heimel; Michael Saecker; Holger Pirk; Stefan Manegold; Volker Markl

The multi-core architectures of today's computer systems make parallelism a necessity for performance-critical applications. Writing such applications in a generic, hardware-oblivious manner is a challenging problem: current database systems thus rely on labor-intensive and error-prone manual tuning to exploit the full potential of modern parallel hardware architectures like multi-core CPUs and graphics cards. We propose an alternative design for a parallel database engine, based on a single set of hardware-oblivious operators, which are compiled down to the actual hardware at runtime. This design reduces the development overhead for parallel database engines, while achieving competitive performance to hand-tuned systems. We provide a proof-of-concept for this design by integrating operators written using the parallel programming framework OpenCL into the open-source database MonetDB. Following this approach, we achieve efficient, yet highly portable parallel code without the need for optimization by hand. We evaluated our implementation against MonetDB using TPC-H derived queries and observed a performance that rivals that of MonetDB's query execution on the CPU and surpasses it on the GPU. In addition, we show that the same set of operators runs nearly unchanged on a GPU, demonstrating the feasibility of our approach.
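
A minimal sketch of the central idea, assuming the standard OpenCL host API: the operator is written once as device-agnostic kernel source and compiled for the concrete hardware (CPU or GPU) only at runtime. The kernel and all names are illustrative, not taken from the MonetDB/Ocelot code, and error handling is omitted.

```cpp
// Hardware-oblivious selection operator: one kernel source, compiled at
// runtime for whichever OpenCL device the platform offers (CPU or GPU).
#include <CL/cl.h>
#include <cstdio>

static const char *kSelectSrc =
    "__kernel void select_lt(__global const int *in,\n"
    "                        __global int *flags,\n"
    "                        const int pivot) {\n"
    "  size_t i = get_global_id(0);\n"
    "  flags[i] = in[i] < pivot;  /* branch-free predicate evaluation */\n"
    "}\n";

int main() {
  cl_platform_id platform;
  cl_device_id device;
  clGetPlatformIDs(1, &platform, nullptr);
  // CL_DEVICE_TYPE_DEFAULT picks whatever the platform exposes: CPU or GPU.
  clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, nullptr);
  cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);

  // The same operator source is specialized for the local device at runtime.
  cl_program prog = clCreateProgramWithSource(ctx, 1, &kSelectSrc, nullptr, nullptr);
  clBuildProgram(prog, 1, &device, nullptr, nullptr, nullptr);
  cl_kernel kernel = clCreateKernel(prog, "select_lt", nullptr);

  std::printf("selection operator compiled for the local device\n");
  clReleaseKernel(kernel);
  clReleaseProgram(prog);
  clReleaseContext(ctx);
  return 0;
}
```

The engine then ships only such device-agnostic operators; allocating buffers and launching the kernel follow the usual OpenCL host-side steps.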


Very Large Data Bases | 2016

Voodoo - a vector algebra for portable database performance on modern hardware

Holger Pirk; Oscar Moll; Matei Zaharia; Samuel Madden

In-memory databases require careful tuning and many engineering tricks to achieve good performance. Such database performance engineering is hard: a plethora of data- and hardware-dependent optimization techniques form a design space that is difficult to navigate for a skilled engineer --- even more so for a query compiler. To facilitate performance-oriented design exploration and query plan compilation, we present Voodoo, a declarative intermediate algebra that abstracts the detailed architectural properties of the hardware, such as multi- or many-core architectures, caches and SIMD registers, without losing the ability to generate highly tuned code. Because it consists of a collection of declarative, vector-oriented operations, Voodoo is easier to reason about and tune than low-level C and related hardware-focused extensions (Intrinsics, OpenCL, CUDA, etc.). This enables our Voodoo compiler to produce (OpenCL) code that rivals and even outperforms the fastest state-of-the-art in-memory databases for both GPUs and CPUs. In addition, Voodoo makes it possible to express techniques as diverse as cache-conscious processing, predication and vectorization (again on both GPUs and CPUs) with just a few lines of code. Central to our approach is a novel idea we termed control vectors, which allows a code-generating frontend to expose parallelism to the Voodoo compiler in an abstract manner, enabling portable performance across hardware platforms. We used Voodoo to build an alternative backend for MonetDB, a popular open-source in-memory database. Our backend allows MonetDB to perform at the same level as highly tuned in-memory databases, including HyPer and Ocelot. We also demonstrate Voodoo's usefulness when investigating hardware-conscious tuning techniques, assessing their performance on different queries, devices and data.
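
A rough illustration of what a declarative vector algebra buys (these are not the actual Voodoo operators): a filtered aggregation written as a handful of vector-at-a-time operations, using predication instead of branching as mentioned in the abstract. A real backend would lower such a plan to tuned OpenCL for the target device; plain C++ loops stand in here.

```cpp
// Hypothetical vector-algebra operators; a tuned backend would generate
// parallel (OpenCL) code for them instead of these scalar loops.
#include <cstdint>
#include <numeric>
#include <vector>

using Vec = std::vector<int64_t>;

Vec Greater(const Vec &in, int64_t pivot) {   // element-wise predicate (0/1)
  Vec out(in.size());
  for (size_t i = 0; i < in.size(); ++i) out[i] = in[i] > pivot;
  return out;
}

Vec Multiply(const Vec &a, const Vec &b) {    // element-wise product
  Vec out(a.size());
  for (size_t i = 0; i < a.size(); ++i) out[i] = a[i] * b[i];
  return out;
}

int64_t Sum(const Vec &in) {                  // full aggregation
  return std::accumulate(in.begin(), in.end(), int64_t{0});
}

// "SELECT SUM(col) WHERE col > pivot", expressed via predication: the 0/1
// predicate vector masks the values before they are aggregated.
int64_t FilteredSum(const Vec &col, int64_t pivot) {
  return Sum(Multiply(col, Greater(col, pivot)));
}
```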


International Conference on Data Engineering | 2013

CPU and cache efficient management of memory-resident databases

Holger Pirk; Florian Funke; Martin Grund; Thomas Neumann; Ulf Leser; Stefan Manegold; Alfons Kemper; Martin L. Kersten

Memory-Resident Database Management Systems (MRDBMS) have to be optimized for two resources: CPU cycles and memory bandwidth. To optimize for bandwidth in mixed OLTP/OLAP scenarios, the hybrid or Partially Decomposed Storage Model (PDSM) has been proposed. However, in current implementations, bandwidth savings achieved by partial decomposition come at increased CPU costs. To achieve the aspired bandwidth savings without sacrificing CPU efficiency, we combine partially decomposed storage with Just-in-Time (JiT) compilation of queries, thus eliminating CPU-inefficient function calls. Since existing cost-based optimization components are not designed for JiT-compiled query execution, we also develop a novel approach to cost modeling and subsequent storage layout optimization. Our evaluation shows that the JiT-based processor maintains the bandwidth savings of previously presented hybrid query processors but outperforms them by two orders of magnitude due to increased CPU efficiency.
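
The C++ sketch below, with illustrative table and column names, combines the two ingredients: a partially decomposed layout that keeps co-accessed attributes in one narrow row group next to a fully decomposed scan column, and the single fused loop a JiT compiler would emit in place of per-tuple operator calls.

```cpp
// Partially decomposed storage: "order_id" and "customer_id" are usually read
// together (OLTP-style), so they form one group; "revenue" is scanned by
// analytical queries and stays a plain column.
#include <cstdint>
#include <vector>

struct OrderGroup {
  int32_t order_id;
  int32_t customer_id;
};

struct Table {
  std::vector<OrderGroup> order_group;  // two-column row-wise group
  std::vector<double>     revenue;      // fully decomposed column
};

// What a JiT compiler would emit for
// "SELECT SUM(revenue) WHERE customer_id = ?": one tight loop, no per-tuple
// calls into generic operator code.
double CompiledQuery(const Table &t, int32_t customer) {
  double sum = 0.0;
  for (size_t i = 0; i < t.revenue.size(); ++i)
    if (t.order_group[i].customer_id == customer) sum += t.revenue[i];
  return sum;
}
```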


International Conference on Data Engineering | 2014

Waste not… Efficient co-processing of relational data

Holger Pirk; Stefan Manegold; Martin L. Kersten

The variety of memory devices in modern computer systems holds opportunities as well as challenges for data management systems. In particular, the exploitation of Graphics Processing Units (GPUs) and their fast memory has been studied quite intensively. However, current approaches treat GPUs as systems in their own right and fail to provide a generic strategy for efficient CPU/GPU cooperation. We propose such a strategy for relational query processing: calculating an approximate result based on lossily compressed, GPU-resident data and refining it using the residuals, i.e., the lost data, on the CPU. We developed the required algorithms, implemented the strategy in an existing DBMS and found up to 8 times performance improvement, even for datasets larger than the available GPU memory.
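
A deliberately simplified stand-in for the strategy (not the paper's algorithms): values are split into a lossily compressed part, which would be GPU-resident, and the dropped bits kept on the CPU as residuals; an aggregate computed on the compressed part is approximate, and adding the residuals refines it to the exact answer. The 8-bit split and all names are assumptions made for the sketch.

```cpp
#include <cstdint>
#include <vector>

constexpr int kResidualBits = 8;  // how much precision the lossy copy drops

struct Compressed {
  std::vector<uint32_t> approx;    // value >> kResidualBits (GPU-resident copy)
  std::vector<uint8_t>  residual;  // dropped low-order bits (kept on the CPU)
};

Compressed Compress(const std::vector<uint32_t> &values) {
  Compressed c;
  for (uint32_t v : values) {
    c.approx.push_back(v >> kResidualBits);
    c.residual.push_back(static_cast<uint8_t>(v & ((1u << kResidualBits) - 1)));
  }
  return c;
}

// Approximate pass over the compressed copy, then refinement with residuals.
uint64_t ExactSum(const Compressed &c) {
  uint64_t approx_sum = 0, residual_sum = 0;
  for (uint32_t a : c.approx)  approx_sum += a;     // fast "GPU" pass
  for (uint8_t r : c.residual) residual_sum += r;   // "CPU" refinement pass
  return (approx_sum << kResidualBits) + residual_sum;  // exact total
}
```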


Data Management on New Hardware | 2014

Database cracking: fancy scan, not poor man's sort!

Holger Pirk; Eleni Petraki; Stratos Idreos; Stefan Manegold; Martin L. Kersten

Database Cracking is an appealing approach to adaptive indexing: on every range-selection query, the data is partitioned using the supplied predicates as pivots. The core of database cracking is, thus, pivoted partitioning. While pivoted partitioning, like scanning, requires a single pass through the data, it tends to have much higher costs due to lower CPU efficiency. In this paper, we conduct an in-depth study of the reasons for the low CPU efficiency of pivoted partitioning. Based on the findings, we develop an optimized version with significantly higher (single-threaded) CPU efficiency. We also develop a number of multi-threaded implementations that are effectively bound by memory bandwidth. Combining all of these optimizations, we achieve an implementation that has costs close to or better than an ordinary scan on a variety of systems ranging from low-end (cheaper than $300) desktop machines to high-end (above $60,000) servers.
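
A minimal sketch of the operation at the heart of cracking, a single pivoted-partition pass; the predicated variant below replaces the hard-to-predict branch of the textbook version with data dependencies, which is the kind of CPU-efficiency concern the paper analyses. It is an illustration, not the authors' optimized implementation.

```cpp
#include <utility>
#include <vector>

// Partitions data[lo, hi) around pivot and returns the split point:
// afterwards, elements in [lo, split) are < pivot and the rest are >= pivot.
size_t CrackInTwo(std::vector<int> &data, size_t lo, size_t hi, int pivot) {
  size_t write = lo;
  for (size_t i = lo; i < hi; ++i) {
    int v = data[i];
    bool lt = v < pivot;              // predicate result as data, not control flow
    std::swap(data[write], data[i]);  // unconditional (predicated) move
    write += lt;                      // advance only when v belongs to the left side
  }
  return write;
}
```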


Web Reasoning and Rule Systems | 2012

Building virtual earth observatories using ontologies and linked geospatial data

Manolis Koubarakis; Manos Karpathiotakis; Kostis Kyzirakos; Charalampos Nikolaou; Stavros Vassos; George Garbis; Michael Sioutis; Konstantina Bereta; Stefan Manegold; Martin L. Kersten; Milena Ivanova; Holger Pirk; Ying Zhang; Charalampos Kontoes; Ioannis Papoutsis; Themistoklis Herekakis; Dimitris Mihail; Mihai Datcu; Gottfried Schwarz; Octavian Dumitru; Daniela Espinoza Molina; Katrin Molch; Ugo Di Giammatteo; Manuela Sagona; Sergio Perelli; Eva Klien; Thorsten Reitz; Robert Gregor

Advances in remote sensing technologies have enabled public and commercial organizations to send an ever-increasing number of satellites in orbit around Earth. As a result, Earth Observation (EO) data has been constantly increasing in volume in the last few years, and is currently reaching petabytes in many satellite archives. For example, the multi-mission data archive of the TELEIOS partner German Aerospace Center (DLR) is expected to reach 2PB next year, while ESA estimates that it will be archiving 20PB of data before the year 2020. As the volume of data in satellite archives has been increasing, so have the scientific and commercial applications of EO data. Nevertheless, it is estimated that up to 95% of the data present in existing archives has never been accessed, so the potential for increasing exploitation is very big.


International Conference on Data Engineering | 2015

The DBMS - your big data sommelier

Yagiz Kargin; Martin L. Kersten; Stefan Manegold; Holger Pirk

When addressing the problem of “big” data volume, preparation costs are one of the key challenges: the high costs for loading, aggregating and indexing data lead to a long data-to-insight time. In addition to being a nuisance to the end-user, this latency prevents real-time analytics on “big” data. Fortunately, data often comes in semantic chunks such as files that contain data items that share some characteristics such as acquisition time or location. A data management system that exploits this trait can significantly lower the data preparation costs and the associated data-to-insight time by only investing in the preparation of the relevant chunks. In this paper, we develop such a system as an extension of an existing relational DBMS (MonetDB). To this end, we develop a query processing paradigm and data storage model that are partial-loading aware. The result is a system that can make a 1.2 TB dataset (consisting of 4000 chunks) ready for querying in less than 3 minutes on a single server-class machine while maintaining good query processing performance.
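
A small sketch of the partial-loading idea under assumed types (ChunkMeta and a caller-supplied load function, neither from the paper): cheap per-chunk statistics let a query skip chunks whose value range cannot match its predicate, so loading cost is only paid for the relevant chunks.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

struct ChunkMeta {
  std::string path;   // file backing this chunk
  int64_t min_value;  // lightweight statistics gathered without a full load
  int64_t max_value;
};

using LoadFn = std::function<std::vector<int64_t>(const std::string &)>;

// Counts values in [lo, hi], loading only the chunks that can contain them.
int64_t CountInRange(const std::vector<ChunkMeta> &catalog,
                     int64_t lo, int64_t hi, const LoadFn &load) {
  int64_t count = 0;
  for (const ChunkMeta &c : catalog) {
    if (c.max_value < lo || c.min_value > hi) continue;  // skipped, never loaded
    for (int64_t v : load(c.path))                       // loaded on demand
      if (v >= lo && v <= hi) ++count;
  }
  return count;
}
```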


Technology Conference on Performance Evaluation and Benchmarking | 2012

Scalable Generation of Synthetic GPS Traces with Real-Life Data Characteristics

Konrad Bösche; Thibault Sellam; Holger Pirk; René Beier; Peter Mieth; Stefan Manegold

Database benchmarking is most valuable if real-life data and workloads are available. However, real-life data (and workloads) are often not publicly available due to IPR constraints or privacy concerns. And even if available, they are often limited regarding scalability and variability of data characteristics. On the other hand, while easily scalable, synthetically generated data often fail to adequately reflect real-life data characteristics. While there are well established synthetic benchmarks and data generators for, e.g., business data (TPC-C, TPC-H), there is no such up-to-date data generator, let alone benchmark, for spatiotemporal and/or moving objects data.


Data Management on New Hardware | 2015

By their fruits shall ye know them: A Data Analyst's Perspective on Massively Parallel System Design

Holger Pirk; Samuel Madden; Michael Stonebraker



Very Large Data Bases | 2016

Non-invasive progressive optimization for in-memory databases

Steffen Zeuch; Holger Pirk; Johann Christoph Freytag


Collaboration


Dive into Holger Pirk's collaborations.

Top Co-Authors

Samuel Madden

Massachusetts Institute of Technology
