Patrick Damme
Dresden University of Technology
Publications
Featured research published by Patrick Damme.
TPC Technology Conference on Performance Evaluation and Benchmarking | 2015
Patrick Damme; Dirk Habich; Wolfgang Lehner
Lightweight data compression is frequently applied in main memory database systems to improve query performance. The data processed by such systems is highly diverse, and a large number of lightweight compression techniques exists, so choosing the optimal technique for a given dataset is non-trivial. Existing approaches are based on simple rules, which do not suffice for such a complex decision. In contrast, our vision is a cost-based approach. However, this requires a detailed cost model, which can only be obtained by systematically benchmarking many compression algorithms on many different datasets. A naive benchmark evaluates every algorithm under consideration separately, which entails many redundant steps and is thus inefficient. We propose an efficient and extensible benchmark framework for compression techniques. Given an ensemble of algorithms, it minimizes the overall run time of the evaluation. We experimentally show that our approach outperforms the naive one.
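As a rough illustration of the redundancy such a framework can remove, consider the evaluation loop below: each dataset is generated once and all algorithms are timed on it, avoiding the repeated data generation a naive per-algorithm benchmark would incur. This is a minimal C++ sketch with hypothetical names (Algorithm, benchmarkEnsemble), not the actual framework interface from the paper.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

struct Algorithm {
    std::string name;
    std::function<void(const std::vector<uint32_t>&)> compress;
};

using DatasetGenerator = std::function<std::vector<uint32_t>()>;

// A naive benchmark would regenerate the dataset for every algorithm.
// Sharing the generated data across all algorithms removes that redundancy.
void benchmarkEnsemble(const std::vector<DatasetGenerator>& generators,
                       const std::vector<Algorithm>& algorithms) {
    for (const auto& generate : generators) {
        const std::vector<uint32_t> data = generate();  // generated exactly once
        for (const auto& algo : algorithms) {
            auto start = std::chrono::steady_clock::now();
            algo.compress(data);
            auto end = std::chrono::steady_clock::now();
            std::cout << algo.name << ": "
                      << std::chrono::duration_cast<std::chrono::microseconds>(
                             end - start).count()
                      << " us\n";
        }
    }
}

int main() {
    std::vector<DatasetGenerator> gens = {
        [] { return std::vector<uint32_t>(1 << 20, 42u); }  // trivial test data
    };
    std::vector<Algorithm> algos = {
        {"no-op", [](const std::vector<uint32_t>&) { /* compression would go here */ }}
    };
    benchmarkEnsemble(gens, algos);
}
```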
advances in databases and information systems | 2015
Patrick Damme; Dirk Habich; Wolfgang Lehner
Lightweight data compression techniques such as dictionary or run-length compression play an important role in main memory database systems. However, once a compression scheme has been chosen for a dataset, transforming the data to another scheme is very inefficient today. The common approach works as follows: first, the compressed data is decompressed using the source scheme's decompression algorithm, materializing the raw data in main memory; second, the compression algorithm of the destination scheme is applied. This indirect way relies on existing algorithms, but it is inefficient because the whole uncompressed data has to be materialized as an intermediate step. To overcome these drawbacks, we propose a novel approach called direct transformation, which avoids materializing the whole uncompressed data. Our techniques are cache-optimized to reduce the necessary data accesses. Moreover, we present application scenarios where such direct transformations can be applied efficiently.
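The following C++ sketch conveys the flavor of a direct transformation, using run-length encoding as the source scheme and dictionary compression as the destination: one dictionary lookup is performed per run rather than per element, and the uncompressed data is never materialized. The scheme pairing and all names are illustrative assumptions; the cache-optimized techniques from the paper are not reproduced here.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Run { uint32_t value; uint32_t length; };

// Direct RLE-to-dictionary transformation: one lookup per run instead of one
// per element, and no intermediate uncompressed array.
std::vector<uint32_t> rleToDictionary(const std::vector<Run>& runs,
                                      std::vector<uint32_t>& dictionary) {
    std::unordered_map<uint32_t, uint32_t> codeOf;
    std::vector<uint32_t> codes;
    for (const Run& run : runs) {
        auto [it, inserted] = codeOf.try_emplace(
            run.value, static_cast<uint32_t>(dictionary.size()));
        if (inserted) dictionary.push_back(run.value);
        codes.insert(codes.end(), run.length, it->second);  // emit the whole run
    }
    return codes;
}
```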
very large data bases | 2016
Juliana Hildebrandt; Dirk Habich; Patrick Damme; Wolfgang Lehner
In-memory database systems have to keep base data as well as intermediate results generated during query processing in main memory, and accessing intermediate results costs as much as accessing the base data. Therefore, optimizing intermediate results is worthwhile and has a high impact on query execution performance. For this domain, we propose the continuous use of lightweight compression methods for intermediate results, with the aim of developing a balanced query processing approach based on compressed intermediates. To minimize the overall query execution time, it is important to find a balance between the reduced transfer times and the increased computational effort. This paper provides an overview and presents a system design for our vision. Our system design addresses the challenge of integrating a large and evolving corpus of lightweight data compression algorithms into an in-memory column store. In detail, we present our model-driven approach and describe ongoing research topics towards realizing our vision of compression-aware query processing.
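A minimal sketch of the underlying idea, assuming run-length encoding (RLE) as the compression scheme: a selection operator that consumes a compressed column and emits a compressed result, so the intermediate never exists in uncompressed form. The operator and all names are illustrative, not the system design from the paper.

```cpp
#include <cstdint>
#include <vector>

struct Run { uint32_t value; uint32_t length; };

// Keeps only runs whose value satisfies the predicate; the intermediate
// result stays RLE-compressed end to end.
std::vector<Run> selectGreaterThan(const std::vector<Run>& column,
                                   uint32_t threshold) {
    std::vector<Run> result;
    for (const Run& run : column) {
        if (run.value > threshold) {
            // Merge with the previous run if adjacent qualifying runs
            // happen to carry the same value.
            if (!result.empty() && result.back().value == run.value)
                result.back().length += run.length;
            else
                result.push_back(run);
        }
    }
    return result;
}
```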
Proceedings of the 2nd International Workshop on Open Data | 2013
Julian Eberius; Patrick Damme; Katrin Braunschweig; Maik Thiele; Wolfgang Lehner
Platforms for the publication and collaborative management of data, such as Data.gov or Google Fusion Tables, are a new trend on the web. They manage very large corpora of datasets, but often lack an integrated schema, ontology, or even common publication standards. This results in inconsistent names for attributes of the same meaning, which hinders the discovery of relationships between datasets as well as their reusability. Existing data integration techniques focus on reuse-time, i.e., they are applied when a user wants to combine a specific set of datasets or integrate them with an existing database. In contrast, this paper investigates a novel method of data integration at publish-time, where the publisher is given suggestions on how to integrate the new dataset with the corpus as a whole, without resorting to a manually created mediated schema or ontology for the platform. We present data-driven algorithms that suggest alternative attribute names for a newly published dataset based on attribute and instance statistics maintained on the corpus. We evaluate the proposed algorithms using real-world corpora based on the Open Data platform opendata.socrata.com and relational data extracted from Wikipedia. We report on the system's response time and on the results of an extensive crowdsourcing-based evaluation of the quality of the generated attribute name alternatives.
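To make the idea concrete, here is a hedged C++ sketch of one plausible scoring scheme: candidate names from the corpus are ranked by how well their observed instance values overlap with the new column's values (Jaccard similarity), weighted by how common the name is. This is a simplification under assumed statistics; the paper's actual algorithms may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct CorpusAttribute {
    std::string name;
    std::set<std::string> values;  // instance statistics from the corpus
    double nameFrequency;          // attribute statistics from the corpus
};

// Jaccard similarity of two value sets: |intersection| / |union|.
double jaccard(const std::set<std::string>& a, const std::set<std::string>& b) {
    size_t common = 0;
    for (const auto& v : a) common += b.count(v);
    size_t unionSize = a.size() + b.size() - common;
    return unionSize == 0 ? 0.0 : static_cast<double>(common) / unionSize;
}

// Rank corpus attribute names as suggestions for a newly published column.
std::vector<std::string> suggestNames(const std::set<std::string>& newColumnValues,
                                      std::vector<CorpusAttribute> corpus,
                                      size_t topK) {
    std::sort(corpus.begin(), corpus.end(),
              [&](const CorpusAttribute& x, const CorpusAttribute& y) {
                  return jaccard(newColumnValues, x.values) * x.nameFrequency >
                         jaccard(newColumnValues, y.values) * y.nameFrequency;
              });
    std::vector<std::string> suggestions;
    for (size_t i = 0; i < std::min(topK, corpus.size()); ++i)
        suggestions.push_back(corpus[i].name);
    return suggestions;
}
```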
international conference on management of data | 2018
Dirk Habich; Patrick Damme; Annett Ungethüm; Wolfgang Lehner
The exploitation of data as well as hardware properties is a core aspect of efficient data management. This holds in particular for the field of in-memory data processing. Aside from increasing main memory capacities, in-memory data processing also benefits from novel processing concepts based on lightweight compressed data. To speed up compression as well as decompression, an active field of research deals with specializing these algorithms to hardware features such as vectorization using SIMD instructions. Most of the vectorized implementations have been proposed for 128-bit vector registers. However, hardware vendors keep increasing vector register sizes, and a straightforward transformation to these wider vectors is possible in most cases. Thus, we systematically investigated the impact of different SIMD instruction set extensions with wider vector sizes on the behavior of such straightforwardly transformed implementations. In this paper, we describe our evaluation methodology and present selected results of our exhaustive evaluation. In particular, we highlight some challenges and present first approaches to tackle them.
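The following C++ sketch shows what such a straightforward transformation looks like for a simple building block of lightweight compression (frame-of-reference subtraction is used here as an assumed example): the 128-bit SSE loop and the 256-bit AVX2 loop share the same code shape, and only the register width and intrinsics change. Tail elements are omitted for brevity; requires an x86 CPU with AVX2, compiled with -mavx2.

```cpp
#include <cstddef>
#include <cstdint>
#include <immintrin.h>

// 128-bit version: processes 4 x 32-bit integers per iteration.
void subtractReference128(const uint32_t* in, uint32_t* out, size_t n, uint32_t ref) {
    const __m128i vref = _mm_set1_epi32(static_cast<int>(ref));
    for (size_t i = 0; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), _mm_sub_epi32(v, vref));
    }
}

// 256-bit version: the same code shape, 8 x 32-bit integers per iteration.
void subtractReference256(const uint32_t* in, uint32_t* out, size_t n, uint32_t ref) {
    const __m256i vref = _mm256_set1_epi32(static_cast<int>(ref));
    for (size_t i = 0; i + 8 <= n; i += 8) {
        __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(in + i));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i), _mm256_sub_epi32(v, vref));
    }
}
```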
advances in databases and information systems | 2017
Annett Ungethüm; Patrick Damme; Johannes Pietrzyk; Alexander Krause; Dirk Habich; Wolfgang Lehner
Energy consumption is becoming an increasingly critical design factor, while performance remains an important requirement, so a balance between performance and energy has to be established. To tackle this issue for database systems, we previously proposed the concept of work-energy profiles. However, generating such profiles requires extensive benchmarking. To overcome this, in this paper we propose to approximate the work-energy profiles of complex operations from the profiles of low-level operations. To show the feasibility of our approach, we use lightweight data compression algorithms as the complex operations, since compression as well as decompression are heavily used in in-memory database systems, where data is always managed in a compressed representation. Furthermore, we evaluate our approach on a concrete hardware system.
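A minimal sketch of the approximation idea, under the assumption that a complex operation's profile can be estimated as an invocation-count-weighted combination of previously benchmarked low-level profiles; the concrete model in the paper may differ.

```cpp
#include <cstddef>
#include <vector>

struct Profile {
    double work;    // e.g., processed elements per second
    double energy;  // e.g., joules per element
};

// Estimate a complex operation's work-energy profile from the profiles of the
// low-level operations it invokes, weighted by how often each is invoked.
Profile approximateProfile(const std::vector<Profile>& lowLevelProfiles,
                           const std::vector<double>& invocationCounts) {
    Profile estimate{0.0, 0.0};
    double totalCount = 0.0;
    for (size_t i = 0; i < lowLevelProfiles.size(); ++i) {
        estimate.work += invocationCounts[i] * lowLevelProfiles[i].work;
        estimate.energy += invocationCounts[i] * lowLevelProfiles[i].energy;
        totalCount += invocationCounts[i];
    }
    if (totalCount > 0.0) {  // normalize to a per-invocation average
        estimate.work /= totalCount;
        estimate.energy /= totalCount;
    }
    return estimate;
}
```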
international conference on data technologies and applications | 2015
Till Kolditz; Dirk Habich; Patrick Damme; Wolfgang Lehner; Dmitrii Kuvaiskii; Oleksii Oleksenko; Christof Fetzer
Nowadays, database systems pursue a main memory-centric architecture, where the entire business-related data is stored and processed in a compressed form in main memory. In this case, the performance gain is massive because database operations can benefit from the higher bandwidth and lower latency of main memory. However, current main memory-centric database systems utilize general-purpose error detection and correction solutions to address the emerging problem of increasing dynamic error rates in main memory. The costs of these general-purpose methods increase dramatically with rising error rates. To reduce these costs, we have to exploit the context knowledge of database systems for resiliency. Therefore, we introduce our vision of resiliency-aware data compression in this paper, where we want to combine the benefits of both fields in an integrated approach with low performance and memory overhead. In detail, we present and evaluate a first approach using AN encoding and two different compression schemes to show the potential and challenges of our vision.
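For readers unfamiliar with AN encoding, the error-detection scheme named in the abstract: every value is multiplied by a constant A, so any stored word that is no longer divisible by A must have been corrupted by a bit flip. The C++ sketch below shows encoding and detection only; the choice of A and the integration with the two compression schemes are assumptions not taken from the paper.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// The constant A determines the detection capability; this particular value
// is an illustrative assumption, not the one used in the paper.
constexpr uint64_t A = 641;

std::vector<uint64_t> anEncode(const std::vector<uint32_t>& values) {
    std::vector<uint64_t> encoded;
    encoded.reserve(values.size());
    for (uint32_t v : values) encoded.push_back(static_cast<uint64_t>(v) * A);
    return encoded;
}

// Returns std::nullopt if a bit flip made some word non-divisible by A.
std::optional<std::vector<uint32_t>> anDecode(const std::vector<uint64_t>& encoded) {
    std::vector<uint32_t> values;
    values.reserve(encoded.size());
    for (uint64_t e : encoded) {
        if (e % A != 0) return std::nullopt;  // error detected
        values.push_back(static_cast<uint32_t>(e / A));
    }
    return values;
}
```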
international conference on data engineering | 2015
Hannes Voigt; Patrick Damme; Wolfgang Lehner
Relational database management systems build on the closed world assumption, requiring upfront modeling of a usually stable schema. However, a growing number of today's database applications are characterized by self-descriptive data. The schema of self-descriptive data is very dynamic and prone to frequent changes, a situation which is always troublesome to handle in relational systems. This demo presents the relational database management system FRDM. With flexible relational tables, FRDM greatly simplifies the management of self-descriptive data in a relational database system. Self-descriptive data can reside directly next to traditionally modeled data, and both can be queried together using SQL. This demo presents the various features of FRDM and provides first-hand experience of the newly gained freedom in relational database systems.
extending database technology | 2017
Patrick Damme; Dirk Habich; Juliana Hildebrandt; Wolfgang Lehner