David Schwalb | Researchain

Publication

Featured research published by David Schwalb.

very large data bases | 2011

Fast updates on read-optimized databases using multi-core CPUs

Jens Krueger; Changkyu Kim; Martin Grund; Nadathur Satish; David Schwalb; Jatin Chhugani; Hasso Plattner; Pradeep Dubey; Alexander Zeier

Read-optimized columnar databases use differential updates to handle writes by maintaining a separate write-optimized delta partition, which is periodically merged with the read-optimized and compressed main partition. This merge process introduces significant overheads and unacceptable downtimes in update-intensive systems that aspire to combine transactional and analytical workloads in one system. In the first part of the paper, we report data analyses of 12 SAP Business Suite customer systems. In the second part, we present an optimized merge process that reduces the merge overhead of current systems by a factor of 30. Our linear-time merge algorithm exploits the high compute and bandwidth resources of modern multi-core CPUs through architecture-aware optimizations and efficient parallelization. This enables compressed in-memory column stores to handle the transactional update rates required by enterprise applications while retaining the properties of read-optimized databases for analytic-style queries.
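
To make the merge step concrete, here is a minimal, single-threaded Python sketch of merging a delta into a dictionary-encoded main partition. It assumes the main partition stores a sorted, duplicate-free dictionary plus one value ID per row, and that the delta holds raw values. The function name `merge_column` and this layout are illustrative assumptions, not the paper's code, and the sketch omits the architecture-aware and parallel optimizations the abstract describes. Only the dictionary merge itself is linear here, because the delta is simply sorted first.

```python
from typing import Dict, List, Tuple


def merge_column(
    main_dict: List[str], main_codes: List[int], delta_values: List[str]
) -> Tuple[List[str], List[int]]:
    """Merge an uncompressed delta into a dictionary-encoded main partition.

    main_dict is sorted and duplicate-free; main_codes holds one value ID per row.
    Returns the new sorted dictionary and re-encoded value IDs for all rows.
    """
    # Assumption: the delta is sorted here for simplicity; only the merge loop is linear.
    delta_dict = sorted(set(delta_values))
    new_dict: List[str] = []
    main_map = [0] * len(main_dict)   # old main value ID -> new value ID
    delta_map: Dict[str, int] = {}    # delta value -> new value ID
    i = j = 0
    while i < len(main_dict) or j < len(delta_dict):
        # Classic two-way merge of two sorted lists; equal values are emitted once.
        take_main = j == len(delta_dict) or (i < len(main_dict) and main_dict[i] <= delta_dict[j])
        take_delta = i == len(main_dict) or (j < len(delta_dict) and delta_dict[j] <= main_dict[i])
        new_id = len(new_dict)
        if take_main:
            main_map[i] = new_id
            new_dict.append(main_dict[i])
            i += 1
        if take_delta:
            delta_map[delta_dict[j]] = new_id
            if not take_main:
                new_dict.append(delta_dict[j])
            j += 1
    # Re-encode: existing rows via the mapping table, delta rows via the lookup.
    new_codes = [main_map[c] for c in main_codes] + [delta_map[v] for v in delta_values]
    return new_dict, new_codes


new_dict, new_codes = merge_column(["apple", "cherry"], [0, 1, 0], ["banana", "cherry", "apple"])
print(new_dict)   # ['apple', 'banana', 'cherry']
print(new_codes)  # [0, 2, 0, 1, 2, 0]
```

The mapping table `main_map` is what allows existing rows to be re-encoded in one pass, without looking up their values again.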

very large data bases | 2015

NVC-Hashmap: A Persistent and Concurrent Hashmap For Non-Volatile Memories

David Schwalb; Markus Dreseler; Matthias Uflacker; Hasso Plattner

Non-volatile RAM (NVRAM) will fundamentally change in-memory databases, as data structures no longer have to be explicitly backed up to hard drives or SSDs but can be inherently persistent in main memory. To guarantee consistency even in the case of power failures, programmers need to ensure that data is flushed from volatile CPU caches, where it is susceptible to power outages, to NVRAM. In this paper, we present the NVC-Hashmap, a lock-free hashmap used for unordered dictionaries and delta indices in in-memory databases. The NVC-Hashmap is evaluated in both stand-alone and integrated database benchmarks and compared to a persistent data structure based on a B+-Tree.
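
The flushing requirement above is mostly about ordering: a new entry must be durable before anything durable points to it. The following Python toy model illustrates that ordering under loudly stated assumptions. `SimulatedNVM`, `persistent_insert`, and `lookup` are hypothetical names; a "store" lands in a simulated volatile cache, and `flush` stands in for a cache-line write-back instruction (such as x86 CLWB or CLFLUSH) followed by a fence. This is not the NVC-Hashmap itself, and it omits the lock-free part (a real implementation would publish the bucket head with compare-and-swap).

```python
from typing import Any, Dict, Hashable, Optional


class SimulatedNVM:
    """Toy memory model: stores sit in a volatile cache until explicitly flushed."""

    def __init__(self) -> None:
        self.persistent: Dict[Hashable, Any] = {}  # survives a simulated crash
        self.cache: Dict[Hashable, Any] = {}       # lost on a simulated crash

    def store(self, addr: Hashable, value: Any) -> None:
        self.cache[addr] = value

    def load(self, addr: Hashable) -> Any:
        if addr in self.cache:
            return self.cache[addr]
        return self.persistent.get(addr)

    def flush(self, addr: Hashable) -> None:
        # Stands in for a cache-line write-back plus a store fence.
        if addr in self.cache:
            self.persistent[addr] = self.cache.pop(addr)

    def crash(self) -> None:
        self.cache.clear()


def persistent_insert(nvm: SimulatedNVM, bucket: str, key: str, value: int, node_addr: int) -> None:
    # 1. Write the new node, linking it to the current bucket head.
    nvm.store(node_addr, (key, value, nvm.load(bucket)))
    # 2. Persist the node *before* anything points to it.
    nvm.flush(node_addr)
    # 3. Publish the node by updating the bucket head, then persist the pointer.
    nvm.store(bucket, node_addr)
    nvm.flush(bucket)


def lookup(nvm: SimulatedNVM, bucket: str, key: str) -> Optional[int]:
    addr = nvm.load(bucket)
    while addr is not None:
        node_key, node_value, next_addr = nvm.load(addr)
        if node_key == key:
            return node_value
        addr = next_addr
    return None


nvm = SimulatedNVM()
persistent_insert(nvm, "bucket0", "a", 1, node_addr=100)
nvm.crash()
print(lookup(nvm, "bucket0", "a"))  # 1
```

Swapping steps 2 and 3 would let a crash persist a bucket head that points at a node which never reached NVM, leaving a dangling pointer after restart.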

very large data bases | 2015

Efficient Transaction Processing for Hyrise in Mixed Workload Environments

David Schwalb; Martin Faust; Johannes Wust; Martin Grund; Hasso Plattner

Hyrise, an in-memory storage engine designed for mixed enterprise workloads, originally started as a research prototype for hybrid table layouts and basic transaction processing capabilities. This paper presents our incremental improvements and the lessons we learned in better supporting transactional consistency in mixed workloads.

international conference on data engineering | 2014

Leveraging in-memory technology for interactive analyses of point-of-sales data

David Schwalb; Martin Faust; Jens Krueger; Hasso Plattner

Retailers face not only the challenge of consolidating all the data generated by electronic point-of-sale (POS) terminals, but also that of leveraging the data to derive business value. Processing becomes especially challenging when the data is stored at its finest granularity, recording the actual transactions with all their items. In this work, we describe how in-memory technology can help analyze POS data and how it enables new types of enterprise applications. We show that the transactional data set can be explored interactively without precomputed analytical summaries while giving users full flexibility. As an example, we present a prototype application for interactive analysis and exploration of 8 billion records of real data from a large retail company with sub-second response times.

database systems for advanced applications | 2014

Concurrent Execution of Mixed Enterprise Workloads on In-Memory Databases

Johannes Wust; Martin Grund; Kai Hoewelmeyer; David Schwalb; Hasso Plattner

In the world of enterprise computing, individual applications are often classified as either transactional or analytical. From a data management perspective, both application classes issue database workloads with commonly agreed-upon characteristics. However, traditional database management systems (DBMS) are typically optimized for one or the other. Today, we see two trends in enterprise applications that require bridging these two workload categories: (1) enterprise applications of both classes access a single database instance, and (2) transactional applications issue longer-running, analytical-style queries. In response, in-memory DBMS on multi-core CPUs have been proposed to handle the mix of transactional and analytical queries in a single database instance. However, running heterogeneous queries can lead to situations where longer-running queries block shorter-running queries from executing. A task-based query execution model with priority-based scheduling allows query classes to be prioritized effectively. This paper discusses the impact of task granularity on the responsiveness and throughput of an in-memory DBMS. We show that a larger task size for long-running operators negatively affects the response time of short-running queries. Based on this observation, we propose limiting the maximum task size in order to control the mutual performance impact of query classes.
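
The effect of task granularity can be illustrated with a small, deterministic simulation. This is a sketch under simplifying assumptions, not the paper's scheduler: one worker thread, non-preemptive tasks, abstract time units equal to rows processed, a long operator split into tasks of at most `max_task_size` rows, and one high-priority short query arriving later. The function names are hypothetical.

```python
import heapq
from typing import List


def split_into_tasks(total_rows: int, max_task_size: int) -> List[int]:
    """Partition an operator over total_rows into tasks of at most max_task_size rows."""
    return [min(max_task_size, total_rows - start) for start in range(0, total_rows, max_task_size)]


def short_query_response_time(long_rows: int, max_task_size: int,
                              short_arrival: int, short_cost: int) -> int:
    """Response time of one short query on a single non-preemptive worker."""
    HIGH, LOW = 0, 1  # heapq pops the smallest value first, so 0 = highest priority
    ready = [(LOW, i, size) for i, size in enumerate(split_into_tasks(long_rows, max_task_size))]
    heapq.heapify(ready)
    clock = 0
    short_enqueued = False
    while True:
        # The short query becomes schedulable once it has arrived.
        if not short_enqueued and clock >= short_arrival:
            heapq.heappush(ready, (HIGH, -1, short_cost))
            short_enqueued = True
        if not ready:
            clock = short_arrival  # worker idles until the short query arrives
            continue
        priority, _, cost = heapq.heappop(ready)
        clock += cost  # a running task is never preempted
        if priority == HIGH:
            return clock - short_arrival


print(short_query_response_time(1000, 1000, short_arrival=10, short_cost=5))  # 995
print(short_query_response_time(1000, 100, short_arrival=10, short_cost=5))   # 95
```

With one large task, the short query waits for the whole long operator; capping the task size bounds that wait by roughly one task duration, at the price of more scheduling overhead in a real system.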

database systems for advanced applications | 2013

Physical Column Organization in In-Memory Column Stores

David Schwalb; Martin Faust; Jens Krueger; Hasso Plattner

Cost models are an essential part of database systems, as they are the basis of query performance optimization. Disk-based systems are well understood, and sophisticated models exist to compare various data structures and to estimate query costs based on disk I/O operations. Cost models for in-memory databases shift the focus from disk I/O to main memory accesses and CPU costs. However, modeling memory accesses is fundamentally different, and common models no longer apply.

database systems for advanced applications | 2016

Hyrise-NV: Instant Recovery for In-Memory Databases Using Non-Volatile Memory

David Schwalb; B K Girish Kumar; Markus Dreseler; S Anusha; Martin Faust; Adolf Hohl; Tim Berning; Gaurav Makkar; Hasso Plattner; Parag Deshmukh

Emerging non-volatile memory (NVM) technologies offer fast, byte-addressable access, making it possible to rethink the durability mechanisms of in-memory databases. In this paper, we present Hyrise-NV, a database storage engine that maintains table and index structures on NVM. Our architecture updates the database state and index structures on NVM in a transactionally consistent manner using multi-version data structures, allowing databases to be recovered instantly, independent of their size. For index structures, we present nvBTree, which uses multi-versioning to provide failure-atomic tree updates on NVM. We evaluate Hyrise-NV both on DRAM and with hardware-based emulation of NVM using the TPC-C benchmark. Hyrise-NV recovers databases independent of their size, recovering a table with 10 million rows in less than 100 ms.
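
Why multi-versioning enables size-independent recovery can be shown with a toy model. This is a hedged sketch, not Hyrise-NV's design: it assumes each row version carries a begin and end commit ID, updates append new versions instead of overwriting, and a single persisted "last committed ID" is written last to publish a commit. On restart, visibility follows from that one value, so no log replay is needed. `Version`, `MVTable`, `update`, and `recover_and_read` are hypothetical names, and the sketch omits cleanup of versions left behind by an interrupted transaction.

```python
from dataclasses import dataclass, field
from typing import List

INF = float("inf")


@dataclass
class Version:
    value: str
    begin_cid: float = INF  # commit ID that created this version (INF: not committed)
    end_cid: float = INF    # commit ID that superseded it (INF: still current)


@dataclass
class MVTable:
    versions: List[Version] = field(default_factory=list)
    # Assumption: in an NVM-resident design, this single value is persisted last.
    last_committed_cid: int = 0


def visible(v: Version, snapshot_cid: int) -> bool:
    return v.begin_cid <= snapshot_cid < v.end_cid


def update(table: MVTable, version_index: int, new_value: str,
           crash_before_publish: bool = False) -> None:
    cid = table.last_committed_cid + 1
    old = table.versions[version_index]
    # Append a new version instead of overwriting in place, then stamp both versions.
    table.versions.append(Version(new_value, begin_cid=cid))
    old.end_cid = cid
    if crash_before_publish:
        return  # simulated power failure: the commit is never published
    table.last_committed_cid = cid  # one store makes the whole commit visible


def recover_and_read(table: MVTable) -> List[str]:
    # No log replay: the visibility of every version follows from one persisted value.
    snapshot = table.last_committed_cid
    return [v.value for v in table.versions if visible(v, snapshot)]


table = MVTable([Version("a", begin_cid=0)])
update(table, 0, "b")
update(table, 1, "c", crash_before_publish=True)
print(recover_and_read(table))  # ['b']
```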

very large data bases | 2015

Hyrise-R: Scale-out and Hot-Standby through Lazy Master Replication for Enterprise Applications

David Schwalb; Jan Kossmann; Martin Faust; Stefan Klauck; Matthias Uflacker; Hasso Plattner

In-memory database systems are well suited for enterprise workloads, which consist of transactional and analytical queries. A growing number of users and an increasing demand for enterprise applications can saturate or even overload single-node database systems at peak times. Better performance can be achieved by improving a single machine's hardware, but it is often cheaper and more practical to follow a scale-out approach and replicate data across additional machines. In this paper, we present Hyrise-R, a lazy master replication system for the in-memory database Hyrise. By setting up a snapshot-based Hyrise cluster, we increase both performance, by distributing queries over multiple instances, and availability, by utilizing the redundancy of the cluster structure. This paper describes the architecture of Hyrise-R and details of the implemented replication mechanisms. We set up Hyrise-R on instances of Amazon's Elastic Compute Cloud and present a detailed performance evaluation of our system, including a linear increase in query throughput for enterprise workloads.
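
As a generic illustration of lazy master replication (not Hyrise-R's actual mechanism, which the abstract describes only as snapshot-based), the following Python sketch assumes a master that applies every write and appends it to a log, replicas that catch up from that log later, and reads that are load-balanced round-robin across replicas. `Replica`, `LazyMasterCluster`, and `propagate` are hypothetical names; in a real system propagation would run asynchronously in the background.

```python
import itertools
from typing import Any, Dict, Hashable, List, Optional, Tuple


class Replica:
    def __init__(self) -> None:
        self.data: Dict[Hashable, Any] = {}
        self.applied = 0  # number of log entries already applied

    def apply(self, log: List[Tuple[Hashable, Any]]) -> None:
        # Apply only the entries this replica has not seen yet, in log order.
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)


class LazyMasterCluster:
    def __init__(self, n_replicas: int) -> None:
        self.master: Dict[Hashable, Any] = {}
        self.log: List[Tuple[Hashable, Any]] = []
        self.replicas = [Replica() for _ in range(n_replicas)]
        self._next_replica = itertools.cycle(range(n_replicas))

    def write(self, key: Hashable, value: Any) -> None:
        # Writes go only to the master; replicas are not updated synchronously.
        self.master[key] = value
        self.log.append((key, value))

    def propagate(self) -> None:
        # The "lazy" step: replicas catch up after the write has already returned.
        for replica in self.replicas:
            replica.apply(self.log)

    def read(self, key: Hashable) -> Optional[Any]:
        # Reads are spread round-robin and may observe stale data.
        return self.replicas[next(self._next_replica)].data.get(key)


cluster = LazyMasterCluster(n_replicas=2)
cluster.write("x", 1)
print(cluster.read("x"))                     # None (replica has not caught up yet)
cluster.propagate()
print(cluster.read("x"), cluster.read("x"))  # 1 1
```

The trade-off is visible in the first read: write latency stays low because replicas are updated later, but readers on replicas can briefly see stale values.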

international conference on data engineering | 2016

Leveraging non-volatile memory for instant restarts of in-memory database systems

David Schwalb; Martin Faust; Markus Dreseler; Pedro Flemming; Hasso Plattner

Emerging non-volatile memory (NVM) technologies offer fast, byte-addressable access, making it possible to rethink the durability mechanisms of in-memory databases. Hyrise-NV is a database storage engine that maintains table and index structures on NVM. Our architecture updates the database state and index structures on NVM in a transactionally consistent manner using multi-version data structures, allowing databases to be recovered instantly, independent of their size. In this paper, we demonstrate the instant restart capabilities of Hyrise-NV, which stores all data on non-volatile memory. Recovering a dataset of 92.2 GB takes about 53 seconds using our log-based approach, whereas Hyrise-NV recovers in under one second.

very large data bases | 2013

Fast Column Scans: Paged Indices for In-Memory Column Stores

Martin Faust; David Schwalb; Jens Krueger

Commodity hardware is available in configurations with huge amounts of main memory, making it viable to keep large enterprise databases in the RAM of one or a few machines. Additionally, a reunification of transactional and analytical systems has been proposed to enable operational reporting on the most recent data. In-memory column stores have emerged in academia and industry as a solution for handling the resulting mixed workload of transactional and analytical queries. In these systems, queries are processed by scanning whole columns to evaluate predicates on non-key columns. This wastes memory bandwidth and reduces throughput.
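
The abstract as shown stops before describing the index itself. One plausible reading of a paged index is sketched in Python, assuming a dictionary-encoded column whose value-ID vector is split into fixed-size pages, and an index that records, for each value ID, which pages contain it. An equality scan then touches only those pages instead of the whole column. The function names and the use of Python sets (rather than, for example, bitvectors) are illustrative assumptions, not the authors' design.

```python
from typing import Dict, List, Set


def build_paged_index(codes: List[int], page_size: int) -> Dict[int, Set[int]]:
    """Map each value ID to the set of page numbers that contain it."""
    index: Dict[int, Set[int]] = {}
    for pos, code in enumerate(codes):
        index.setdefault(code, set()).add(pos // page_size)
    return index


def scan_equal(codes: List[int], page_size: int,
               index: Dict[int, Set[int]], value_id: int) -> List[int]:
    """Return row positions where codes == value_id, scanning only candidate pages."""
    result: List[int] = []
    for page in sorted(index.get(value_id, ())):
        start = page * page_size
        # The last page may be shorter than page_size.
        for pos in range(start, min(start + page_size, len(codes))):
            if codes[pos] == value_id:
                result.append(pos)
    return result


codes = [0, 0, 1, 1, 2, 2, 0, 0]
index = build_paged_index(codes, page_size=2)
print(scan_equal(codes, 2, index, 0))  # [0, 1, 6, 7]
```

Here the scan for value ID 0 reads two of the four pages; the benefit grows when values are clustered, and shrinks when every value appears on most pages.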
