
Publications


Featured research published by Thomas Willhalm.


Very Large Data Bases | 2009

SIMD-scan: ultra fast in-memory table scan using on-chip vector processing units

Thomas Willhalm; Nicolae Popovici; Yazan Boshmaf; Hasso Plattner; Alexander Zeier; Jan Schaffner

The availability of huge system memory, even on standard servers, has generated a lot of interest in main memory database engines. In data warehouse systems, highly compressed column-oriented data structures are quite prominent. In order to scale with the data volume and the system load, many of these systems are highly distributed with a shared-nothing approach. The fundamental operation in all of these systems is a full table scan over one or multiple compressed columns. Recent research has proposed different techniques to speed up table scans, such as intelligent compression or the use of additional hardware such as graphics cards or FPGAs. In this paper, we show that utilizing the embedded Vector Processing Units (VPUs) found in standard superscalar processors can speed up main-memory full table scans by factors. This is achieved without changing the hardware architecture and thereby without additional power consumption. Moreover, as on-chip VPUs directly access the system's RAM, no additional costly copy operations are needed to use the new SIMD-scan approach in standard main memory database engines. Therefore, we propose this scan approach as the standard scan operator for compressed column-oriented main memory storage. We then discuss how well our solution scales with the number of processor cores and, consequently, to what degree it can be applied in multi-threaded environments. To verify the feasibility of our approach, we implemented the proposed techniques on a modern Intel multi-core processor using Intel® Streaming SIMD Extensions (Intel® SSE). In addition, we integrated the new SIMD-scan approach into the SAP® NetWeaver® Business Warehouse Accelerator. We conclude by describing the performance benefits of using our approach for processing and scanning compressed data with VPUs in column-oriented main memory database systems.
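
The core idea can be sketched as follows. This is a simplified illustration assuming a plain array of 32-bit integers; the actual SIMD-scan additionally decompresses bit-packed column values inside the SIMD registers, which is not shown here:

```cpp
// Simplified sketch of a SIMD predicate scan with SSE intrinsics: evaluate
// the predicate on four column values per instruction and collect matches.
// Assumes an uncompressed int32 column (illustrative, not the paper's code).
#include <emmintrin.h>  // SSE2
#include <cstdint>
#include <vector>

std::vector<uint32_t> scan_greater_than(const int32_t* column, size_t n,
                                        int32_t threshold) {
    std::vector<uint32_t> matches;
    const __m128i thresh = _mm_set1_epi32(threshold);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        // Compare four column values against the predicate at once.
        __m128i values = _mm_loadu_si128(
            reinterpret_cast<const __m128i*>(column + i));
        __m128i cmp = _mm_cmpgt_epi32(values, thresh);
        // One mask bit per 32-bit lane.
        int mask = _mm_movemask_ps(_mm_castsi128_ps(cmp));
        for (int lane = 0; lane < 4; ++lane)
            if (mask & (1 << lane)) matches.push_back(uint32_t(i + lane));
    }
    for (; i < n; ++i)  // scalar tail for the remaining elements
        if (column[i] > threshold) matches.push_back(uint32_t(i));
    return matches;
}
```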


Proceedings of the 1st International Workshop on Multicore Software Engineering | 2008

Putting Intel® Threading Building Blocks to work

Thomas Willhalm; Nicolae Popovici

Intel® Threading Building Blocks (TBB) was designed to simplify programming for multi-core platforms. By introducing a new way of expressing parallelism, it lets developers focus on efficient, scalable parallel program design and avoid dealing with the low-level details of threading. This talk focuses on the task-based approach to threading in Intel® TBB. We show step by step what needs to be done to take advantage of the TBB task-based programming model in your code.
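
As a flavor of the programming model, a minimal TBB sketch might look as follows (the function and data are illustrative, not taken from the talk):

```cpp
// Minimal sketch of task-based data parallelism with Intel TBB:
// parallel_for splits the index range into tasks that the TBB scheduler
// maps onto worker threads, so the caller never touches raw threads.
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
#include <vector>

void square_all(std::vector<double>& data) {
    tbb::parallel_for(
        tbb::blocked_range<size_t>(0, data.size()),
        [&](const tbb::blocked_range<size_t>& r) {
            // Each task processes one contiguous chunk of the range.
            for (size_t i = r.begin(); i != r.end(); ++i)
                data[i] = data[i] * data[i];
        });
}
```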


Data Management on New Hardware | 2014

SOFORT: a hybrid SCM-DRAM storage engine for fast data recovery

Ismail Oukid; Daniel Booss; Wolfgang Lehner; Peter Bumbulis; Thomas Willhalm

Storage Class Memory (SCM) has the potential to significantly improve database performance. This potential has been well documented for throughput [4] and response time [25, 22]. In this paper, we show that SCM also has the potential to significantly improve restart performance, a shortcoming of traditional main memory database systems. We present SOFORT, a hybrid SCM-DRAM storage engine that leverages the full capabilities of SCM by doing away with a traditional log and updating the persisted data in place in small increments. We show that we can achieve restart times of a few seconds, independent of instance size and transaction volume, without significantly impacting transaction throughput.
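
Log-less in-place updates on SCM hinge on explicitly forcing stores into the persistence domain. A minimal sketch of that general pattern, using the x86 cache-line flush and fence intrinsics (an illustration of the technique, not SOFORT's actual persistence primitives):

```cpp
// Sketch of the basic SCM in-place update pattern: write, then explicitly
// flush the cache line and fence, so the store is durable before any
// dependent state refers to it.
#include <immintrin.h>
#include <cstdint>

inline void persist(const void* addr) {
    _mm_clflush(addr);  // evict the cache line toward the memory subsystem
    _mm_sfence();       // order the flush before subsequent stores
}

void durable_update(uint64_t* scm_slot, uint64_t new_value) {
    *scm_slot = new_value;  // an aligned 8-byte store is atomic on x86-64
    persist(scm_slot);      // make it durable before proceeding
}
```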


International Conference on Management of Data | 2016

FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory

Ismail Oukid; Johan Lasperas; Anisoara Nica; Thomas Willhalm; Wolfgang Lehner

The advent of Storage Class Memory (SCM) is driving a rethink of storage systems towards a single-level architecture where memory and storage are merged. In this context, several works have investigated how to design persistent trees in SCM as a fundamental building block for these novel systems. However, these trees are significantly slower than their DRAM-based counterparts, since trees are latency-sensitive and SCM exhibits higher latencies than DRAM. In this paper, we propose a novel hybrid SCM-DRAM persistent and concurrent B-Tree, named the Fingerprinting Persistent Tree (FPTree), that achieves similar performance to DRAM-based counterparts. In this novel design, leaf nodes are persisted in SCM while inner nodes are placed in DRAM and rebuilt upon recovery. The FPTree uses Fingerprinting, a technique that limits the expected number of in-leaf probed keys to one. In addition, we propose a hybrid concurrency scheme for the FPTree that is partially based on Hardware Transactional Memory. We conduct a thorough performance evaluation and show that the FPTree outperforms state-of-the-art persistent trees with different SCM latencies by up to a factor of 8.2. Moreover, we show that the FPTree scales very well on a machine with 88 logical cores. Finally, we integrate the evaluated trees into memcached and a prototype database. We show that the FPTree incurs an almost negligible performance overhead over fully transient data structures, while significantly outperforming other persistent trees.
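
Fingerprinting stores a one-byte hash of each key in the leaf, so a lookup compares cheap one-byte fingerprints first and probes a full key only on a fingerprint hit; since a random fingerprint collides with probability 1/256, the expected number of full key probes stays around one. A hedged sketch of the idea (the layout and hash are illustrative assumptions, not FPTree's actual implementation):

```cpp
// Sketch of fingerprint-based leaf search: compare 1-byte hashes first,
// probe the full key only on a fingerprint match.
#include <cstdint>
#include <cstddef>

constexpr size_t kLeafCapacity = 32;

struct Leaf {
    uint8_t  fingerprints[kLeafCapacity];  // 1-byte hash per slot
    bool     occupied[kLeafCapacity];
    uint64_t keys[kLeafCapacity];
    uint64_t values[kLeafCapacity];
};

inline uint8_t fingerprint(uint64_t key) {
    // Cheap multiplicative hash; the top byte serves as the fingerprint.
    return static_cast<uint8_t>(key * 0x9E3779B97F4A7C15ULL >> 56);
}

// Returns true and fills *value on a hit. On average only ~1 full key is
// compared, because a non-matching slot passes the fingerprint test with
// probability 1/256.
bool leaf_lookup(const Leaf& leaf, uint64_t key, uint64_t* value) {
    const uint8_t fp = fingerprint(key);
    for (size_t i = 0; i < kLeafCapacity; ++i) {
        if (leaf.occupied[i] && leaf.fingerprints[i] == fp &&
            leaf.keys[i] == key) {
            *value = leaf.values[i];
            return true;
        }
    }
    return false;
}
```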


International Conference on Big Data | 2013

Memory system characterization of big data workloads

Martin Dimitrov; Karthik Kumar; Patrick Lu; Vish Viswanathan; Thomas Willhalm

Two recent trends have emerged: (1) rapid growth in big data technologies, with new types of computing models to handle unstructured data, such as map-reduce and NoSQL; and (2) a growing focus on the memory subsystem for performance and power optimizations, particularly with emerging memory technologies offering characteristics different from conventional DRAM (bandwidths, read/write asymmetries). This paper examines how these trends may intersect by characterizing the memory access patterns of various Hadoop and NoSQL big data workloads. Using memory DIMM traces collected with special hardware, we analyze the spatial and temporal reference patterns to bring out several insights related to memory and platform usage, such as memory footprints, read/write ratios, bandwidths, and latencies. We develop an analysis methodology to understand how conventional optimizations such as caching, prediction, and prefetching may apply to these workloads, and discuss the implications for software and system design.
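
A minimal sketch of the kind of trace analysis described (the record format below is an assumption for illustration; the paper's traces come from special DIMM-level capture hardware):

```cpp
// Sketch of basic memory-trace analysis: unique cache-line footprint and
// read/write ratio over a trace of (address, read/write) records.
#include <cstdint>
#include <cstdio>
#include <unordered_set>
#include <vector>

struct TraceRecord {
    uint64_t address;   // physical address of the access
    bool     is_write;  // read or write
};

void characterize(const std::vector<TraceRecord>& trace) {
    std::unordered_set<uint64_t> lines;  // 64-byte cache lines touched
    uint64_t reads = 0, writes = 0;
    for (const TraceRecord& r : trace) {
        lines.insert(r.address >> 6);  // 64 B line granularity
        if (r.is_write) ++writes; else ++reads;
    }
    std::printf("footprint: %zu MiB, read/write ratio: %.2f\n",
                lines.size() * 64 / (1024 * 1024),
                writes ? double(reads) / double(writes) : 0.0);
}
```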


IEEE International Symposium on Workload Characterization | 2015

Quantifying the Performance Impact of Memory Latency and Bandwidth for Big Data Workloads

Russell M. Clapp; Martin Dimitrov; Karthik Kumar; Vish Viswanathan; Thomas Willhalm

In recent years, DRAM technology improvements have scaled at a much slower pace than processors. While server processor core counts grow by 33% to 50% on a yearly cadence, DDR3/4 memory channel bandwidth has grown at a slower rate, and memory latency has remained relatively flat for some time. Combined with new computing paradigms such as big data analytics, which involves analyzing massive volumes of data in real time, this puts increasing pressure on the memory subsystem. It is therefore important for computer architects to understand the sensitivity of big data workload performance to memory bandwidth and latency, and how these workloads compare to more conventional workloads. To address this, we present straightforward analytic equations to quantify the impact of memory bandwidth and latency on workload performance, leveraging measured data from performance counters on real systems. We demonstrate how the values of the components of these equations can be used to classify different workloads according to their inherent bandwidth requirement and latency sensitivity. Using this performance model, we show the relative sensitivities of big data, high-performance computing, and enterprise workload classes to changes in memory bandwidth and latency.
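
The abstract does not reproduce the equations themselves; as an illustration of the general approach (an assumption, not necessarily the paper's exact formulation), a first-order model of this kind splits cycles per instruction into a core component and a memory-stall component:

```latex
% Illustrative first-order performance model (not necessarily the paper's
% exact equations): CPI split into core work and memory stalls.
\[
\mathrm{CPI} \;=\; \mathrm{CPI}_{\text{core}}
  \;+\; \mathrm{MPI} \cdot \frac{L_{\text{eff}}}{\mathrm{MLP}}
\]
% MPI:   last-level-cache misses per instruction (a performance counter)
% L_eff: effective memory latency, which rises as bandwidth utilization
%        approaches saturation
% MLP:   average number of outstanding misses overlapping a stall
```

Inputs like misses per instruction come directly from performance monitoring units, which is what makes such a model usable on real systems rather than requiring simulation.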


Very Large Data Bases | 2017

Memory management techniques for large-scale persistent-main-memory systems

Ismail Oukid; Daniel Booss; Adrien Lespinasse; Wolfgang Lehner; Thomas Willhalm; Grégoire Gomes

Storage Class Memory (SCM) is a novel class of memory technologies that promise to revolutionize database architectures. SCM is byte-addressable and exhibits latencies similar to those of DRAM, while being non-volatile. Hence, SCM could replace both main memory and storage, enabling a novel single-level database architecture without the traditional I/O bottleneck. Fail-safe persistent SCM allocation can be considered a conditio sine qua non for enabling this novel architecture paradigm for database management systems. In this paper, we present PAllocator, a fail-safe persistent SCM allocator whose design emphasizes high concurrency and capacity scalability. Contrary to previous works, PAllocator thoroughly addresses the important challenge of persistent memory fragmentation by implementing an efficient defragmentation algorithm. We show that PAllocator outperforms state-of-the-art persistent allocators by up to one order of magnitude, in both operation throughput and recovery time, and enables up to 2.39x higher operation throughput on a persistent B-Tree.
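
A key ingredient of fail-safe persistent allocation, sketched below under assumptions (the interface models how persistent allocators of this kind are commonly described, not PAllocator's published API), is that the allocator publishes the result directly into a persistent pointer owned by the caller, so a crash between allocation and publication cannot leak memory:

```cpp
// Sketch of a fail-safe persistent allocation interface: the allocator
// stores the new block's persistent offset directly into a caller-provided
// slot that itself lives in SCM, then flushes it. After a crash, recovery
// can tell from that slot whether the allocation completed.
#include <immintrin.h>
#include <cstddef>
#include <cstdint>

using PPtr = uint64_t;  // offset into the SCM pool; 0 means null

class PersistentAllocator {
public:
    // Writes the allocated offset into *dest (which must itself reside
    // in SCM) and persists it before returning.
    void allocate(PPtr* dest, size_t size) {
        *dest = reserve_block(size);
        _mm_clflush(dest);  // flush the publishing store to SCM...
        _mm_sfence();       // ...and order it before later stores
    }

private:
    // Toy bump reservation inside a fixed pool; the real allocator uses
    // size-segregated, crash-consistent metadata plus defragmentation.
    PPtr reserve_block(size_t size) {
        PPtr off = next_free_;
        next_free_ += (size + 63) & ~uint64_t{63};  // 64-byte aligned
        return off;
    }
    PPtr next_free_ = 64;  // offset 0 is reserved as null
};
```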


Very Large Data Bases | 2017

SAP HANA adoption of non-volatile memory

Mihnea Andrei; Christian Lemke; Günter Radestock; Robert Schulze; Carsten Thiel; Rolando Blanco; Akanksha Meghlan; Muhammad Sharique; Sebastian Seifert; Surendra Vishnoi; Daniel Booss; Thomas Peh; Ivan Schreter; Werner Thesing; Mehul Wagle; Thomas Willhalm

Non-Volatile RAM (NVRAM) is a novel class of hardware technology which is an interesting blend of two storage paradigms: byte-addressable DRAM and block-addressable storage (e.g. HDD/SSD). Most of the existing enterprise relational data management systems such as SAP HANA have their internal architecture based on the inherent assumption that memory is volatile and base their persistence on explicit handling of block-oriented storage devices. In this paper, we present the early adoption of Non-Volatile Memory within the SAP HANA Database, from the architectural and technical angles. We discuss our architectural choices, dive deeper into a few challenges of the NVRAM integration and their solutions, and share our experimental results. As we present our solutions for the NVRAM integration, we also give, as a basis, a detailed description of the relevant HANA internals.
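
The abstract does not detail the integration mechanism. The standard OS-level way to expose NVRAM to an application byte-addressably, sketched below as an assumption about the general approach rather than HANA's internal persistence layer, is to memory-map a file on a DAX-enabled filesystem:

```cpp
// Sketch of the standard mechanism for byte-addressable NVRAM access:
// mmap a file on a DAX filesystem, so loads and stores go directly to
// the device with no page cache in between.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

void* map_nvram(const char* path, size_t size) {
    int fd = open(path, O_RDWR);           // file on a DAX-mounted fs
    if (fd < 0) return nullptr;
    void* base = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);  // direct load/store access
    close(fd);                             // the mapping survives close()
    return base == MAP_FAILED ? nullptr : base;
}
```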


Information Technology | 2017

Storage class memory and databases: Opportunities and challenges

Ismail Oukid; Robert Kettler; Thomas Willhalm

Storage Class Memory (SCM) is emerging as a viable solution to lift DRAM's scalability limits, both in capacity and energy consumption. Indeed, SCM combines the economic characteristics, non-volatility, and density of traditional storage media with the low latency and byte-addressability of DRAM. In this paper, we survey research on how SCM can be leveraged in databases and explore different solutions, ranging from using SCM as a disk replacement to single-level storage architectures where SCM is used as universal memory (i.e., as memory and storage at the same time), together with the challenges that stem from these opportunities. Finally, we synthesize our findings into recommendations on how to exploit the full potential of SCM in next-generation database architectures.


Measurement and Modeling of Computer Systems | 2015

A Simple Model to Quantify the Impact of Memory Latency and Bandwidth on Performance

Russell M. Clapp; Martin Dimitrov; Karthik Kumar; Vish Viswanathan; Thomas Willhalm

In recent years, DRAM technology improvements have scaled at a much slower pace than processors. While server processor core counts grow by 33% to 50% on a yearly cadence, DDR4 memory channel bandwidth has grown at a slower rate, and memory latency has remained relatively flat for some time. Meanwhile, new computing paradigms have emerged that involve analyzing massive volumes of data in real time and place pressure on the memory subsystem. The combination of these trends makes it important for computer architects to understand the sensitivity of workload performance to memory bandwidth and latency. In this paper, we outline and validate a methodology for quick and quantitative performance estimation using a real-world workload.
