Robert Schöne
Dresden University of Technology
Publication
Featured research published by Robert Schöne.
international conference on parallel architectures and compilation techniques | 2009
Daniel Molka; Daniel Hackenberg; Robert Schöne; Matthias S. Müller
Today's microprocessors have complex memory subsystems with several cache levels. The efficient use of this memory hierarchy is crucial to gain optimal performance, especially on multicore processors. Unfortunately, many implementation details of these processors are not publicly available. In this paper we present such fundamental details of the newly introduced Intel Nehalem microarchitecture with its integrated memory controller, QuickPath Interconnect, and ccNUMA architecture. Our analysis is based on sophisticated benchmarks to measure the latency and bandwidth between different locations in the memory subsystem. Special care is taken to control the coherency state of the data to gain insight into performance-relevant implementation details of the cache coherency protocol. Based on these benchmarks we present undocumented performance data and architectural properties.
international parallel and distributed processing symposium | 2015
Daniel Hackenberg; Robert Schöne; Thomas Ilsche; Daniel Molka; Joseph Schuchart; Robin Geyer
The recently introduced Intel Xeon E5-1600 v3 and E5-2600 v3 series processors -- codenamed Haswell-EP -- implement major changes compared to their predecessors. Among these changes are integrated voltage regulators that enable individual voltages and frequencies for every core. In this paper we analyze a number of consequences of this development that are of utmost importance for energy efficiency optimization strategies such as dynamic voltage and frequency scaling (DVFS) and dynamic concurrency throttling (DCT). This includes the enhanced RAPL implementation and its improved accuracy as it moves from modeling to actual measurement. Another fundamental change is that every clock speed above AVX frequency -- including nominal frequency -- is opportunistic and unreliable, which vastly decreases performance predictability with potential effects on scalability. Moreover, we characterize significantly changed p-state transition behavior, and determine crucial memory performance data.
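The RAPL counters discussed above are exposed on Linux through the powercap sysfs interface, which makes a simple average-power measurement possible from user space. The following is a minimal sketch assuming a stock Linux kernel with the intel-rapl driver loaded; the sysfs path and counter availability vary by system, and this is not the paper's measurement methodology.

```c
#include <stdio.h>
#include <unistd.h>

/* Read an energy counter file (microjoules) as exposed by the Linux
 * powercap/RAPL driver; returns -1 if the file cannot be read. */
long long read_uj(const char *path)
{
    FILE *f = fopen(path, "r");
    long long uj = -1;
    if (f) { if (fscanf(f, "%lld", &uj) != 1) uj = -1; fclose(f); }
    return uj;
}

/* Average package power over an interval: read the energy counter
 * twice and divide the difference by the elapsed time. */
double avg_package_power_w(unsigned seconds)
{
    const char *p = "/sys/class/powercap/intel-rapl:0/energy_uj";
    long long e0 = read_uj(p);
    sleep(seconds);
    long long e1 = read_uj(p);
    if (e0 < 0 || e1 < e0) return -1.0;  /* unavailable or wrapped */
    return (double)(e1 - e0) / (seconds * 1e6);
}
```

Note that on pre-Haswell parts these counters were model-based estimates; one of the paper's findings is that Haswell-EP moves RAPL to actual measurement, improving its accuracy.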
international conference on green computing | 2010
Daniel Molka; Daniel Hackenberg; Robert Schöne; Matthias S. Müller
The energy efficiency of computer systems is influenced by many interdependent aspects. To assess the efficiency, typical benchmarks characterize the total power consumption of a computer system under certain domain-specific workloads. For example, in the case of the SPECpower benchmark the workload is a typical web-server Java application. The contribution of individual components is usually not considered in this class of benchmarks. The CPU makes the most significant contribution due to both its high peak power consumption and its high variability depending on the workload. Correlations of workload and energy consumption of parts of the processor are usually established with simulations rather than actual measurements. This is mainly a consequence of the limited time resolution of power meters, which is usually orders of magnitude too low to observe variations on the time scale of microarchitectural events. Furthermore, it is usually not possible to measure the power consumption of processors in isolation, as they are supplied by multiple power lines that are not easily accessible and are often shared with other components. In this paper we present benchmarks and a measurement methodology that compensate for the time resolution of our power meter by applying a constant and well-defined workload to the system. Using this experimental setup we analyze x86-64 microarchitectures from AMD and Intel. We furthermore characterize the contribution of individual operations and data transfers to the total power consumption of the Intel system.
international workshop on energy efficient supercomputing | 2014
Daniel Hackenberg; Thomas Ilsche; Joseph Schuchart; Robert Schöne; Wolfgang E. Nagel; Marc Simon; Yiannis Georgiou
Accurate and fine-grained power measurements of computing systems are essential for energy-aware performance optimizations of HPC systems and applications. Although cluster-wide instrumentation options are available, fine spatial granularity and temporal resolution are not supported by the system vendors, and extra hardware is needed to capture the power consumption information. We introduce the High Definition Energy Efficiency Monitoring (HDEEM) infrastructure, a sophisticated approach towards system-wide and fine-grained power measurements that enables energy-aware performance optimizations of parallel codes. Our approach is targeted at instrumenting multiple HPC racks with power sensors that have a sampling rate of about 8 kSa/s as well as finer spatial granularity, e.g., for per-CPU measurements. We specifically focus on the correctness of power measurement samples and energy consumption calculations based on these power samples. We also discuss scalable and low-overhead or overhead-free options for online and offline (post-mortem) processing of power measurement data.
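The "energy consumption calculations based on these power samples" mentioned above boil down to numerical integration of a discrete power trace. As a minimal illustration (not HDEEM's actual processing pipeline), the trapezoidal rule over equidistant samples looks like this:

```c
#include <stddef.h>

/* Integrate a power trace sampled at fixed period dt_s (seconds)
 * into energy (joules) using the trapezoidal rule:
 *   E = sum over i of 0.5 * (P[i-1] + P[i]) * dt. */
double energy_joules(const double *p_watts, size_t n, double dt_s)
{
    double e = 0.0;
    for (size_t i = 1; i < n; i++)
        e += 0.5 * (p_watts[i - 1] + p_watts[i]) * dt_s;
    return e;
}
```

Getting this step right is less trivial than it looks at scale: sample timestamps must be aligned across sensors, and dropped or duplicated samples bias the integral, which is part of why the paper emphasizes correctness of the energy calculation.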
2013 International Green Computing Conference Proceedings | 2013
Daniel Hackenberg; Roland Oldenburg; Daniel Molka; Robert Schöne
Processor stress test utilities are important tools for a number of different use cases. In particular, cooling systems need to be tested at maximum load in order to ensure that they fulfill their specifications. Additionally, a test system characterization in terms of idle and maximum power consumption is often a prerequisite for energy efficiency research. This creates the need for a simple yet versatile tool that generates near-peak power consumption of compute nodes. While in different research areas tools such as LINPACK and Prime95 are commonly used, these tools are merely highly optimized and compute-intensive routines that solve specific computational problems. As stress test utilities they are unnecessarily hard to use and in many cases unreliable in terms of power consumption maximization. We propose FIRESTARTER, an open-source tool that is specifically designed to create near-peak power consumption. Our experiments show that this task cannot be achieved with generic high-level language code. We therefore use highly optimized assembly routines that take the specific properties of a given processor microarchitecture into account. A study on four compute nodes with current or last-generation x86_64 processors shows that we reliably exceed the power consumption of other stress tests and create very steady power consumption patterns.
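To make the paper's central claim concrete: a generic high-level stress loop of the kind sketched below is exactly what FIRESTARTER shows to be insufficient for near-peak power, because the compiler-generated code leaves execution ports, SIMD width, and the memory hierarchy underutilized. The sketch is for contrast only; FIRESTARTER's actual kernels are microarchitecture-specific assembly.

```c
/* A naive high-level stress loop: dependent scalar multiply-adds on a
 * few volatile variables. This keeps a core busy but exercises only a
 * fraction of its execution resources, so it falls well short of the
 * near-peak power that hand-tuned, architecture-aware assembly
 * (as used by FIRESTARTER) achieves. */
double naive_stress(long iters)
{
    volatile double a = 1.1, b = 2.2, c = 3.3, d = 4.4;
    for (long i = 0; i < iters; i++) {
        a = a * b + c;   /* scalar FP, serialized through 'a' */
        d = d * c + b;
    }
    return a + d;
}
```

The gap between such code and a kernel that saturates all AVX units, keeps the decoders busy, and streams data through the caches is what makes power-maximizing stress testing a microarchitecture-specific problem.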
Computer Science - Research and Development | 2010
Daniel Hackenberg; Robert Schöne; Daniel Molka; Matthias S. Müller; Andreas Knüpfer
The power consumption of an HPC system is a major concern not only due to the huge associated operational cost. It also poses high demands on the infrastructure required to operate such a system. The power consumption strongly depends on the executed workload and is influenced by the system's hardware and software and their setup. In this paper we analyze the power consumption of a 32-node cluster across a wide range of parallel applications using the SPEC MPI2007 benchmark. By measuring the variations of the power consumed by different hardware nodes and processes of an application, we lay the groundwork to extrapolate the energy demand of large parallel HPC systems.
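The extrapolation the paper lays the groundwork for is, at its simplest, an arithmetic scaling of measured per-node power. The helper below is a hypothetical back-of-the-envelope sketch, not the paper's model; the overhead factor parameter (akin to a PUE-style multiplier for cooling and infrastructure) is an assumption introduced here for illustration.

```c
/* Estimate total energy of a large system from measured mean node
 * power: E [kWh] = mean node power [W] * infrastructure overhead
 * factor * node count * runtime [h] / 1000. A real extrapolation
 * must also account for per-node and per-workload power variation,
 * which is what the paper measures. */
double cluster_energy_kwh(double mean_node_watts, double overhead_factor,
                          int nodes, double runtime_h)
{
    return mean_node_watts * overhead_factor * nodes * runtime_h / 1000.0;
}
```

The paper's measured node-to-node and workload-dependent variations indicate the error bars such a simple mean-based estimate carries.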
Proceedings of the workshop on Memory Systems Performance and Correctness | 2014
Daniel Molka; Daniel Hackenberg; Robert Schöne
Application performance on multicore processors is seldom constrained by the speed of floating point or integer units. Much more often, limitations are caused by the memory subsystem, particularly shared resources such as last level caches or memory controllers. Measuring, predicting and modeling memory performance becomes a steeper challenge with each new processor generation due to the growing complexity and core count. We tackle the important aspect of measuring and understanding undocumented memory performance numbers in order to create valuable insight into microprocessor details. For this, we build upon a set of sophisticated benchmarks that support latency and bandwidth measurements to arbitrary locations in the memory subsystem. These benchmarks are extended to support AVX instructions for bandwidth measurements and to integrate the coherence states (O)wned and (F)orward. We then use these benchmarks to perform an in-depth analysis of current ccNUMA multiprocessor systems with Intel (Sandy Bridge-EP) and AMD (Bulldozer) processors. Using our benchmarks we present fundamental memory performance data and illustrate performance-relevant architectural properties of both designs.
Computer Science - Research and Development | 2015
Robert Schöne; Daniel Molka; Michael Werner
Over the last decades, various low-power states have been implemented in processors. They can be used by the operating system to reduce the power consumption. The applied power saving mechanisms include load-dependent frequency and voltage scaling as well as the temporary deactivation of unused components. These techniques reduce the power consumption and thereby enable energy efficiency improvements if the system is not used to full capacity. However, an inappropriate usage of low-power states can significantly degrade the performance, as the time required to re-establish full performance can be significant. Therefore, deep idle states are occasionally disabled, especially if applications have real-time requirements. In this paper, we describe how low-power states are implemented in current x86 processors. We then measure the wake-up latencies of various low-power states that occur when a processor core is reactivated. Finally, we compare our results to the vendor's specifications that are exposed to the operating system.
international conference on parallel processing | 2015
Daniel Molka; Daniel Hackenberg; Robert Schöne; Wolfgang E. Nagel
A major challenge in the design of contemporary microprocessors is the increasing number of cores in conjunction with the persevering need for cache coherence. To achieve this, the memory subsystem steadily gains complexity that has evolved to levels beyond the comprehension of most application performance analysts. The Intel Haswell-EP architecture is such an example. It includes considerable advancements regarding memory hierarchy, on-chip communication, and cache coherence mechanisms compared to the previous generation. We have developed sophisticated benchmarks that allow us to perform in-depth investigations with full memory location and coherence state control. Using these benchmarks we investigate performance data and architectural properties of the Haswell-EP microarchitecture, including important memory latency and bandwidth characteristics as well as the cost of core-to-core transfers. This allows us to further the understanding of such complex designs by documenting implementation details that are either not publicly available at all, or only indirectly documented through patents.
international green and sustainable computing conference | 2015
Thomas Ilsche; Daniel Hackenberg; Stefan Graul; Robert Schöne; Joseph Schuchart
Energy efficiency is a key optimization goal for software and hardware in the High Performance Computing (HPC) domain. This necessitates sophisticated power measurement capabilities that are characterized by the key criteria (i) high sampling rates, (ii) measurement of individual components, (iii) well-defined accuracy, and (iv) high scalability. In this paper, we tackle the first three of these goals and describe the instrumentation of two high-end compute nodes with three different current measurement techniques: (i) Hall effect sensors, (ii) measuring shunts in extension cables and riser cards, and (iii) tapping into the voltage regulators. The resulting measurement data for components such as sockets, PCIe cards, and DRAM DIMMs is digitized at sampling rates from 7 kSa/s up to 500 kSa/s, enabling a fine-grained correlation between power usage and application events. The accuracy of all elements in the measurement infrastructure is studied carefully. Moreover, potential pitfalls in building custom power instrumentation are discussed. We raise the awareness for the properties of power measurements, as disregarding existing inaccuracies can lead to invalid conclusions regarding energy efficiency.
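Technique (ii) above, measuring shunts, rests on a small calculation worth making explicit: the voltage drop across a known shunt resistance yields the current through Ohm's law, and multiplying by the rail voltage yields power. The helper below is illustrative only; the values are not the paper's calibration data.

```c
/* Convert a shunt measurement to power:
 *   I [A] = V_shunt [mV] / R_shunt [mOhm]   (Ohm's law; units cancel)
 *   P [W] = I [A] * V_rail [V]
 * Real instrumentation must additionally calibrate the shunt value,
 * amplifier gain, and offset, which dominate the achievable accuracy. */
double shunt_power_w(double v_shunt_mv, double r_shunt_mohm,
                     double v_rail_v)
{
    double current_a = v_shunt_mv / r_shunt_mohm;
    return current_a * v_rail_v;
}
```

For example, a 10 mV drop across a 5 mOhm shunt on a 12 V rail corresponds to 2 A and thus 24 W. The paper's careful accuracy study covers exactly the error sources (shunt tolerance, amplification, ADC resolution) that this idealized formula ignores.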