Joseph Schuchart
Dresden University of Technology
Publications
Featured research published by Joseph Schuchart.
international parallel and distributed processing symposium | 2015
Daniel Hackenberg; Robert Schöne; Thomas Ilsche; Daniel Molka; Joseph Schuchart; Robin Geyer
The recently introduced Intel Xeon E5-1600 v3 and E5-2600 v3 series processors -- codenamed Haswell-EP -- implement major changes compared to their predecessors. Among these changes are integrated voltage regulators that enable individual voltages and frequencies for every core. In this paper we analyze a number of consequences of this development that are of utmost importance for energy efficiency optimization strategies such as dynamic voltage and frequency scaling (DVFS) and dynamic concurrency throttling (DCT). This includes the enhanced RAPL implementation and its improved accuracy as it moves from modeling to actual measurement. Another fundamental change is that every clock speed above AVX frequency -- including nominal frequency -- is opportunistic and unreliable, which vastly decreases performance predictability with potential effects on scalability. Moreover, we characterize significantly changed p-state transition behavior, and determine crucial memory performance data.
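The RAPL energy counters discussed above are exposed on Linux through the powercap sysfs interface. A minimal sketch of deriving average package power from two counter readings (paths are system-dependent, and the counter wraps at `max_energy_range_uj`):

```python
import time

# Paths are system-dependent; intel-rapl:0 is typically the package-0 domain.
ENERGY_FILE = "/sys/class/powercap/intel-rapl:0/energy_uj"
RANGE_FILE = "/sys/class/powercap/intel-rapl:0/max_energy_range_uj"

def energy_delta_uj(e0, e1, max_range_uj):
    """Energy consumed between two counter readings, handling counter wraparound."""
    return (e1 - e0) % (max_range_uj + 1)

def read_uj(path):
    with open(path) as f:
        return int(f.read())

def average_package_power_watts(interval_s=1.0):
    """Average package power over an interval, derived from two RAPL readings."""
    max_range = read_uj(RANGE_FILE)
    e0 = read_uj(ENERGY_FILE)
    time.sleep(interval_s)
    e1 = read_uj(ENERGY_FILE)
    return energy_delta_uj(e0, e1, max_range) / 1e6 / interval_s
```

Note that on the pre-Haswell processors discussed in the paper, the value read this way is modeled rather than measured.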
ieee international conference on high performance computing data and analytics | 2013
M. Bussmann; Heiko Burau; T. E. Cowan; Alexander Debus; Axel Huebl; Guido Juckeland; T. Kluge; Wolfgang E. Nagel; Richard Pausch; Felix Schmitt; U. Schramm; Joseph Schuchart; René Widera
We present a particle-in-cell simulation of the relativistic Kelvin-Helmholtz Instability (KHI) that for the first time delivers angularly resolved radiation spectra of the particle dynamics during the formation of the KHI. This enables studying the formation of the KHI with unprecedented spatial, angular and spectral resolution. Our results are of great importance for understanding astrophysical jet formation and comparable plasma phenomena by relating the particle motion observed in the KHI to its radiation signature. The innovative methods presented here on the implementation of the particle-in-cell algorithm on graphics processing units can be directly adapted to any many-core parallelization of the particle-mesh method. With these methods we see a peak performance of 7.176 PFLOP/s (double-precision) plus 1.449 PFLOP/s (single-precision), an efficiency of 96% when weakly scaling from 1 to 18432 nodes, an efficiency of 68.92% and a speedup of 794 (ideal: 1152) when strongly scaling from 16 to 18432 nodes.
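The quoted strong-scaling figures follow directly from the standard definitions of speedup and parallel efficiency; a quick check:

```python
def speedup(t_baseline, t_scaled):
    """Speedup of the scaled run relative to the baseline run."""
    return t_baseline / t_scaled

def parallel_efficiency(measured_speedup, n_baseline, n_scaled):
    """Measured speedup relative to the ideal (linear) speedup."""
    return measured_speedup / (n_scaled / n_baseline)

# Strong scaling from 16 to 18432 nodes: ideal speedup is 18432/16 = 1152;
# the measured speedup of 794 yields the reported ~68.92% efficiency.
ideal = 18432 / 16
eff = parallel_efficiency(794, 16, 18432)
```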
international workshop on energy efficient supercomputing | 2014
Daniel Hackenberg; Thomas Ilsche; Joseph Schuchart; Robert Schöne; Wolfgang E. Nagel; Marc Simon; Yiannis Georgiou
Accurate and fine-grained power measurements of computing systems are essential for energy-aware performance optimizations of HPC systems and applications. Although cluster-wide instrumentation options are available, fine spatial granularity and temporal resolution are not supported by the system vendors, and extra hardware is needed to capture the power consumption information. We introduce the High Definition Energy Efficiency Monitoring (HDEEM) infrastructure, a sophisticated approach towards system-wide and fine-grained power measurements that enable energy-aware performance optimizations of parallel codes. Our approach is targeted at instrumenting multiple HPC racks with power sensors that have a sampling rate of about 8 kSa/s as well as finer spatial granularity, e.g., for per-CPU measurements. We specifically focus on the correctness of power measurement samples and energy consumption calculations based on these power samples. We also discuss scalable and low-overhead or overhead-free options for online and offline (post-mortem) processing of power measurement data.
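Deriving energy from discrete power samples, as HDEEM must, reduces to numerical integration over the sample stream. A minimal sketch using the trapezoidal rule (the sample values below are illustrative, not HDEEM data):

```python
def energy_joules(power_samples, sample_rate_hz):
    """Integrate power [W] sampled at a fixed rate into energy [J] (trapezoidal rule)."""
    dt = 1.0 / sample_rate_hz
    total = 0.0
    for p0, p1 in zip(power_samples, power_samples[1:]):
        total += (p0 + p1) / 2.0 * dt
    return total

# One second of a constant 100 W load sampled at 8 kSa/s integrates to ~100 J.
print(energy_joules([100.0] * 8001, 8000))
```

The correctness concerns the abstract raises apply exactly here: a biased sample value or an ill-defined sample timestamp propagates directly into the computed energy.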
high performance distributed computing | 2012
Thomas Ilsche; Joseph Schuchart; Jason Cope; Dries Kimpe; Terry Jones; Andreas Knüpfer; Kamil Iskra; Robert B. Ross; Wolfgang E. Nagel; Stephen W. Poole
Event tracing is an important tool for understanding the performance of parallel applications. As concurrency increases in leadership-class computing systems, the quantity of performance log data can overload the parallel file system, perturbing the application being observed. In this work we present a solution for event tracing at leadership scales. We enhance the I/O forwarding system software to aggregate and reorganize log data prior to writing to the storage system, significantly reducing the burden on the underlying file system for this type of traffic. Furthermore, we augment the I/O forwarding system with a write buffering capability to limit the impact of artificial perturbations from log data accesses on traced applications. To validate the approach, we modify the Vampir tracing toolset to take advantage of this new capability and show that the approach increases the maximum traced application size by a factor of 5x to more than 200,000 processes.
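The aggregation idea, reorganizing many small per-process log writes into few large ones before they reach the file system, can be illustrated with a simple write buffer (a sketch of the concept, not the actual I/O forwarding implementation):

```python
import io

class BufferedTraceWriter:
    """Collects small event records and flushes them as one large write."""

    def __init__(self, sink, flush_threshold=256):
        self.sink = sink                    # any writable binary stream
        self.flush_threshold = flush_threshold
        self.buffer = []
        self.buffered_bytes = 0
        self.flushes = 0                    # number of actual writes to the sink

    def write_event(self, record: bytes):
        self.buffer.append(record)
        self.buffered_bytes += len(record)
        if self.buffered_bytes >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink.write(b"".join(self.buffer))  # one large write, not many small ones
            self.buffer.clear()
            self.buffered_bytes = 0
            self.flushes += 1

sink = io.BytesIO()
w = BufferedTraceWriter(sink, flush_threshold=256)
for i in range(100):
    w.write_event(b"event-%03d;" % i)       # 100 ten-byte records
w.flush()
# 1000 bytes of events reach the sink in only 4 writes instead of 100.
```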
international green and sustainable computing conference | 2015
Thomas Ilsche; Daniel Hackenberg; Stefan Graul; Robert Schöne; Joseph Schuchart
Energy efficiency is a key optimization goal for software and hardware in the High Performance Computing (HPC) domain. This necessitates sophisticated power measurement capabilities that are characterized by the key criteria (i) high sampling rates, (ii) measurement of individual components, (iii) well-defined accuracy, and (iv) high scalability. In this paper, we tackle the first three of these goals and describe the instrumentation of two high-end compute nodes with three different current measurement techniques: (i) Hall effect sensors, (ii) measuring shunts in extension cables and riser cards, and (iii) tapping into the voltage regulators. The resulting measurement data for components such as sockets, PCIe cards, and DRAM DIMMs is digitized at sampling rates from 7 kSa/s up to 500 kSa/s, enabling a fine-grained correlation between power usage and application events. The accuracy of all elements in the measurement infrastructure is studied carefully. Moreover, potential pitfalls in building custom power instrumentation are discussed. We raise awareness of the properties of power measurements, as disregarding existing inaccuracies can lead to invalid conclusions regarding energy efficiency.
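The shunt technique in (ii) infers current from the voltage drop across a precision series resistor via Ohm's law. A small sketch of the conversion (the resistor and rail values below are illustrative, not the paper's hardware):

```python
def shunt_power_watts(v_shunt_mv, r_shunt_mohm, v_rail):
    """Power drawn on a supply rail, from the voltage drop across a series shunt.

    Ohm's law gives I = V_shunt / R_shunt; power is then P = I * V_rail.
    """
    current_a = (v_shunt_mv / 1000.0) / (r_shunt_mohm / 1000.0)
    return current_a * v_rail

# Example: a 5 mV drop across a 2 mOhm shunt on a 12 V rail -> 2.5 A -> 30 W.
print(shunt_power_watts(5.0, 2.0, 12.0))
```

The accuracy concern the abstract stresses shows up directly in this formula: tolerance in the shunt resistance and offset in the voltage measurement both scale linearly into the reported power.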
computational science and engineering | 2015
Yury Oleynik; Michael Gerndt; Joseph Schuchart; Per Gunnar Kjeldsberg; Wolfgang E. Nagel
Efficiently utilizing the resources provided on current petascale and future exascale systems will be a challenging task, potentially causing a large amount of underutilized resources and wasted energy. A promising potential to improve efficiency of HPC applications stems from the significant degree of dynamic behavior, e.g., run-time alternation in application resource requirements in HPC workloads. Manually detecting and leveraging this dynamism to improve performance and energy-efficiency is a tedious task that is commonly neglected by developers. However, using an automatic optimization approach, application dynamism can be analyzed at design-time and used to optimize system configurations at run-time. The European Union Horizon 2020 READEX project will develop a tools-aided, scenario-based auto-tuning methodology to exploit the dynamic behavior of HPC applications to achieve improved energy-efficiency and performance. Driven by a consortium of European experts from academia, HPC resource providers, and industry, the READEX project aims at developing the first of its kind generic framework for split design-time/run-time automatic tuning for heterogeneous systems at the exascale level.
Archive | 2015
Thomas Ilsche; Joseph Schuchart; Robert Schöne; Daniel Hackenberg
Performance analysis is vital for optimizing the execution of high performance computing applications. Today different techniques for gathering, processing, and analyzing application performance data exist. Application level instrumentation for example is a powerful method that provides detailed insight into an application’s behavior. However, it is difficult to predict the instrumentation-induced perturbation as it largely depends on the application and its input data. Thus, sampling is a viable alternative to instrumentation for gathering information about the execution of an application by recording its state at regular intervals. This method provides a statistical overview of the application execution and its overhead is more predictable than with instrumentation. Taking into account the specifics of these techniques, this paper makes the following contributions: (I) A comprehensive overview of existing techniques for application performance analysis. (II) A novel tracing approach that combines instrumentation and sampling to offer the benefits of complete information where needed with reduced perturbation. We provide examples using selected instrumentation and sampling methods to detail the advantage of such mixed information and discuss arising challenges and prospects of this approach.
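The trade-off between the two techniques can be pictured in a few lines: instrumentation records every function entry (complete but with input-dependent overhead), while sampling records only the current state at fixed intervals (statistical but with predictable overhead). A toy comparison over a synthetic call stream (illustrative only, not the paper's tooling):

```python
import collections

def instrumented_profile(call_sequence):
    """Instrumentation: record every function entry -> complete information."""
    return collections.Counter(call_sequence)

def sampled_profile(call_sequence, interval):
    """Sampling: record only every `interval`-th state -> statistical overview."""
    return collections.Counter(call_sequence[::interval])

calls = ["compute"] * 900 + ["io"] * 100
full = instrumented_profile(calls)
approx = sampled_profile(calls, interval=10)
# The sampled histogram preserves the 9:1 compute/io ratio with a tenth of the records.
```

The hybrid approach the paper proposes corresponds to applying the first function only to regions where complete information is needed and the second everywhere else.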
international conference on conceptual structures | 2014
Dali Wang; Joseph Schuchart; Tomislav Janjusic; Frank Winkler; Yang Xu; Christos Kartsaklis
One key factor in the improved understanding of earth system science is the development and improvement of high fidelity models. Along with the deeper understanding of biogeophysical and biogeochemical processes, the software complexity of those earth system models becomes a barrier for further rapid model improvements and validation. In this paper, we present our experience on better understanding the Community Land Model (CLM) within an earth system modeling framework. First, we give an overview of the software system of the global offline CLM simulation. Second, we present our approach to better understand the CLM software structure and data structure using advanced software tools. After that, we focus on the practical issues related to CLM computational performance and individual ecosystem function. Since better software engineering practices are much needed for general scientific software systems, we hope these considerations can be beneficial to many other modeling research programs involving multiscale system dynamics.
Computing | 2017
Joseph Schuchart; Michael Gerndt; Per Gunnar Kjeldsberg; Michael Lysaght; David Horák; Lubomír Říha; Andreas Gocht; Mohammed Sourouri; Madhura Kumaraswamy; Anamika Chowdhury; Magnus Jahre; Kai Diethelm; Othman Bouizi; Umbreen Sabir Mian; Jakub Kružík; Radim Sojka; Martin Beseda; Venkatesh Kannan; Zakaria Bendifallah; Daniel Hackenberg; Wolfgang E. Nagel
Energy efficiency is an important aspect of future exascale systems, mainly due to rising energy cost. Although high-performance computing (HPC) applications are compute-centric, they still exhibit varying computational characteristics in different regions of the program, such as compute-, memory-, and I/O-bound code regions. Some of today's clusters already offer mechanisms to adjust the system to the resource requirements of an application, e.g., by controlling the CPU frequency. However, manually tuning for improved energy efficiency is a tedious and painstaking task that is often neglected by application developers. The European Union's Horizon 2020 project READEX (Runtime Exploitation of Application Dynamism for Energy-efficient eXascale computing) aims at developing a tools-aided approach for improved energy efficiency of current and future HPC applications. To reach this goal, the READEX project combines technologies from two ends of the compute spectrum, embedded systems and HPC, constituting a split design-time/runtime methodology. From the HPC domain, the Periscope Tuning Framework (PTF) is extended to perform dynamic auto-tuning of fine-grained application regions using the systems scenario methodology, which was originally developed for improving the energy efficiency in embedded systems. This paper introduces the concepts of the READEX project, its envisioned implementation, and preliminary results that demonstrate the feasibility of this approach.
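The split design-time/runtime idea can be sketched schematically: design-time analysis produces a table mapping application regions to best-found CPU frequencies, and a lightweight runtime applies that table at region boundaries. In this sketch the region names and frequencies are invented, and the frequency setter is a stand-in for writing the Linux cpufreq sysfs files:

```python
# Design-time result: best-found core frequency (kHz) per application region.
# Region names and values are invented for illustration.
tuning_model = {
    "compute_kernel": 2_500_000,   # compute-bound: high frequency pays off
    "halo_exchange":  1_200_000,   # communication-bound: lower frequency saves energy
    "checkpoint_io":  1_200_000,   # I/O-bound
}
DEFAULT_KHZ = 2_500_000

applied = []  # records every frequency switch, for inspection

def set_frequency_khz(khz):
    """Stand-in for writing /sys/devices/system/cpu/cpu*/cpufreq settings."""
    applied.append(khz)

def enter_region(name):
    # Runtime hook at a region entry: look up and apply the tuned setting.
    set_frequency_khz(tuning_model.get(name, DEFAULT_KHZ))

def leave_region(name):
    # Runtime hook at a region exit: restore the default setting.
    set_frequency_khz(DEFAULT_KHZ)

enter_region("halo_exchange")
leave_region("halo_exchange")
```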
Cluster Computing | 2014
Thomas Ilsche; Joseph Schuchart; Jason Cope; Dries Kimpe; Terry Jones; Andreas Knüpfer; Kamil Iskra; Robert B. Ross; Wolfgang E. Nagel; Stephen W. Poole
Programming development tools are a vital component for understanding the behavior of parallel applications. Event tracing is a principal ingredient to these tools, but new and serious challenges place event tracing at risk on extreme-scale machines. As the quantity of captured events increases with concurrency, the additional data can overload the parallel file system and perturb the application being observed. In this work we present a solution for event tracing on extreme-scale machines. We enhance an I/O forwarding software layer to aggregate and reorganize log data prior to writing to the storage system, significantly reducing the burden on the underlying file system. Furthermore, we introduce a sophisticated write buffering capability to limit the impact of log data writes on the traced application. To validate the approach, we employ the Vampir tracing toolset using these new capabilities. Our results demonstrate that the approach increases the maximum traced application size by a factor of 5× to more than 200,000 processes.