Guido Juckeland

Helmholtz-Zentrum Dresden-Rossendorf

Publication

Featured research published by Guido Juckeland.

international conference on parallel processing | 2011

Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs

Allen D. Malony; Scott Biersdorff; Sameer Shende; Heike Jagode; Stanimire Tomov; Guido Juckeland; Robert Dietrich; Duncan Poole; Christopher Lamb

The power of GPUs is giving rise to heterogeneous parallel computing, with new demands on programming environments, runtime systems, and tools to deliver high-performing applications. This paper studies the problems associated with performance measurement of heterogeneous machines with GPUs. A heterogeneous computation model and alternative host-GPU measurement approaches are discussed to set the stage for reporting new capabilities for heterogeneous parallel performance measurement in three leading HPC tools: PAPI, Vampir, and the TAU Performance System. Our work leverages the new CUPTI tool support in NVIDIA's CUDA device library. Heterogeneous benchmarks from the SHOC suite are used to demonstrate the measurement methods and tool support.

ieee international conference on high performance computing data and analytics | 2013

Radiative signatures of the relativistic Kelvin-Helmholtz instability

M. Bussmann; Heiko Burau; T. E. Cowan; Alexander Debus; Axel Huebl; Guido Juckeland; T. Kluge; Wolfgang E. Nagel; Richard Pausch; Felix Schmitt; U. Schramm; Joseph Schuchart; René Widera

We present a particle-in-cell simulation of the relativistic Kelvin-Helmholtz Instability (KHI) that for the first time delivers angularly resolved radiation spectra of the particle dynamics during the formation of the KHI. This enables studying the formation of the KHI with unprecedented spatial, angular and spectral resolution. Our results are of great importance for understanding astrophysical jet formation and comparable plasma phenomena by relating the particle motion observed in the KHI to its radiation signature. The innovative methods presented here on the implementation of the particle-in-cell algorithm on graphics processing units can be directly adapted to any many-core parallelization of the particle-mesh method. With these methods we see a peak performance of 7.176 PFLOP/s (double-precision) plus 1.449 PFLOP/s (single-precision), an efficiency of 96% when weakly scaling from 1 to 18432 nodes, and an efficiency of 68.92% with a speedup of 794 (ideal: 1152) when strongly scaling from 16 to 18432 nodes.
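The strong-scaling figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes the usual definitions (ideal speedup is the ratio of node counts, and efficiency is measured speedup divided by ideal speedup); the function name `strong_scaling` is hypothetical and is not taken from the paper.

```python
def strong_scaling(base_nodes: int, scaled_nodes: int, speedup: float):
    """Return (ideal_speedup, efficiency) for a strong-scaling run.

    Assumes ideal speedup equals the node-count ratio and efficiency
    equals measured speedup over ideal speedup (hypothetical helper).
    """
    ideal = scaled_nodes / base_nodes
    efficiency = speedup / ideal
    return ideal, efficiency


# Values as stated in the abstract: 16 -> 18432 nodes, measured speedup 794.
ideal, eff = strong_scaling(16, 18432, 794)
print(f"ideal speedup: {ideal:.0f}, efficiency: {eff:.2%}")
```

Under these assumptions the ideal speedup comes out as 1152 and the efficiency as roughly 68.92%, matching the numbers reported above.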

Parallel Tools Workshop | 2010

Comprehensive Performance Tracking with Vampir 7

Holger Brunst; Daniel Hackenberg; Guido Juckeland; Heide Rohling

Vampir 7 is a performance visualization tool that provides a comprehensive view on the runtime behavior of parallel programs. It is a new member of the Vampir tool family. This new generation of performance visualizer combines state-of-the-art parallel data processing techniques with an all-new graphical user interface experience. This includes fast local and remote event data browsing, searching, filtering, clustering, and summarization. The software is ported to Unix, Windows, and Apple platforms. This article gives an overview of the novel techniques and features of Vampir 7.

international conference on parallel processing | 2010

Non-intrusive Performance Analysis of Parallel Hardware Accelerated Applications on Hybrid Architectures

Robert Dietrich; Thomas Ilsche; Guido Juckeland

New high performance computing (HPC) applications recently have to face scalability over an increasing number of nodes and the programming of special accelerator hardware. Hybrid composition of large computing systems leads to a new dimension in complexity of software development. This paper presents a novel approach to gain insight into accelerator interaction and utilization without any changes to the application. It leverages well-established methods for performance analysis to accelerator hardware, allowing a holistic view on performance bottlenecks of hybrid applications. A general strategy is presented to get dynamic runtime information about hybrid program execution with minimal impact on the program flow. The achievable level of detail is exemplarily studied for the CUDA environment and the OpenCL framework. Combined with existing performance analysis techniques this facilitates obtaining the full potential of hybrid computing power.

parallel computing | 2004

BenchIT — Performance measurement and comparison for scientific applications

Guido Juckeland; Stefan Börner; Michael Kluge; Sebastian Kölling; Wolfgang E. Nagel; Stefan Pflüger; Heike Röding; Stephan Seidl; Thomas William; Robert Wloch

Publisher Summary: The BenchIT kernels generate a large amount of measurement results, depending on the number of functional arguments. Using the web interface, the user can display selected results of different measuring programs in a single coordinate system. Often there are different reasons that can cause characteristic minima, maxima, or a special shape in a graph. It is necessary to collect additional information about the tested system to explain such effects on the basis of well-known system properties and physical values of the realization. The BenchIT project provides such an evaluation platform by offering a variety of measurement kernels, as well as an easily accessible plotting engine, thus enabling an easy way to measure performance on a specific system and compare the result, which is a full graph instead of just a number, to other results contributed by other users. The further development of the BenchIT project will take place on all module layers. A GUI for the configuration of the measurements is under development. It will provide an easier way to handle the measurements by partially substituting the shell scripts running the measurements up to this point. Furthermore, an additional way to plot the data on the website by using Java applets and Java graphing tools is planned.

ieee international conference on high performance computing data and analytics | 2014

SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance

Guido Juckeland; William C. Brantley; Sunita Chandrasekaran; Barbara M. Chapman; Shuai Che; Mathew E. Colgrove; Huiyu Feng; Alexander Grund; Robert Henschel; Wen-mei W. Hwu; Huian Li; Matthias S. Müller; Wolfgang E. Nagel; Maxim Perminov; Pavel Shelepugin; Kevin Skadron; John A. Stratton; Alexey Titov; Ke Wang; G. Matthijs van Waveren; Brian Whitney; Sandra Wienke; Rengan Xu; Kalyan Kumaran

Hybrid nodes with hardware accelerators are becoming very common in systems today. Users often find it difficult to characterize and understand the performance advantage of such accelerators for their applications. The SPEC High Performance Group (HPG) has developed a set of performance metrics to evaluate the performance and power consumption of accelerators for various science applications. The new benchmark comprises two suites of applications written in OpenCL and OpenACC and measures the performance of accelerators with respect to a reference platform. The first set of published results demonstrates the viability and relevance of the new metrics in comparing accelerator performance. This paper discusses the benchmark suites and selected published results in great detail.

international parallel and distributed processing symposium | 2016

Alpaka -- An Abstraction Library for Parallel Kernel Acceleration

Erik Zenker; Benjamin Worpitz; René Widera; Axel Huebl; Guido Juckeland; Andreas Knüpfer; Wolfgang E. Nagel; M. Bussmann

Porting applications to new hardware or programming models is a tedious and error-prone process. Any help that eases these burdens saves developer time that can then be invested into the advancement of the application itself instead of preserving the status quo on a new platform. The Alpaka library defines and implements an abstract hierarchical redundant parallelism model. The model exploits parallelism and memory hierarchies on a node at all levels available in current hardware. By doing so, it achieves platform and performance portability across various types of accelerators by ignoring specific unsupported levels and utilizing only the ones supported on a specific accelerator. All hardware types (multi- and many-core CPUs, GPUs and other accelerators) are supported and can be programmed in the same way. The Alpaka C++ template interface allows for straightforward extension of the library to support other accelerators and specialization of its internals for optimization. Running Alpaka applications on a new (and supported) platform requires the change of only one source code line instead of a lot of #ifdefs.

Concurrency and Computation: Practice and Experience | 2012

Performance analysis of multi‐level parallelism: inter‐node, intra‐node and hardware accelerators

Daniel Hackenberg; Guido Juckeland; Holger Brunst

The advent of multi‐core processors has made parallel computing techniques mandatory on mainstream systems. With the recent rise in hardware accelerators, hybrid parallelism adds yet another dimension of complexity to the process of software development. The inner workings of a parallel program are usually difficult to understand and verify. This paper presents a tool for graphical program flow analysis of hardware accelerated parallel programs. It monitors the hybrid program execution to record and visualize many performance relevant events along the way. Representative real‐world applications written for both IBM's Cell processor and NVIDIA's CUDA API are studied exemplarily. With our combined monitoring and visualization approach for hardware accelerated multi‐core and multi‐node systems we take the next step in tool evolution towards a highly improved level of detail, precision, and completeness. The contents of this paper are of interest to developers of hardware accelerated applications as well as performance tool architects.

grid computing | 2010

High Resolution Program Flow Visualization of Hardware Accelerated Hybrid Multi-core Applications

Daniel Hackenberg; Guido Juckeland; Holger Brunst

The advent of multi-core processors has made parallel computing techniques mandatory on mainstream systems. With the recent rise of hardware accelerators, hybrid parallelism adds yet another dimension of complexity to the process of software development. This article presents a tool for graphical program flow analysis of hardware accelerated parallel programs. It monitors the hybrid program execution to record and visualize many performance relevant events along the way. Representative real-world applications written for both IBM’s Cell processor and NVIDIA’s CUDA API are studied exemplarily. To the best of our knowledge, this approach is the first that visualizes the parallelism in hybrid multi-core systems at the presented level of detail.

quantitative evaluation of systems | 2004

Performance analysis with BenchIT: portable, flexible, easy to use

Guido Juckeland; Michael Kluge; Wolfgang E. Nagel; Stefan Pflüger

Understanding performance of modern system architectures is an always present and challenging task. BenchIT - a new tool to support the collection and presentation of such measurement data - is developed by the Center for High Performance Computing Dresden.
