Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where René Widera is active.

Publication


Featured research published by René Widera.


IEEE Transactions on Plasma Science | 2010

PIConGPU: A Fully Relativistic Particle-in-Cell Code for a GPU Cluster

H. Burau; René Widera; W Hönig; G Juckeland; Alexander Debus; T. Kluge; U. Schramm; T. E. Cowan; R. Sauerbrey; M. Bussmann

The particle-in-cell (PIC) algorithm is one of the most widely used algorithms in computational plasma physics. With the advent of graphics processing units (GPUs), large-scale plasma simulations on inexpensive GPU clusters are within reach. We present an implementation of a fully relativistic plasma PIC algorithm for GPUs based on the NVIDIA CUDA library. It supports a hybrid architecture consisting of single computation nodes interconnected in a standard cluster topology, with each node carrying one or more GPUs. The internode communication is realized using the Message Passing Interface (MPI). The simulation code PIConGPU presented in this paper is, to our knowledge, the first scalable GPU cluster implementation of the PIC algorithm in plasma physics.
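
The hybrid design described above, node-level GPU computation plus cluster-level MPI communication, can be illustrated with a short sketch. This is a minimal, hypothetical example of the pattern, not PIConGPU's actual code: the particle layout, the simplified non-relativistic nearest-grid-point push, and the 1-D ring decomposition are assumptions made for brevity.

```cuda
// Minimal sketch of the GPU + MPI pattern (hypothetical, not PIConGPU
// source): each MPI rank owns a slab of the domain, pushes its particles
// on the GPU, then exchanges guard cells with its neighbors.
#include <cuda_runtime.h>
#include <mpi.h>
#include <vector>

struct Particle { float x, vx; };

// Simplified (non-relativistic) particle push; PIConGPU's real pusher
// is fully relativistic and three-dimensional.
__global__ void pushParticles(Particle* p, const float* Ex,
                              int nParticles, float dt, float qOverM)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nParticles) {
        int cell = static_cast<int>(p[i].x);  // nearest-grid-point gather
        p[i].vx += qOverM * Ex[cell] * dt;    // accelerate
        p[i].x  += p[i].vx * dt;              // move
    }
}

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int nCells = 256, nGuard = 1, nParticles = 1 << 20;
    std::vector<float> hEx(nCells + 2 * nGuard, 0.f);

    float* dEx;  Particle* dP;
    cudaMalloc(&dEx, hEx.size() * sizeof(float));
    cudaMalloc(&dP, nParticles * sizeof(Particle));
    cudaMemset(dP, 0, nParticles * sizeof(Particle));   // all particles at x=0, v=0
    cudaMemcpy(dEx, hEx.data(), hEx.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    pushParticles<<<(nParticles + 255) / 256, 256>>>(dP, dEx, nParticles,
                                                     0.01f, -1.0f);
    cudaDeviceSynchronize();

    // Internode communication via MPI: exchange one guard cell with the
    // neighbors in a 1-D ring decomposition.
    int right = (rank + 1) % size, left = (rank - 1 + size) % size;
    cudaMemcpy(hEx.data(), dEx, hEx.size() * sizeof(float),
               cudaMemcpyDeviceToHost);
    MPI_Sendrecv(&hEx[nCells], nGuard, MPI_FLOAT, right, 0,
                 &hEx[0],      nGuard, MPI_FLOAT, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(dEx); cudaFree(dP);
    MPI_Finalize();
    return 0;
}
```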


IEEE International Conference on High Performance Computing, Data, and Analytics | 2013

Radiative signatures of the relativistic Kelvin-Helmholtz instability

M. Bussmann; Heiko Burau; T. E. Cowan; Alexander Debus; Axel Huebl; Guido Juckeland; T. Kluge; Wolfgang E. Nagel; Richard Pausch; Felix Schmitt; U. Schramm; Joseph Schuchart; René Widera

We present a particle-in-cell simulation of the relativistic Kelvin-Helmholtz instability (KHI) that for the first time delivers angularly resolved radiation spectra of the particle dynamics during the formation of the KHI. This enables studying the formation of the KHI with unprecedented spatial, angular, and spectral resolution. Our results are of great importance for understanding astrophysical jet formation and comparable plasma phenomena by relating the particle motion observed in the KHI to its radiation signature. The innovative methods presented here on the implementation of the particle-in-cell algorithm on graphics processing units can be directly adapted to any many-core parallelization of the particle-mesh method. With these methods we reach a peak performance of 7.176 PFLOP/s (double precision) plus 1.449 PFLOP/s (single precision), an efficiency of 96% when weakly scaling from 1 to 18432 nodes, and an efficiency of 68.92% with a speedup of 794 (ideal: 1152) when strongly scaling from 16 to 18432 nodes.
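
As a consistency check, the strong-scaling numbers above follow directly from the standard definitions of speedup and parallel efficiency (with $T(N)$ the runtime on $N$ nodes):

$$ S = \frac{T(16)}{T(18432)} = 794, \qquad S_{\text{ideal}} = \frac{18432}{16} = 1152, \qquad E = \frac{S}{S_{\text{ideal}}} = \frac{794}{1152} \approx 68.9\,\%. $$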


International Parallel and Distributed Processing Symposium | 2016

Alpaka -- An Abstraction Library for Parallel Kernel Acceleration

Erik Zenker; Benjamin Worpitz; René Widera; Axel Huebl; Guido Juckeland; Andreas Knüpfer; Wolfgang E. Nagel; M. Bussmann

Porting applications to new hardware or programming models is a tedious and error-prone process. Anything that eases this burden saves developer time that can then be invested into the advancement of the application itself instead of preserving the status quo on a new platform. The Alpaka library defines and implements an abstract hierarchical redundant parallelism model. The model exploits parallelism and memory hierarchies on a node at all levels available in current hardware. By doing so, it makes it possible to achieve platform and performance portability across various types of accelerators by ignoring specific unsupported levels and utilizing only the ones supported on a specific accelerator. All hardware types (multi- and many-core CPUs, GPUs, and other accelerators) are supported and can be programmed in the same way. The Alpaka C++ template interface allows for straightforward extension of the library to support other accelerators and specialization of its internals for optimization. Running Alpaka applications on a new (and supported) platform requires changing only one source code line instead of scattering the code with #ifdefs.
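
The core idea, a single-source kernel whose mapping to hardware is selected by one type alias, can be sketched in a few lines of plain C++. This is a conceptual illustration of the pattern only, not Alpaka's actual API; real Alpaka accelerator types, work-division objects, and queue semantics differ and vary between library versions.

```cpp
// Conceptual sketch of the single-source-kernel idea (NOT the real Alpaka
// API): the kernel is written once; the mapping to hardware is selected by
// changing a single type alias.
#include <algorithm>
#include <cstdio>
#include <thread>
#include <vector>

struct SerialAcc {                       // "accelerator" 1: sequential CPU
    template<typename Kernel>
    static void run(int n, Kernel kernel) {
        for (int i = 0; i < n; ++i) kernel(i);
    }
};

struct ThreadAcc {                       // "accelerator" 2: std::thread pool
    template<typename Kernel>
    static void run(int n, Kernel kernel) {
        int stride = static_cast<int>(
            std::max(1u, std::thread::hardware_concurrency()));
        std::vector<std::thread> pool;
        for (int t = 0; t < stride; ++t)
            pool.emplace_back([=] {
                for (int i = t; i < n; i += stride) kernel(i);
            });
        for (auto& th : pool) th.join();
    }
};

using Acc = ThreadAcc;  // the one source line changed when switching platforms

int main() {
    std::vector<float> a(1024, 1.f), b(1024, 2.f), c(1024, 0.f);
    // The kernel itself is single-source and accelerator-agnostic.
    Acc::run(1024, [&](int i) { c[i] = a[i] + b[i]; });
    std::printf("c[0] = %f\n", c[0]);
    return 0;
}
```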


IEEE International Conference on High Performance Computing, Data, and Analytics | 2016

Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond

Erik Zenker; René Widera; Axel Huebl; Guido Juckeland; Andreas Knüpfer; Wolfgang E. Nagel; M. Bussmann

With the appearance of the heterogeneous platform OpenPower, many-core accelerator devices have been coupled with Power host processors for the first time. To utilize their full potential, it is worth investigating performance-portable algorithms that allow choosing the best-fitting hardware for each domain-specific compute task. Suiting even the high level of parallelism on modern GPGPUs, our presented approach relies heavily on abstract meta-programming techniques, which are essential to focus on fine-grained tuning rather than code porting. With this in mind, the CUDA-based open-source plasma simulation code PIConGPU is currently being abstracted to support the heterogeneous OpenPower platform using our fast porting interface cupla, which wraps the abstract parallel C++11 kernel acceleration library Alpaka. We demonstrate how PIConGPU can benefit from the tunable kernel execution strategies of the Alpaka library, achieving portability and performance with single-source kernels on conventional CPUs, Power8 CPUs, and NVIDIA GPUs.
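
The porting path via cupla keeps the familiar CUDA-style call structure. The snippet below is a hypothetical sketch of that style based on cupla's documented kernel-launch macro and CUDA-mirroring memory functions; exact signatures should be checked against the cupla version in use.

```cpp
// Hypothetical sketch of a CUDA-style kernel ported via cupla; the
// CUPLA_KERNEL launch macro and cuplaMalloc/cuplaMemcpy mirror their CUDA
// counterparts (check the cupla documentation for exact signatures).
#include <cuda_to_cupla.hpp>

struct ScaleKernel {
    // cupla/Alpaka kernels are functors templated on the accelerator type.
    template<typename TAcc>
    ALPAKA_FN_ACC void operator()(TAcc const& acc, float* data,
                                  float factor, int n) const {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }
};

int main() {
    const int n = 1024;
    float* d = nullptr;
    cuplaMalloc((void**)&d, n * sizeof(float));
    // grid size, block size, dynamic shared memory, stream -- as in CUDA
    CUPLA_KERNEL(ScaleKernel)((n + 255) / 256, 256, 0, 0)(d, 2.0f, n);
    cuplaDeviceSynchronize();
    cuplaFree(d);
    return 0;
}
```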


International Conference on Parallel Processing | 2012

Phase-Based Profiling in GPGPU Kernels

Robert Dietrich; Felix Schmitt; René Widera; M. Bussmann

More and more computationally intensive scientific applications make use of hardware accelerators like general-purpose graphics processing units (GPGPUs). Compared to software development for typical multi-core processors, their programming is fairly complex and needs hardware-specific optimizations to utilize the full computing power. To achieve high performance, critical parts of a program have to be identified and optimized. This paper proposes an approach for performance analysis of CUDA kernel source code regions which, for the first time, allows measuring execution times within GPGPU kernels. We developed a tool that implements the presented method and supports the application developer in easily identifying hot spots within a kernel. The tool uses compile-time code analysis to automatically select suitable instrumentation points for minimal program perturbation, and further provides support for manual instrumentation. To the best of our knowledge, this is the first approach that allows for scalable runtime analysis within GPGPU kernels. Combined with existing performance analysis techniques, this facilitates exploiting the full potential of modern parallel systems.
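
Intra-kernel phase timing of the kind described above is commonly built on the GPU's cycle counter. The sketch below is a generic, hypothetical illustration using CUDA's clock64() to time one code region per block; it shows the technique, not the paper's actual instrumentation tool.

```cuda
// Generic sketch of intra-kernel phase timing via the GPU cycle counter
// (a common technique; hypothetical, not the paper's tool).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void timedKernel(float* data, int n, long long* cycles)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    long long t0 = clock64();            // enter instrumented region
    if (i < n) {
        float x = data[i];
        for (int k = 0; k < 64; ++k) x = x * 1.0001f + 0.5f;  // "hot spot"
        data[i] = x;
    }
    __syncthreads();                     // region ends per block
    long long t1 = clock64();

    if (threadIdx.x == 0)                // one sample per block
        cycles[blockIdx.x] = t1 - t0;
}

int main() {
    const int n = 1 << 20, block = 256, grid = (n + block - 1) / block;
    float* d; long long* c;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    cudaMalloc(&c, grid * sizeof(long long));

    timedKernel<<<grid, block>>>(d, n, c);
    cudaDeviceSynchronize();

    long long sample;
    cudaMemcpy(&sample, c, sizeof(long long), cudaMemcpyDeviceToHost);
    std::printf("block 0 spent %lld cycles in the region\n", sample);
    cudaFree(d); cudaFree(c);
    return 0;
}
```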


Physical Review E | 2017

Identifying the linear phase of the relativistic Kelvin-Helmholtz instability and measuring its growth rate via radiation

Richard Pausch; M. Bussmann; Axel Huebl; U. Schramm; Klaus Steiniger; René Widera; Alexander Debus

For the relativistic Kelvin-Helmholtz instability (KHI), which occurs at shear interfaces between two plasma streams, we report results on the polarized radiation over all observation directions and frequencies emitted by the plasma electrons from ab initio kinetic simulations. We find the polarization of the radiation to provide a clear signature for distinguishing the linear phase of the KHI from its other phases. During the linear phase, we predict the growth rate of the KHI radiation power to match the growth rate of the KHI to a high degree. Our predictions are based on a model of the vortex dynamics, which describes the electron motion in the vicinity of the shear interface between the two streams. Despite the complex and turbulent dynamics in the shear region, we find excellent agreement between our model and large-scale particle-in-cell simulations. Our findings pave the way for identifying the KHI linear regime and for measuring its growth rate in astrophysical jets observable from Earth as well as in laboratory plasmas.


Nuclear Instruments & Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment | 2018

Quantitatively consistent computation of coherent and incoherent radiation in particle-in-cell codes—A general form factor formalism for macro-particles

Richard Pausch; Alexander Debus; Axel Huebl; U. Schramm; Klaus Steiniger; René Widera; M. Bussmann

Quantitative predictions from synthetic radiation diagnostics often have to consider all accelerated particles. For particle-in-cell (PIC) codes, this not only means including all macro-particles but also taking into account the discrete electron distribution associated with them. This paper presents a general form factor formalism that allows determining the radiation from this discrete electron distribution in order to compute the coherent and incoherent radiation self-consistently. Furthermore, we discuss a memory-efficient implementation that allows PIC simulations with billions of macro-particles. The impact on the radiation spectra is demonstrated on a large-scale laser-wakefield acceleration (LWFA) simulation.
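
For orientation, the classic bunch form-factor relation that such a formalism generalizes can be written as follows, with $p(\omega)$ the single-electron spectrum, $N$ the number of real electrons a macro-particle represents, and $F(\omega)$ the Fourier transform of their normalized spatial distribution (this is the textbook relation, not necessarily the paper's exact expression):

$$ P(\omega) = p(\omega)\,\bigl[\,N + N(N-1)\,\lvert F(\omega)\rvert^{2}\,\bigr], $$

so the incoherent contribution scales as $N$ while the coherent contribution scales as $N^2$ at wavelengths long compared to the distribution's extent.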


IEEE International Conference on High Performance Computing, Data, and Analytics | 2017

Tuning and Optimization for a Variety of Many-Core Architectures Without Changing a Single Line of Implementation Code Using the Alpaka Library

Alexander Matthes; René Widera; Erik Zenker; Benjamin Worpitz; Axel Huebl; M. Bussmann

We present an analysis of optimizing the performance of a single C++11 source code using the Alpaka hardware abstraction library. For this we use the general matrix multiplication (GEMM) algorithm in order to show that compilers can optimize Alpaka code effectively when key parameters of the algorithm are tuned. We do not intend to rival existing, highly optimized DGEMM versions, but merely choose this example to prove that Alpaka allows for platform-specific tuning with a single source code. In addition, we analyze the optimization potential available with vendor-specific compilers when confronted with the heavily templated abstractions of Alpaka. We specifically test the code on bleeding-edge architectures such as Nvidia's Tesla P100, Intel's Knights Landing (KNL) and Haswell architectures, as well as IBM's Power8 system. On some of these we are able to reach almost 50% of the peak floating-point performance using the aforementioned means. When adding compiler-specific #pragmas we are able to reach 5 TFLOP/s on a P100 and over 1 TFLOP/s on a KNL system.
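
The tuning approach, keeping the implementation fixed and exposing key algorithmic parameters as compile-time constants, can be sketched as follows. The TILE template parameter is a hypothetical stand-in for the kind of knob the paper retunes per platform; the code is an illustrative cache-blocked GEMM, not the paper's benchmark.

```cpp
// Sketch of compile-time tuning: the GEMM implementation never changes;
// only the TILE parameter is retuned per platform (hypothetical example).
#include <cstdio>
#include <vector>

template<int TILE>
void gemm(const float* A, const float* B, float* C, int n)
{
    // Cache-blocked C += A * B over TILE x TILE tiles.
    for (int ii = 0; ii < n; ii += TILE)
        for (int kk = 0; kk < n; kk += TILE)
            for (int jj = 0; jj < n; jj += TILE)
                for (int i = ii; i < ii + TILE && i < n; ++i)
                    for (int k = kk; k < kk + TILE && k < n; ++k) {
                        float a = A[i * n + k];
                        for (int j = jj; j < jj + TILE && j < n; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}

int main() {
    const int n = 256;
    std::vector<float> A(n * n, 1.f), B(n * n, 2.f), C(n * n, 0.f);
    // Per-platform tuning = changing this one compile-time constant,
    // e.g. a smaller tile for a cache-bound CPU, a larger one elsewhere.
    gemm<32>(A.data(), B.data(), C.data(), n);
    std::printf("C[0] = %f (expect %f)\n", C[0], 2.f * n);
    return 0;
}
```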


IEEE International Conference on High Performance Computing, Data, and Analytics | 2017

On the Scalability of Data Reduction Techniques in Current and Upcoming HPC Systems from an Application Perspective

Axel Huebl; René Widera; Felix Schmitt; Alexander Matthes; Norbert Podhorszki; Jong Youl Choi; Scott Klasky; M. Bussmann

We implement and benchmark parallel I/O methods for the fully manycore-driven particle-in-cell code PIConGPU. Identifying throughput and overall I/O size as a major challenge for applications on today's and future HPC systems, we present a scaling law characterizing performance bottlenecks in state-of-the-art approaches for data reduction. Consequently, we propose, implement, and verify multi-threaded data transformations for the I/O library ADIOS as a feasible way to trade underutilized host-side compute potential on heterogeneous systems for reduced I/O latency.
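
The trade described above can be made concrete with a simple back-of-the-envelope model (our own illustrative sketch, not the paper's scaling law): let $S$ be the output size, $B_{io}$ the filesystem bandwidth per node, $B_c$ the per-thread compression throughput, $n$ the number of otherwise idle host threads, and $r$ the compression ratio. Then

$$ t_{\text{direct}} = \frac{S}{B_{io}}, \qquad t_{\text{compressed}} \approx \frac{S}{n\,B_c} + \frac{S}{r\,B_{io}}, $$

and spending host cores on compression pays off whenever $n B_c$ and $r$ are large enough that $t_{\text{compressed}} < t_{\text{direct}}$.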


Proceedings of SPIE | 2017

Simulate what is measured: next steps towards predictive simulations (Conference Presentation)

M. Bussmann; T. Kluge; Alexander Debus; Axel Hübl; Marco Garten; Malte Zacharias; Jan Vorberger; Richard Pausch; René Widera; U. Schramm; T. E. Cowan; A. Irman; K. Zeil; Dominik Kraus

Simulations of laser-matter interaction at extreme intensities that have predictive power are nowadays within reach when considering codes that make optimum use of high-performance compute architectures. Nevertheless, this is mostly true for very specific settings where model parameters are very well known from experiment and the underlying plasma dynamics is governed solely by Maxwell's equations. When including atomic effects, prepulse influences, radiation reaction, and other physical phenomena, things look different. Not only is it harder to evaluate the sensitivity of the simulation result to variations of the various model parameters, but the numerical models are less well tested, and their combination can lead to subtle side effects that influence the simulation outcome. We propose to make optimum use of future compute hardware to compute statistical and systematic errors rather than just find the parameter set that best fits an experiment. This requires including experimental uncertainties, which is a challenge for current state-of-the-art techniques. Moreover, it demands better comparison to experiments, as simulating the response of the diagnostics becomes important. We strongly advocate the use of open standards for achieving interoperability between codes for comparison studies, building complete tool chains for simulating laser-matter experiments from start to end.

Collaboration


Dive into René Widera's collaborations.

Top Co-Authors

Axel Huebl (Helmholtz-Zentrum Dresden-Rossendorf)
M. Bussmann (Helmholtz-Zentrum Dresden-Rossendorf)
Felix Schmitt (Dresden University of Technology)
Richard Pausch (Helmholtz-Zentrum Dresden-Rossendorf)
Carlchristian Eckert (Helmholtz-Zentrum Dresden-Rossendorf)
Alexander Debus (Helmholtz-Zentrum Dresden-Rossendorf)
U. Schramm (Helmholtz-Zentrum Dresden-Rossendorf)
Heiko Burau (Helmholtz-Zentrum Dresden-Rossendorf)
Benjamin Worpitz (Helmholtz-Zentrum Dresden-Rossendorf)
Klaus Steiniger (Helmholtz-Zentrum Dresden-Rossendorf)