Publications


Featured research published by David Rohr.


Computer Science - Research and Development | 2011

Optimized HPL for AMD GPU and multi-core CPU usage

Matthias Bach; Matthias Kretz; V. Lindenstruth; David Rohr

The installation of the LOEWE-CSC (http://csc.uni-frankfurt.de/csc/?51) supercomputer at the Goethe University in Frankfurt led to the development of a Linpack that can fully utilize the installed AMD Cypress GPUs. At its core, a fast DGEMM for combined GPU and CPU usage was created. The DGEMM library is tuned to hide all DMA transfer times and thus maximize the GPU load. A work-stealing scheduler was implemented to add the remaining CPU resources to the DGEMM. On the GPU, the DGEMM achieves 497 GFlop/s (90.9% of the theoretical peak). Combined with the 24-core Magny-Cours CPUs, 623 GFlop/s (83.6% of the peak) are achieved. The HPL (http://www.netlib.org/benchmark/hpl/algorithm.html) benchmark was modified to perform well with one MPI process per node. The modifications include multi-threading, vectorization, use of the GPU DGEMM, cache optimizations, and a new Lookahead algorithm. A Linpack performance of 70% of the theoretical peak is achieved, and this performance scales linearly to hundreds of nodes.
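
The work-stealing idea mentioned above can be illustrated with a minimal C++ sketch of the dynamic scheduling pattern: the GPU handles the bulk of the update while CPU threads drain the remaining DGEMM tiles from a shared queue. All identifiers (TileQueue, cpu_worker, and so on) are hypothetical and only show the pattern, not the LOEWE-CSC implementation.

// Minimal sketch of a shared tile queue that CPU worker threads drain,
// assuming the GPU processes the large remaining part of the DGEMM update.
// All identifiers here are hypothetical illustrations, not the LOEWE-CSC code.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

struct Tile { int row, col; };               // one DGEMM sub-block

struct TileQueue {
    std::vector<Tile> tiles;
    std::atomic<size_t> next{0};
    bool pop(Tile& t) {                       // lock-free "take next tile"
        size_t i = next.fetch_add(1);
        if (i >= tiles.size()) return false;
        t = tiles[i];
        return true;
    }
};

void cpu_worker(TileQueue& q, int id) {
    Tile t;
    while (q.pop(t)) {
        // A real implementation would call a CPU DGEMM kernel on tile t here.
        std::printf("CPU thread %d processes tile (%d,%d)\n", id, t.row, t.col);
    }
}

int main() {
    TileQueue q;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            q.tiles.push_back({r, c});        // tiles left over after the GPU part

    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i) workers.emplace_back(cpu_worker, std::ref(q), i);
    for (auto& w : workers) w.join();
}

In the real library, the tile granularity and the split between GPU and CPU are tuned so that DMA transfers overlap with computation, as described in the abstract.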


High Performance Computing and Communications | 2014

An Energy-Efficient Multi-GPU Supercomputer

David Rohr; Sebastian Kalcher; Matthias Bach; Abdulqadir A. Alaqeeliy; Hani M. Alzaidy; Dominic Eschweiler; Volker Lindenstruth; Sakhar B. Alkhereyfy; Ahmad Alharthiy; Abdulelah Almubaraky; Ibraheem Alqwaizy; Riman Bin Suliman

During recent years, heterogeneous HPC systems, which combine commodity processors with GPUs have proven to deliver superior energy efficiency. In this paper an international collaboration of research groups from Germany and Saudi Arabia presents SANAM, the prototype of a general-purpose 10 PFLOPS supercomputer based on off-the-shelf components. Leveraging an advanced multi-GPU architecture, supported by particular software optimizations that aim at energy efficiency, the system ranked second in the Green500 list of November 2012 with a power efficiency of 2351 MFLOPS/W.
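
The Green500 figure quoted above is a ratio of sustained Linpack performance to power draw; the worked arithmetic below only restates the quoted 2351 MFLOPS/W in other units (the absolute performance and power values are not given in the abstract):

\[
\text{efficiency} \;=\; \frac{R_{\max}}{P_{\text{avg}}} \;=\; 2351\ \frac{\text{MFLOPS}}{\text{W}} \;\approx\; 2.35\ \frac{\text{GFLOPS}}{\text{W}} \;\;\Longrightarrow\;\; \approx 2.35\ \text{PFLOPS per MW of electrical power.}
\]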


International Symposium on Microarchitecture | 2011

Multi-GPU DGEMM and High Performance Linpack on Highly Energy-Efficient Clusters

David Rohr; Matthias Bach; Matthias Kretz; V. Lindenstruth

High Performance Linpack places maximum demands on resources throughout a computer system. An efficient multi-GPU double-precision general matrix multiply (DGEMM), together with adjustments to HPL, is required to utilize a heterogeneous computer to its full extent. The authors present the resulting energy-efficiency measurements and suggest a cluster design that can utilize multiple GPUs.


Computer Physics Communications | 2017

BioEM: GPU-accelerated computing of Bayesian inference of electron microscopy images

Pilar Cossio; David Rohr; Fabio Baruffa; Markus Rampp; V. Lindenstruth; Gerhard Hummer

In cryo-electron microscopy (EM), molecular structures are determined from large numbers of projection images of individual particles. To harness the full power of this single-molecule information, we use the Bayesian inference of EM (BioEM) formalism. By ranking structural models using posterior probabilities calculated for individual images, BioEM in principle addresses the challenge of working with highly dynamic or heterogeneous systems not easily handled in traditional EM reconstruction. However, the calculation of these posteriors for large numbers of particles and models is computationally demanding. Here we present highly parallelized, GPU-accelerated computer software that performs this task efficiently. Our flexible formulation employs CUDA, OpenMP, and MPI parallelization combined with both CPU and GPU computing. The resulting BioEM software scales nearly ideally both on pure CPU and on CPU+GPU architectures, thus enabling Bayesian analysis of tens of thousands of images in a reasonable time. The general mathematical framework and robust algorithms are not limited to cryo-electron microscopy but can be generalized for electron tomography and other imaging experiments.

Program summary
Program title: BioEM
Program files DOI: http://dx.doi.org/10.17632/d2jjs2wdhv.1
Licensing provisions: GNU GPL v3
Programming language: C++, CUDA
Nature of problem: Analysis of electron microscopy images
Solution method: GPU-accelerated Bayesian inference with numerical grid sampling
External routines/libraries: Boost 1.5, FFTW 3, MPI
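
As a reading aid, the ranking step can be written in the standard Bayesian model-comparison form below. This is a generic sketch of the formalism rather than the exact BioEM likelihood; here m denotes a structural model, I_i an individual particle image, and theta the per-image nuisance parameters (orientation, offset, normalization and the like) that are integrated out:

\[
P\!\left(m \mid I_1,\dots,I_N\right) \;\propto\; P(m)\,\prod_{i=1}^{N} \int P\!\left(I_i \mid m,\theta\right) P(\theta)\, d\theta .
\]

Models are then ranked by this posterior; the costly part, which BioEM parallelizes on CPUs and GPUs, is the integral over theta for every image-model pair.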


Journal of Physics: Conference Series | 2015

The ALICE High Level Trigger: status and plans

Mikolaj Krzewicki; David Rohr; S. Gorbunov; T. Breitner; Johannes Lehrbach; Volker Lindenstruth; Dario Berzano

The ALICE High Level Trigger (HLT) is an online reconstruction, triggering and data compression system used in the ALICE experiment at CERN. Unique among the LHC experiments, it extensively uses modern coprocessor technologies like general-purpose graphics processing units (GPGPU) and field-programmable gate arrays (FPGA) in the data flow. Real-time data compression is performed using a cluster-finder algorithm implemented on FPGA boards. The resulting cluster data, rather than the raw data, are used in the subsequent processing and storage, resulting in a compression factor of around 4. Track finding is performed using a cellular automaton and a Kalman filter algorithm on GPGPU hardware, where both CUDA and OpenCL technologies can be used interchangeably. The ALICE upgrade requires further development of online concepts to include detector calibration and stronger data compression. The current HLT farm will be used as a test bed for online calibration and for both synchronous and asynchronous processing frameworks already before the upgrade, during Run 2. For opportunistic use as a Grid computing site during periods of inactivity of the experiment, a virtualisation-based setup is deployed.


Journal of Physics: Conference Series | 2015

Fast TPC Online Tracking on GPUs and Asynchronous Data Processing in the ALICE HLT to facilitate Online Calibration

David Rohr; S. Gorbunov; Mikolaj Krzewicki; T. Breitner; M. Kretz; Volker Lindenstruth

ALICE (A Large Ion Collider Experiment) is one of the four major experiments at the Large Hadron Collider (LHC) at CERN, which is today the most powerful particle accelerator worldwide. The High Level Trigger (HLT) is an online compute farm of about 200 nodes, which reconstructs events measured by the ALICE detector in real time. The HLT uses a custom online data-transport framework to distribute data and workload among the compute nodes.

ALICE employs several calibration-sensitive subdetectors, e.g. the TPC (Time Projection Chamber). For a precise reconstruction, the HLT has to perform the calibration online. Online calibration can make certain offline calibration steps obsolete and can thus speed up offline analysis. Looking forward to ALICE Run III starting in 2020, online calibration becomes a necessity. The main detector used for track reconstruction is the TPC, and reconstructing the trajectories in the TPC is the most compute-intensive step during event reconstruction. A fast tracking implementation is therefore of great importance, and since reconstructed TPC tracks build the basis for the calibration, fast online tracking is mandatory.

We present several components developed for the ALICE High Level Trigger to perform fast event reconstruction and to provide features required for online calibration. As the first topic, we present our TPC tracker, which employs GPUs to speed up the processing and which is based on a Cellular Automaton and the Kalman filter. Our TPC tracking algorithm was successfully used in the 2011 and 2012 lead-lead and proton-lead runs. We have improved it to leverage features of newer GPUs and have ported it to support OpenCL, CUDA, and CPUs with a single common source code, which makes us vendor independent.

As the second topic, we present framework extensions required for online calibration. The extensions are generic and can be used for other purposes as well. We have extended the framework to support asynchronous compute chains, which are required for long-running tasks such as online calibration, and we describe our method to feed custom data sources into the data flow. These can be external parameters like the environmental temperature required for calibration, and they can also be used to feed calibration results back into the processing chain.

Overall, the work presented in this contribution makes the ALICE HLT ready for online reconstruction and calibration for LHC Run II starting in 2015.
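
The "single common source code" for CPU, CUDA, and OpenCL mentioned above is commonly achieved by abstracting the few backend-specific keywords behind macros. The snippet below is a generic illustration of that technique with made-up macro names, not the actual ALICE TPC tracker source:

// Generic illustration of the single-source technique: the same kernel body
// compiles for CPU, CUDA, or OpenCL by mapping a few qualifiers per backend.
// This is a hypothetical sketch, not the ALICE TPC tracker code.
#if defined(__CUDACC__)
  #define GPUdevice __device__
  #define GPUglobalref
#elif defined(__OPENCL_VERSION__)
  #define GPUdevice
  #define GPUglobalref __global
#else                                  // plain CPU build
  #define GPUdevice
  #define GPUglobalref
#endif

// One function body shared by all backends: propagate a track parameter x by dx.
GPUdevice float propagateX(GPUglobalref const float* param, float dx)
{
    return param[0] + dx * param[1];   // x' = x + dx * slope
}

With such a scheme the same body is compiled by nvcc, by an OpenCL compiler, or as ordinary C++ for the CPU, which is what makes the tracker vendor independent in the sense described above.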


13th International Workshop on Cellular Nanoscale Networks and their Applications | 2012

ALICE TPC online tracker on GPUs for heavy-ion events

David Rohr

The online event reconstruction for the ALICE experiment at CERN requires the capability to process central Pb-Pb collisions at a rate of more than 200 Hz, corresponding to an input data rate of about 25 GB/s. The reconstruction of particle trajectories in the Time Projection Chamber (TPC) is the most compute-intensive step. The TPC online tracker implementation combines the principles of the cellular automaton and the Kalman filter, and it has been accelerated by the use of graphics cards (GPUs). Pipelined processing allows the tracking on the GPU, the data transfer, and the preprocessing on the CPU to run in parallel.

To exploit data locality, the tracking is split into multiple phases. First, track segments are searched for in local sectors of the detector, independently and in parallel. These segments are then merged at a global level. A shortcoming of this approach is that if a track contains only a very short segment in one particular sector, the local search may not find this short part. The fast GPU processing made it possible to add an additional step: all found tracks are extrapolated to neighboring sectors and the unassigned clusters which constitute the missing track segment are collected.

For running QA, it is important that the output of the CPU and the GPU tracker is as consistent as possible. One major challenge was to implement the tracker such that the output is not affected by concurrency, while maintaining peak performance and efficiency. For instance, a naive implementation depended on the order of the tracks, which is nondeterministic when they are created in parallel. Still, due to non-associative floating-point arithmetic, a direct binary comparison of the CPU and GPU tracker output is impossible. Thus, the approach chosen for evaluating the GPU tracker efficiency is to compare the cluster-to-track assignment of the CPU and the GPU tracker, cluster by cluster. With this comparison scheme, the outputs of the CPU and the GPU tracker differ by 0.00024. Compared to the offline tracker, the HLT tracker is orders of magnitude faster while delivering good results. The GPU version outperforms its CPU analog by another factor of three. Recently, the ALICE HLT cluster was upgraded with new GPUs and is able to process central heavy-ion events at a rate of approximately 200 Hz.
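
The cluster-by-cluster comparison described above can be sketched as follows; the function and variable names are hypothetical, and matching track labels between the two outputs (e.g. by their cluster content) is glossed over here:

// Minimal sketch of a cluster-by-cluster consistency check: compare to which
// track each cluster was assigned by the CPU and by the GPU tracker and report
// the fraction of clusters that differ. Identifiers are hypothetical.
#include <cstdio>
#include <vector>

// assignment[i] = index of the track that cluster i was attached to (-1 = unassigned)
double mismatchFraction(const std::vector<int>& cpuAssignment,
                        const std::vector<int>& gpuAssignment)
{
    size_t n = cpuAssignment.size();
    size_t differing = 0;
    for (size_t i = 0; i < n; ++i)
        if (cpuAssignment[i] != gpuAssignment[i]) ++differing;
    return n ? static_cast<double>(differing) / n : 0.0;
}

int main() {
    std::vector<int> cpu{0, 0, 1, 1, 2, -1};
    std::vector<int> gpu{0, 0, 1, 2, 2, -1};   // one cluster assigned differently
    std::printf("mismatch fraction: %.6f\n", mismatchFraction(cpu, gpu));
}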


EPJ Web of Conferences | 2016

Online Reconstruction and Calibration with Feedback Loop in the ALICE High Level Trigger

David Rohr; R. Shahoyan; C. Zampolli; Mikolaj Krzewicki; J. Wiechula; S. Gorbunov; A. Chauvin; K. Schweda; Volker Lindenstruth

ALICE (A Large Ion Collider Experiment) is one of the four large-scale experiments at the Large Hadron Collider (LHC) at CERN. The High Level Trigger (HLT) is an online computing farm, which reconstructs events recorded by the ALICE detector in real time. The most compute-intensive task is the reconstruction of the particle trajectories. The main tracking devices in ALICE are the Time Projection Chamber (TPC) and the Inner Tracking System (ITS). The HLT uses a fast GPU-accelerated algorithm for the TPC tracking based on the Cellular Automaton principle and the Kalman filter. ALICE employs gaseous subdetectors that are sensitive to environmental conditions such as ambient pressure and temperature, and the TPC is one of them. A precise reconstruction of particle trajectories requires the calibration of these detectors. As the first topic, we present recent optimizations to our GPU-based TPC tracking for the new GPU models we employ in the ongoing and upcoming data-taking period at the LHC. We also show our new approach for fast ITS standalone tracking. As the second topic, we present improvements to the HLT that facilitate online reconstruction, including a new flat data model and a new data-flow chain. The calibration output is fed back to the reconstruction components of the HLT via a feedback loop. We conclude with an analysis of a first online calibration test under real conditions during the Pb-Pb run in November 2015, which was based on these new features.
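
The "flat data model" mentioned above refers to data structures that contain no pointers and can therefore be copied between processes or devices as raw bytes. The sketch below only illustrates this general idea with hypothetical types; it is not the actual ALICE HLT format:

// Generic sketch of a "flat" data model: a self-contained, pointer-free block
// that can be copied or sent between processes without serialization.
#include <cstddef>
#include <type_traits>

struct FlatTrack {
    float params[5];        // track parameters
    float covariance[15];   // packed covariance matrix
    int   firstClusterIdx;  // index into the cluster block that follows
    int   nClusters;
};

struct FlatEventHeader {
    std::size_t nTracks;
    std::size_t nClusters;
    // Tracks and clusters follow contiguously in the same buffer, addressed by
    // offsets/indices instead of pointers, so the buffer stays relocatable.
};

static_assert(std::is_trivially_copyable<FlatTrack>::value,
              "flat structures must be copyable as raw bytes");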


Parallel, Distributed and Network-Based Processing | 2015

A Flexible and Portable Large-Scale DGEMM Library for Linpack on Next-Generation Multi-GPU Systems

David Rohr; V. Lindenstruth

In recent years, high performance computing has benefited greatly from special accelerator cards such as GPUs. Matrix multiplication, performed by the BLAS function DGEMM, is one of the prime examples where such accelerators excel. DGEMM is the computational hotspot of many tasks, among them the Linpack benchmark. Current GPUs achieve more than 1 TFLOPS of real performance in this task. Because the GPUs are connected via PCI Express, multiple cards can easily be installed in a single compute node. This enables the construction of multi-TFLOPS systems out of off-the-shelf components. At such high performance, it is often difficult to feed the GPUs with sufficient data to keep them running at full speed. In this paper we first analyze the scalability of our DGEMM implementation for multiple fast GPUs. We then suggest a new scheme optimized for this situation and present an implementation.
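
One straightforward way to keep several GPUs busy is to divide the column blocks of the trailing-matrix update in proportion to each GPU's measured DGEMM throughput. The host-side sketch below shows only this generic idea with hypothetical names; it is not the scheme proposed in the paper:

// Generic illustration of dividing `cols` column blocks of a DGEMM update among
// several GPUs in proportion to their measured throughput.
#include <cstdio>
#include <vector>

std::vector<int> splitColumns(int cols, const std::vector<double>& gflops)
{
    double total = 0.0;
    for (double g : gflops) total += g;

    std::vector<int> share(gflops.size());
    int assigned = 0;
    for (size_t i = 0; i < gflops.size(); ++i) {
        share[i] = static_cast<int>(cols * gflops[i] / total);
        assigned += share[i];
    }
    share.back() += cols - assigned;     // give the rounding remainder to the last GPU
    return share;
}

int main() {
    // e.g. three GPUs with slightly different sustained DGEMM rates
    std::vector<double> gflops{900.0, 900.0, 850.0};
    std::vector<int> share = splitColumns(64, gflops);
    for (size_t i = 0; i < share.size(); ++i)
        std::printf("GPU %zu gets %d column blocks\n", i, share[i]);
}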


IEEE Transactions on Nuclear Science | 2017

Online Calibration of the TPC Drift Time in the ALICE High Level Trigger

David Rohr; Mikolaj Krzewicki; C. Zampolli; J. Wiechula; S. Gorbunov; A. Chauvin; I. Vorobyev; Steffen Weber; K. Schweda; V. Lindenstruth

A Large Ion Collider Experiment (ALICE) is one of the four major experiments at the Large Hadron Collider (LHC) at CERN. The high level trigger (HLT) is a compute cluster, which reconstructs collisions as recorded by the ALICE detector in real time. It employs a custom online data-transport framework to distribute data and workload among the compute nodes. ALICE employs subdetectors that are sensitive to environmental conditions such as pressure and temperature, e.g., the time projection chamber (TPC). A precise reconstruction of particle trajectories requires calibration of these detectors. Performing calibration in real time in the HLT improves the online reconstruction and renders certain offline calibration steps obsolete, speeding up offline physics analysis. For LHC Run 3, starting in 2020, when data reduction will rely on reconstructed data, online calibration becomes a necessity. Reconstructed particle trajectories form the basis of the calibration, making fast online tracking mandatory. The main detectors used for this purpose are the TPC and the Inner Tracking System. Reconstructing the trajectories in the TPC is the most compute-intensive step.

We present several improvements to the ALICE HLT developed to facilitate online calibration. The main new development for online calibration is a wrapper that can run ALICE offline analysis and calibration tasks inside the HLT. In addition, we have added asynchronous processing capabilities to support long-running calibration tasks in the HLT framework, which otherwise runs event-synchronously. To improve resiliency, an isolated process performs the asynchronous operations, such that even a fatal error does not disturb data taking. We have complemented the original loop-free HLT chain with ZeroMQ data-transfer components. The ZeroMQ components facilitate a feedback loop that inserts the calibration result created at the end of the chain back into the tracking components at the beginning of the chain, after a short delay. All these new features are implemented in a general way, such that they have use cases beyond online calibration.

In order to gather sufficient statistics for the calibration, the asynchronous calibration component must process enough events per time interval. Since the calibration is valid only for a certain time period, the delay until the feedback loop provides updated calibration data must not be too long. A first full-scale test of the online calibration functionality was performed during the 2015 heavy-ion run under real conditions. Since then, online calibration has been enabled and benchmarked during the 2016 proton-proton data taking. We present a timing analysis of this first online calibration test, which shows that the HLT is capable of online TPC drift-time calibration fast enough to calibrate the tracking via the feedback loop. We compare the calibration results with the offline calibration and present a comparison of the residuals of the TPC cluster coordinates with respect to the offline reconstruction.
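
The isolation argument above (a fatal error in a long-running calibration task must not disturb data taking) can be sketched with plain POSIX process separation. This is a generic illustration of the concept only, not the HLT framework implementation:

// Generic POSIX sketch: run a long, possibly crashing calibration task in a
// forked child process, so that even a fatal error in the task cannot take
// down the parent (data-taking) process.
#include <cstdio>
#include <cstdlib>
#include <sys/wait.h>
#include <unistd.h>

void runCalibrationTask()
{
    // Placeholder for a long-running, possibly failing calibration job.
    std::printf("child %d: running calibration task\n", (int)getpid());
}

int main()
{
    pid_t child = fork();
    if (child == 0) {                 // child: do the risky asynchronous work
        runCalibrationTask();
        std::exit(0);
    }

    // Parent: continue event-synchronous processing; later, check the child
    // without being affected by any crash inside it.
    int status = 0;
    waitpid(child, &status, 0);
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        std::printf("parent: calibration finished, feed result back\n");
    else
        std::printf("parent: calibration task failed, data taking unaffected\n");
}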

Collaboration

Top co-authors of David Rohr and their affiliations:

S. Gorbunov (Goethe University Frankfurt)
V. Lindenstruth (Frankfurt Institute for Advanced Studies)
Mikolaj Krzewicki (Goethe University Frankfurt)
M. Kretz (Goethe University Frankfurt)
Gvozden Neskovic (Frankfurt Institute for Advanced Studies)
J. Wiechula (Goethe University Frankfurt)
K. Schweda (GSI Helmholtz Centre for Heavy Ion Research)
Matthias Bach (Frankfurt Institute for Advanced Studies)
T. Breitner (Goethe University Frankfurt)