Publications


Featured research published by S Wenzel.


Journal of Physics: Conference Series | 2015

Towards a high performance geometry library for particle-detector simulations

J. Apostolakis; M Bandieramonte; G Bitzes; R. Brun; Philippe Canal; F. Carminati; G. Cosmo; J de Fine Licht; L Duhem; V D Elvira; A. Gheata; Soon Yung Jun; G Lima; T Nikitina; M Novak; R Sehgal; O Shadura; S Wenzel

Thread-parallelization and single-instruction multiple data (SIMD) "vectorisation" of software components in HEP computing has become a necessity to fully benefit from current and future computing hardware. In this context, the Geant-Vector/GPU simulation project aims to re-engineer current software for the simulation of the passage of particles through detectors in order to increase the overall event throughput. As one of the core modules in this area, the geometry library plays a central role and vectorising its algorithms will be one of the cornerstones towards achieving good CPU performance. Here, we report on the progress made in vectorising the shape primitives, as well as in applying new C++ template based optimizations of existing code available in the Geant4, ROOT or USolids geometry libraries. We will focus on a presentation of our software development approach that aims to provide optimized code for all use cases of the library (e.g., single particle and many-particle APIs) and to support different architectures (CPU and GPU) while keeping the code base small, manageable and maintainable. We report on a generic and templated C++ geometry library as a continuation of the AIDA USolids project. As a result, the experience gained with these developments will be beneficial to other parts of the simulation software, such as for the optimization of the physics library, and possibly to other parts of the experiment software stack, such as reconstruction and analysis.
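
As an illustration of the template-based approach described in the abstract, the sketch below shows how a single kernel source can serve both the single-particle and the many-particle APIs by swapping a backend's value type. All names here (ScalarBackend, VectorBackend, Double4, DistToSlab) are our own illustration under stated assumptions, not the actual library code.

    // Minimal sketch, assuming a backend struct that only exposes a value
    // type; real code would use Vc or intrinsics instead of this toy type.
    struct Double4 {                       // stand-in for a SIMD double
      double v[4];
      Double4 operator-(const Double4& o) const {
        Double4 r;
        for (int i = 0; i < 4; ++i) r.v[i] = v[i] - o.v[i];
        return r;
      }
      Double4 operator/(const Double4& o) const {
        Double4 r;
        for (int i = 0; i < 4; ++i) r.v[i] = v[i] / o.v[i];
        return r;
      }
    };

    struct ScalarBackend { using Real_t = double;  };  // one track per call
    struct VectorBackend { using Real_t = Double4; };  // four tracks per call

    // One templated kernel: distance along a 1-D direction to the slab
    // boundary at +half (a building block of a box shape algorithm).
    template <typename Backend>
    typename Backend::Real_t DistToSlab(typename Backend::Real_t pos,
                                        typename Backend::Real_t dir,
                                        typename Backend::Real_t half) {
      return (half - pos) / dir;           // identical source for both APIs
    }

Instantiating DistToSlab<ScalarBackend> keeps the familiar one-track interface, while DistToSlab<VectorBackend> processes four tracks per call from the same source, which is how one code base can stay small while covering all use cases.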


Journal of Physics: Conference Series | 2014

A concurrent vector-based steering framework for particle transport

J. Apostolakis; F. Carminati; S Wenzel; R. Brun; A. Gheata

High Energy Physics has traditionally been a technology-limited science that has pushed the boundaries of both the detectors collecting the information about the particles and the computing infrastructure processing this information. However, for the past few years the increase in computing power has come in the form of increased parallelism at all levels, and High Energy Physics now has to optimise its code to take advantage of the new architectures, including GPUs and hybrid systems. One of the primary targets for optimisation is the particle transport code used to simulate the detector response, as it is largely experiment independent and one of the most demanding applications in terms of CPU resources. The Geant Vector Prototype project aims to explore innovative particle transport designs that obtain maximal performance on the new architectures. This paper describes the current status of the project and its future perspectives. In particular, we describe how the present design tries to expose the parallelism of the problem at all possible levels, in a design aimed at minimising contention and maximising concurrency, both at the coarse granularity level (threads) and at the micro granularity level (vectorisation, instruction pipelining, multiple instructions per cycle). Future plans and perspectives are also discussed.
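
To make the steering idea concrete, here is a minimal sketch, under our own assumptions, of the basket mechanism such a framework needs: tracks from many events are regrouped into baskets, and worker threads pop whole baskets so the compute kernels always receive vectors of work. The names (Track, Basket, BasketQueue) are illustrative, not the GeantV code.

    #include <mutex>
    #include <queue>
    #include <utility>
    #include <vector>

    struct Track { int eventId; int volumeId; double pos[3], dir[3]; };
    using Basket = std::vector<Track>;   // tracks sharing a locality criterion

    // Thread-safe basket queue: contention is limited to short pushes and
    // pops, while the expensive transport work happens outside the lock.
    class BasketQueue {
      std::queue<Basket> queue_;
      std::mutex mutex_;
    public:
      void Push(Basket b) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(b));
      }
      bool TryPop(Basket& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop();
        return true;
      }
    };

    // Each worker drains baskets; a real scheduler would regroup the
    // surviving tracks by volume and requeue them for the next step.
    void Worker(BasketQueue& input, BasketQueue& output) {
      Basket basket;
      while (input.TryPop(basket)) {
        // TransportStep(basket);  // vectorised geometry + physics here
        output.Push(std::move(basket));
      }
    }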


Journal of Physics: Conference Series | 2015

Adaptive Track Scheduling to Optimize Concurrency and Vectorization in GeantV

J. Apostolakis; M Bandieramonte; G Bitzes; R. Brun; Philippe Canal; F. Carminati; J de Fine Licht; L Duhem; V D Elvira; A. Gheata; Soon Yung Jun; G Lima; M Novak; R Sehgal; O Shadura; S Wenzel

The GeantV project is focused on R&D of new particle transport techniques to maximize parallelism on multiple levels, profiting from the use of both SIMD instructions and co-processors for the CPU-intensive calculations specific to this type of application. In our approach, vectors of tracks belonging to multiple events and matching different locality criteria must be gathered and dispatched to algorithms having vector signatures. While the transport propagates tracks and changes their individual states, data locality becomes harder to maintain. The scheduling policy has to be adapted to maintain efficient vectors while keeping an optimal level of concurrency. The model has complex dynamics, requiring the tuning of thresholds to switch between the normal regime and special modes: prioritizing events to allow flushing memory, adding new events to the transport pipeline to boost locality, dynamically adjusting the particle vector size, or switching from vector to single-track mode when vectorization causes only overhead. This work requires a comprehensive study to optimize these parameters and make the behaviour of the scheduler self-adapting; we present its initial results here.
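
The regime switching described above can be pictured with a small sketch; the threshold names and values below are hypothetical placeholders for the parameters such a study would tune, not the actual GeantV scheduler.

    #include <cstddef>

    struct SchedulerConfig {
      std::size_t vectorSize   = 16;  // target tracks per SIMD basket
      std::size_t minOccupancy = 8;   // below this, vectorisation is overhead
      std::size_t maxInFlight  = 64;  // events kept in flight for locality
    };

    enum class Mode { Vector, SingleTrack };

    // Transport a basket in vector mode only when it is full enough for
    // the SIMD gains to outweigh the gathering overhead.
    Mode ChooseMode(std::size_t tracksInBasket, const SchedulerConfig& cfg) {
      return tracksInBasket >= cfg.minOccupancy ? Mode::Vector
                                                : Mode::SingleTrack;
    }

    // Prioritise nearly-finished events once memory pressure builds up,
    // so their resources can be flushed.
    bool ShouldFlushEvents(std::size_t eventsInFlight,
                           const SchedulerConfig& cfg) {
      return eventsInFlight >= cfg.maxInFlight;
    }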


Journal of Physics: Conference Series | 2014

The path toward HEP High Performance Computing

J. Apostolakis; R. Brun; F. Carminati; A. Gheata; S Wenzel

High Energy Physics code has been known for making poor use of high performance computing architectures. Efforts to optimise HEP code on vector and RISC architectures have yielded limited results, and recent studies have shown that, on modern architectures, it achieves between 10% and 50% of peak performance. Although several successful attempts have been made to port selected codes to GPUs, no major HEP code suite has a high performance implementation. With the LHC undergoing a major upgrade and a number of challenging experiments on the drawing board, HEP can no longer neglect the less-than-optimal performance of its code and has to try to make the best use of the hardware. This activity is one of the foci of the SFT group at CERN, which hosts, among others, the ROOT and Geant4 projects. The activity of the experiments is shared and coordinated via a Concurrency Forum, where the experience in optimising HEP code is presented and discussed. Another activity is the Geant-V project, centred on the development of a high-performance prototype for particle transport. Achieving a good concurrency level on the emerging parallel architectures without a complete redesign of the framework can only be done by parallelizing at event level, or with a much larger effort at track level. Apart from the shareable data structures, this typically implies a multiplication factor in memory consumption compared to the single-threaded version, together with sub-optimal handling of event-processing tails. Besides this, the low-level instruction pipelining of modern processors cannot be used efficiently to speed up the program. We have implemented a framework that allows scheduling vectors of particles to an arbitrary number of computing resources in a fine-grained parallel approach. The talk reviews the current optimisation activities within the SFT group, with particular emphasis on the development perspectives towards a simulation framework able to profit best from the recent technology evolution in computing.


Journal of Physics: Conference Series | 2016

Electromagnetic physics models for parallel computing architectures

Guilherme Amadio; A Ananya; J. Apostolakis; A Arora; M Bandieramonte; A Bhattacharyya; C Bianchini; R. Brun; Philippe Canal; F. Carminati; L Duhem; Daniel Elvira; A. Gheata; M. Gheata; I Goulas; R Iope; Soon Yung Jun; G Lima; A Mohanty; T Nikitina; M Novak; Witold Pokorski; A. Ribon; R Sehgal; O Shadura; S Vallecorsa; S Wenzel; Yang Zhang

The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next-generation detector simulation framework, has been designed to exploit both the vector capability of mainstream CPUs and the multi-threading capabilities of coprocessors, including NVidia GPUs and the Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and the type of parallelization needed to achieve optimal performance. In this paper we describe the implementation of electromagnetic physics models developed for parallel computing architectures as part of the GeantV project. Results of a preliminary performance evaluation and physics validation are also presented.
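
For a flavour of what a backend-neutral physics kernel looks like, the sketch below writes the Compton kinematic relation E' = E / (1 + (E / m_e c^2)(1 - cos θ)) once against a generic Real_v type, so the same source can be instantiated with a scalar double, a SIMD vector type, or compiled per-thread on a GPU. This is our own illustration, not the GeantV implementation.

    // Minimal sketch: one kernel source for all architectures, assuming
    // Real_v supports arithmetic and construction from double.
    template <typename Real_v>
    Real_v ComptonScatteredEnergy(Real_v photonEnergy,   // in MeV
                                  Real_v cosTheta) {
      const Real_v kElectronMassC2(0.510998950);         // MeV
      return photonEnergy /
             (Real_v(1.0) + (photonEnergy / kElectronMassC2) *
                            (Real_v(1.0) - cosTheta));
    }

    // Scalar instantiation; a SIMD backend would pass a vector type
    // (e.g. a Vc double vector) and compute several tracks per call.
    double scattered = ComptonScatteredEnergy<double>(1.0, 0.0);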


arXiv: Computational Physics | 2014

Vectorising the detector geometry to optimise particle transport

J. Apostolakis; R. Brun; F. Carminati; A. Gheata; S Wenzel

Among the components contributing to particle transport, geometry navigation is an important consumer of CPU cycles. The tasks performed to answer basic queries, such as locating a point within a geometry hierarchy or accurately computing the distance to the next boundary, can become very computing intensive for complex detector setups. So far, existing geometry algorithms have employed mainly scalar optimisation strategies (voxelization, caching) to reduce their CPU consumption. In this paper, we take a different approach and investigate how geometry navigation can benefit from the vector instruction set extensions that are one of the primary sources of performance enhancements on current and future hardware. While, on paper, this form of microparallelism promises increasing performance opportunities, applying the technology to the highly hierarchical and multiply branched geometry code is a difficult challenge. We report on current work to vectorise an important part of the critical navigation algorithms in the ROOT geometry library. Starting from a short critical discussion of the programming model, we present the current status and first benchmark results of the vectorisation of some elementary geometry shape algorithms. On the path towards a full vector-based geometry navigator, we also investigate the performance benefits of connecting these elementary functions together to develop algorithms that are entirely based on the flow of vector data. To this end, we discuss the core components of a simple vector navigator that is tested and evaluated on a toy detector setup.
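
The "flow of vector data" mentioned above is easiest to see in a structure-of-arrays sketch (our own illustration, not the ROOT code): an elementary query runs over whole arrays of track coordinates, so the inner loop is contiguous and auto-vectorisable, instead of issuing one virtual call per track.

    #include <cstddef>
    #include <vector>

    struct TrackSoA {                // one coordinate/direction per array;
      std::vector<double> x, dx;     // a 1-D slice suffices for the sketch
    };

    // Elementary query: distance along dx to the plane x = boundary,
    // computed for all tracks at once instead of one call per track.
    void DistanceToBoundary(const TrackSoA& t, double boundary,
                            std::vector<double>& dist) {
      const std::size_t n = t.x.size();
      dist.resize(n);
      for (std::size_t i = 0; i < n; ++i)          // contiguous, branch-free:
        dist[i] = (boundary - t.x[i]) / t.dx[i];   // auto-vectorisable loop
    }

Chaining several such array-based queries, with the output of one feeding the next, gives a navigation step that never leaves vector form, which is the core idea behind the vector navigator discussed in the abstract.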


Journal of Physics: Conference Series | 2016

GeantV: from CPU to accelerators

Guilherme Amadio; A Ananya; J. Apostolakis; A Arora; M Bandieramonte; A Bhattacharyya; C Bianchini; R. Brun; Philippe Canal; F. Carminati; L Duhem; Daniel Elvira; A. Gheata; M. Gheata; I Goulas; R Iope; Soon Yung Jun; G Lima; A Mohanty; T Nikitina; M Novak; Witold Pokorski; A. Ribon; R Sehgal; O Shadura; S Vallecorsa; S Wenzel; Yang Zhang

The GeantV project aims to research and develop the next-generation simulation software describing the passage of particles through matter. While modern CPU architectures are being targeted first, resources such as GPGPUs, Intel® Xeon Phi, Atom or ARM can no longer be ignored by CPU-bound HEP applications. The proof-of-concept GeantV prototype has been engineered mainly for CPUs with vector units, but a bridge to arbitrary accelerators has been foreseen from the early stages. A software layer consisting of architecture/technology-specific backends currently supports this concept. This approach allows us not only to abstract out basic types such as scalar/vector, but also to formalize generic computation kernels that transparently use library- or device-specific constructs based on Vc, CUDA, Cilk+ or Intel intrinsics. While the main goal of this approach is portable performance, it comes with the bonus of insulating the core application and algorithms from the technology layer. This keeps our application maintainable in the long term and resilient to changes on the backend side. The paper presents the first results of basket-based GeantV geometry navigation on the Intel® Xeon Phi KNC architecture. We present a scalability and vectorization study, conducted using Intel performance tools, as well as our preliminary conclusions on the use of accelerators for GeantV transport. We also describe current work and preliminary results for using the GeantV transport kernel on GPUs.
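
A minimal sketch of such a backend layer, under our own naming assumptions (the actual GeantV headers differ): basic types live behind a traits struct, kernels are written once against those types, and a host/device qualifier macro lets the same source compile under CUDA.

    // Under nvcc the kernel is compiled for both host and device from
    // the same source; on plain C++ compilers the macro is empty.
    #ifdef __CUDACC__
    #define KERNEL_QUALIFIER __host__ __device__
    #else
    #define KERNEL_QUALIFIER
    #endif

    struct ScalarBackend {
      using Real_v = double;
      static constexpr int kVectorSize = 1;
    };
    // A Vc-based backend would instead expose a SIMD double here, and a
    // CUDA backend keeps Real_v scalar, with parallelism across threads.

    template <typename Backend>
    KERNEL_QUALIFIER typename Backend::Real_v
    MagnitudeSquared(typename Backend::Real_v x,
                     typename Backend::Real_v y,
                     typename Backend::Real_v z) {
      return x * x + y * y + z * z;   // identical code on every backend
    }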


Journal of Physics: Conference Series | 2015

First experience of vectorizing electromagnetic physics models for detector simulation

Guilherme Amadio; J. Apostolakis; M Bandieramonte; C Bianchini; G Bitzes; R. Brun; Philippe Canal; F. Carminati; J de Fine Licht; L Duhem; Daniel Elvira; A. Gheata; Soon Yung Jun; G Lima; M Novak; M Presbyterian; O Shadura; R Sehgal; S Wenzel

The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. The GeantV vector prototype for detector simulation has been designed to exploit both the vector capability of mainstream CPUs and the multi-threading capabilities of coprocessors, including NVidia GPUs and the Intel Xeon Phi. The characteristics of these architectures are very different in terms of vectorization depth, the parallelization needed to achieve optimal performance, and memory access latency and speed. An additional challenge is to avoid the code duplication often inherent in supporting heterogeneous platforms. In this paper we present the first experience of vectorizing the electromagnetic physics models developed for the GeantV project.


Journal of Physics: Conference Series | 2015

The GeantV project: Preparing the future of simulation

Guilherme Amadio; J. Apostolakis; M Bandieramonte; A Bhattacharyya; C Bianchini; R. Brun; Ph Canal; F. Carminati; L Duhem; Daniel Elvira; J de Fine Licht; A. Gheata; R Iope; G Lima; A Mohanty; T Nikitina; M Novak; Witold Pokorski; R Sehgal; O Shadura; S Vallecorsa; S Wenzel

Detector simulation consumes at least half of HEP computing cycles, and even so, experiments have to make hard decisions on what to simulate, as their needs greatly surpass the available computing resources. New experiments still in the design phase, such as FCC, CLIC and ILC, as well as upgraded versions of the existing LHC detectors, will push the simulation requirements further. Since the increase in computing resources is not likely to keep pace with our needs, it is necessary to explore innovative ways of speeding up simulation in order to sustain the progress of High Energy Physics. The GeantV project aims at developing a high performance detector simulation system integrating fast and full simulation that can be ported to different computing architectures, including CPU accelerators. After more than two years of R&D the project has produced a prototype capable of transporting particles in complex geometries exploiting micro-parallelism, SIMD and multithreading. Portability is obtained via C++ template techniques that allow the development of machine-independent computational kernels. Furthermore, a set of tables derived from Geant4 for cross sections and final states provides realistic shower development and, having been ported into a Geant4 physics list, can be used as a basis for a direct performance comparison.


Journal of Physics: Conference Series | 2017

Stochastic optimization of GeantV code by use of genetic algorithms

Guilherme Amadio; J. Apostolakis; M Bandieramonte; S.P. Behera; R. Brun; Philippe Canal; F. Carminati; G. Cosmo; L Duhem; Daniel Elvira; G. Folger; A. Gheata; M. Gheata; I Goulas; F. Hariri; Soon Yung Jun; D. Konstantinov; H. Kumawat; V. Ivantchenko; G Lima; T Nikitina; M Novak; Witold Pokorski; A. Ribon; R Sehgal; O Shadura; S. Vallecorsa; S Wenzel

GeantV is a complex system based on the interaction of different modules needed for detector simulation, including the transport of particles in fields, physics models simulating their interactions with matter, and a geometrical modeler library for describing the detector, locating the particles and computing the path length to the current volume boundary. The GeantV project is recasting the classical simulation approach to get maximum benefit from SIMD/MIMD computational architectures and massively parallel systems. This involves finding the appropriate balance between several aspects influencing computational performance (floating-point performance, usage of off-chip memory bandwidth, specification of the cache hierarchy, etc.) and handling a large number of program parameters that have to be optimized to achieve the best simulation throughput. This optimization task can be treated as a black-box optimization problem, which requires searching for the optimum set of parameters using only point-wise function evaluations. The goal of this study is to provide a mechanism for optimizing complex systems (high energy physics particle transport simulations) with the help of genetic algorithms and evolution strategies as tuning procedures for massively parallel simulations. One of the described approaches is based on introducing a specific multivariate analysis operator that can be used in the case of resource-expensive or time-consuming evaluations of fitness functions, in order to speed up the convergence of the black-box optimization problem.
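
To illustrate the black-box tuning loop in miniature, the sketch below evolves a population of parameter vectors with truncation selection and Gaussian mutation. The fitness function is a stand-in for measured simulation throughput, and the whole listing is a generic illustration rather than the tuner used in the paper.

    #include <algorithm>
    #include <random>
    #include <vector>

    using Params = std::vector<double>;   // e.g. basket size, thresholds...

    // Stand-in fitness: peaks when every parameter equals 10. In the real
    // setting each evaluation is an expensive batch of simulations.
    double Fitness(const Params& p) {
      double f = 0.0;
      for (double v : p) f -= (v - 10.0) * (v - 10.0);
      return f;
    }

    int main() {
      std::mt19937 rng(42);
      std::normal_distribution<double> mutate(0.0, 1.0);
      std::uniform_real_distribution<double> init(0.0, 20.0);

      const int dim = 4, popSize = 20, generations = 50;
      std::vector<Params> pop(popSize, Params(dim));
      for (auto& p : pop)
        for (auto& v : p) v = init(rng);

      for (int g = 0; g < generations; ++g) {
        // Rank candidates by fitness (point-wise evaluations only).
        std::sort(pop.begin(), pop.end(),
                  [](const Params& a, const Params& b) {
                    return Fitness(a) > Fitness(b);
                  });
        // Keep the best half; refill with mutated copies of survivors.
        for (int i = popSize / 2; i < popSize; ++i) {
          pop[i] = pop[i - popSize / 2];
          for (auto& v : pop[i]) v += mutate(rng);
        }
      }
      return 0;   // pop[0] now approximates the optimal parameter set
    }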
