Sverre Jarp
CERN
Publications
Featured research published by Sverre Jarp.
Journal of Physics: Conference Series | 2008
Sverre Jarp; Ryszard Erazm Jurga; Andrzej Nowak
This paper describes the software component, perfmon2, that is about to be added to the Linux kernel as the standard interface to the Performance Monitoring Unit (PMU) on common processors, including x86 (AMD and Intel), Sun SPARC, MIPS, IBM Power and Intel Itanium. It also describes a set of tools for doing performance monitoring in practice and details how the CERN openlab team has participated in the testing and development of these tools.
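For illustration, the general pattern of programming a hardware counter from user space can be sketched as below. Note that perfmon2 exposed its own system-call interface; this sketch instead uses the perf_event_open call that Linux ultimately standardized, so it shows the PMU-counting pattern rather than the perfmon2 API itself.

// Illustrative only: perfmon2 had its own system calls; this uses perf_event_open
// to show the general user-space pattern of programming a PMU counter.
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>

int main() {
    perf_event_attr attr{};
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;   // count retired instructions
    attr.disabled = 1;
    attr.exclude_kernel = 1;

    // No glibc wrapper exists, so invoke the syscall directly
    // (current process, any CPU, no group, no flags).
    int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    volatile double x = 0.0;
    for (int i = 0; i < 1000000; ++i) x = x + i * 0.5;   // workload to measure

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    uint64_t count = 0;
    read(fd, &count, sizeof(count));
    std::printf("instructions retired: %llu\n", (unsigned long long)count);
    close(fd);
    return 0;
}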
Journal of Physics: Conference Series | 2012
Sverre Jarp; A. Lazzaro; Andrzej Nowak
As the mainstream computing world has shifted from multi-core to many-core platforms, the situation for software developers has changed as well. With the numerous hardware and software options available, choices balancing programmability and performance are becoming a significant challenge. The expanding multiplicative dimensions of performance offer a growing number of possibilities that need to be assessed and addressed on several levels of abstraction. This paper reviews the major trade-offs forced upon the software domain by the changing landscape of parallel technologies – hardware and software alike. Recent developments, paradigms and techniques are considered with respect to their impact on the rather traditional HEP programming models. Other considerations addressed include aspects of efficiency and reasonably achievable targets for the parallelization of large scale HEP workloads.
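As a concrete illustration of two of those multiplicative dimensions, the sketch below (generic, assumed code rather than anything from the paper) exploits both threads across cores and SIMD lanes within each core using OpenMP.

// Minimal sketch: the same loop exploiting two "multiplicative dimensions" of
// performance, threads across cores and SIMD lanes within a core.
// Compile e.g. with: g++ -O2 -fopenmp example.cpp
#include <cstdio>
#include <vector>

int main() {
    const std::size_t n = 1 << 24;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);

    // 'parallel for' spreads iterations over cores; 'simd' asks the compiler
    // to vectorize each thread's chunk. Gains from the two levels multiply.
    #pragma omp parallel for simd
    for (std::size_t i = 0; i < n; ++i)
        c[i] = a[i] * b[i] + a[i];

    std::printf("c[0] = %f\n", c[0]);
    return 0;
}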
ACM Computing Surveys | 2013
Xavier Grehant; Isabelle M. Demeure; Sverre Jarp
Grids designed for computationally demanding scientific applications started experimental phases ten years ago and have been continuously delivering computing power to a wide range of applications for more than half of this time. The observation of their emergence and evolution reveals actual constraints and successful approaches to task mapping across administrative boundaries. Beyond differences in distributions, services, protocols, and standards, a common architecture is outlined. Application-agnostic infrastructures built for resource registration, identification, and access control dispatch delegation to grid sites. Efficient task mapping is managed by large, autonomous applications or collaborations that temporarily infiltrate resources for their own benefit.
Journal of Physics: Conference Series | 2011
Sverre Jarp; A. Lazzaro; Julien Leduc; Andrzej Nowak; F. Pantaleo
Data analyses based on maximum likelihood fits are commonly used in the high energy physics community for fitting statistical models to data samples. This technique requires the numerical minimization of the negative log-likelihood function. MINUIT is the most common package used for this purpose in the high energy physics community. The main algorithm in this package, MIGRAD, searches for the minimum by using gradient information. The procedure requires several evaluations of the function, depending on the number of free parameters and their initial values. The whole procedure can be very CPU-time consuming in the case of complex functions, with several free parameters, many independent variables and large data samples. Therefore, it becomes particularly important to speed up the evaluation of the negative log-likelihood function. In this paper we present an algorithm and its implementation which benefits from data vectorization and parallelization (based on OpenMP) and which was also ported to Graphics Processing Units using CUDA.
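A minimal sketch of the data-parallel structure being exploited, assuming a simple Gaussian model rather than the paper's actual fit models: each event's contribution to the negative log-likelihood is independent, so the sum reduces cleanly across OpenMP threads and vectorizes within them.

// Minimal sketch (assumed Gaussian model, not the paper's code): evaluating a
// negative log-likelihood over independent events with an OpenMP reduction.
#include <cmath>
#include <cstdio>
#include <vector>

double negative_log_likelihood(const std::vector<double>& events,
                               double mean, double sigma) {
    const double pi = 3.14159265358979323846;
    const double norm = 1.0 / (sigma * std::sqrt(2.0 * pi));
    double nll = 0.0;
    // Each event contributes -log f(x_i; mean, sigma); the contributions are
    // independent, so the sum parallelizes (and vectorizes) cleanly.
    #pragma omp parallel for reduction(+ : nll)
    for (std::size_t i = 0; i < events.size(); ++i) {
        const double z = (events[i] - mean) / sigma;
        nll -= std::log(norm * std::exp(-0.5 * z * z));
    }
    return nll;   // MIGRAD would minimize this with respect to mean and sigma
}

int main() {
    std::vector<double> data(1000000, 0.1);   // stand-in data sample
    std::printf("NLL = %f\n", negative_log_likelihood(data, 0.0, 1.0));
}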
Journal of Physics: Conference Series | 2012
Xin Dong; Gene Cooperman; J. Apostolakis; Sverre Jarp; Andrzej Nowak; Makoto Asai; Daniel Brandt
We document the methods used to create the multi-threaded prototype Geant4MT from a sequential version of Geant4. We cover the Source-to-Source transformations applied, and discuss the process of verifying the correctness of the Geant4MT toolkit and applications based on it. Tools to ensure that the results of a transformed multi-threaded application are exactly equal to the original sequential version are under development. Stand-alone or simple applications can be adapted within 1–2 working days. Geant4MT is shown to scale linearly on an 80-core computer. In the special case of a single worker thread on one core, 30% overhead has been observed. We explain the reasons for this and the improvements introduced to reduce this overhead.
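The essence of the transformation can be sketched as follows; the names are hypothetical and this is not Geant4MT code, only the general idea of converting formerly global mutable state into per-thread state so worker threads do not race.

// Minimal sketch (hypothetical names, not Geant4MT code): make formerly-global
// mutable state thread-local so each worker simulates its events independently.
#include <cstdio>
#include <thread>
#include <vector>

thread_local long g_stepCounter = 0;   // was a plain global in the sequential code

void simulate_events(int nEvents) {
    for (int i = 0; i < nEvents; ++i)
        g_stepCounter += 100;          // stand-in for tracking work per event
    std::printf("worker done, local steps = %ld\n", g_stepCounter);
}

int main() {
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)
        workers.emplace_back(simulate_events, 250);   // split 1000 events over 4 threads
    for (auto& w : workers) w.join();
}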
IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2011
Sverre Jarp; A. Lazzaro; Julien Leduc; Andrzej Nowak; F. Pantaleo
Data analysis techniques based on likelihood function calculation play a crucial role in many High Energy Physics measurements. Depending on the complexity of the models used in the analyses, with several free parameters, many independent variables, large data samples, and complex functions, the calculation of the likelihood functions can require a long CPU execution time. In the past, the continuous gain in performance for each single CPU core kept pace with the increase in the complexity of the analyses, keeping the execution time of the sequential software applications reasonable. Nowadays, the performance of single cores is no longer increasing as it did in the past, while the complexity of the analyses has grown significantly in the Large Hadron Collider era. In this context a breakthrough is represented by the increase in the number of computational cores per computational node. This makes it possible to speed up the execution of the applications by redesigning them with parallelization paradigms. The likelihood function evaluation can be parallelized using data and task parallelism, which are suitable for CPUs and GPUs (Graphics Processing Units), respectively. In this paper we show how the likelihood function evaluation has been parallelized on GPUs. We describe the implemented algorithm and give performance results when running typical models used in High Energy Physics measurements. In our implementation we achieve good scaling with respect to the number of events in the data samples.
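A minimal sketch of the event-level data parallelism involved, assuming a simple Gaussian model and using C++17 parallel algorithms as a stand-in for the CUDA kernel described in the paper: each event is evaluated independently (on the GPU, essentially one thread per event) and the per-event terms are then reduced.

// Minimal sketch (not the paper's CUDA code): per-event terms are independent,
// so they map naturally onto GPU threads; here C++17 parallel algorithms stand
// in for the kernel-plus-reduction structure.
#include <cmath>
#include <cstdio>
#include <execution>
#include <functional>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> events(1000000, 0.2);   // stand-in data sample
    const double mean = 0.0, sigma = 1.0;
    const double norm = 1.0 / (sigma * std::sqrt(2.0 * 3.14159265358979323846));

    // Each event's -log f(x_i) is computed independently, then reduced to the NLL.
    double nll = std::transform_reduce(
        std::execution::par_unseq, events.begin(), events.end(), 0.0,
        std::plus<>{},
        [=](double x) {
            const double z = (x - mean) / sigma;
            return -std::log(norm * std::exp(-0.5 * z * z));
        });

    std::printf("NLL = %f\n", nll);
}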
Journal of Physics: Conference Series | 2012
Sverre Jarp; A. Lazzaro; Julien Leduc; Andrzej Nowak; Yngve Sneen Lindal
We describe parallel implementations of an algorithm used to evaluate the likelihood function used in data analysis. The implementations run on the CPU, on the GPU, and on both devices cooperatively (hybrid). The CPU and GPU implementations are based on OpenMP and OpenCL, respectively. The hybrid implementation also allows the application to run on multi-GPU systems (not necessarily of the same type). The hybrid case uses a scheduler so that the workload needed for the evaluation of the function is split and balanced into corresponding sub-workloads to be executed in parallel on each device, i.e. CPU-GPU or multi-CPUs. We present scalability results when running on the CPU. We then compare the performance of the GPU implementation on different hardware systems from different vendors, and report the performance in the hybrid case. The tests are based on likelihood functions from real data analyses carried out in the high energy physics community.
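A minimal sketch of the load-balancing idea, with hypothetical names and assumed throughput numbers rather than the paper's scheduler: the events are split between devices in proportion to their measured rates so that both finish their sub-workloads at roughly the same time.

// Minimal sketch (hypothetical, not the paper's scheduler): split N events
// between CPU and GPU in proportion to their measured throughputs.
#include <cstdio>

struct Split { long cpu_events; long gpu_events; };

// Throughputs in events/second, e.g. from a short calibration run.
Split balance(long n_events, double cpu_rate, double gpu_rate) {
    const double cpu_share = cpu_rate / (cpu_rate + gpu_rate);
    Split s;
    s.cpu_events = static_cast<long>(n_events * cpu_share);
    s.gpu_events = n_events - s.cpu_events;
    return s;
}

int main() {
    // Assumed numbers for illustration: the GPU evaluates events 4x faster.
    Split s = balance(1000000, 1.0e6, 4.0e6);
    std::printf("CPU gets %ld events, GPU gets %ld events\n",
                s.cpu_events, s.gpu_events);
}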
Journal of Physics: Conference Series | 2011
Sverre Jarp; A. Lazzaro; Julien Leduc; Andrzej Nowak
Leakage currents have put a stop to the semiconductor industry's ability to increase processor frequency in order to enhance the performance of new microprocessors. Instead, we observe a slew of changes inside the micro-architecture aimed at enhancing performance. Several of these changes, however, do not translate into automatic speed improvements for the software. This paper discusses the increased complexity of modern microprocessors by separating out into dimensions each feature that impacts performance, and briefly mentions ways of improving software, in particular that of the High Energy Physics community, to take full advantage of them.
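A back-of-envelope illustration of how these dimensions combine, using assumed hardware parameters rather than figures from the paper: the per-cycle factors multiply, so software that ignores any one of them leaves a large factor unused.

// Minimal sketch with assumed, illustrative numbers (not from the paper):
// the performance "dimensions" of a modern server multiply.
#include <cstdio>

int main() {
    const int sockets        = 2;   // assumed machine: dual-socket server
    const int cores_per_sock = 8;
    const int simd_lanes     = 4;   // e.g. 256-bit vectors on doubles
    const int issue_per_cycle = 2;  // superscalar floating-point issue

    const long peak_factor = 1L * sockets * cores_per_sock * simd_lanes * issue_per_cycle;
    // Purely sequential, scalar code uses only one of these slots per cycle.
    std::printf("hardware offers roughly %ldx over naive scalar, single-threaded code\n",
                peak_factor);
}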
Journal of Physics: Conference Series | 2011
Sverre Jarp; A. Lazzaro; Julien Leduc; Andrzej Nowak
As researchers have reached the practical limits of processor performance improvements by frequency scaling, it is clear that the future of computing lies in the effective utilization of parallel and multi-core architectures. Since this significant change in computing is well underway, it is vital for HEP programmers to understand the scalability of their software on modern hardware and the opportunities for potential improvements. This work aims to quantify the benefit of new mainstream architectures to the HEP community through practical benchmarking on recent hardware solutions, including the usage of parallelized HEP applications.
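A common yardstick for such scalability measurements is Amdahl's law: with a parallelizable runtime fraction p, the speedup on N cores is bounded by 1 / ((1 - p) + p/N). A small worked example with assumed numbers, not measurements from the paper:

// Minimal sketch with assumed numbers (not results from the paper): Amdahl's
// law bounds the speedup of a partially parallelized HEP application.
#include <cstdio>

double amdahl_speedup(double parallel_fraction, int cores) {
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
}

int main() {
    // Assume 95% of the runtime has been parallelized.
    for (int cores : {1, 8, 32, 80})
        std::printf("%3d cores -> speedup %.1fx\n", cores, amdahl_speedup(0.95, cores));
    // Even with 80 cores the serial 5% caps the speedup around 16x, which is
    // why measuring real scalability on real hardware matters.
}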
Journal of Physics: Conference Series | 2012
Sverre Jarp; A. Lazzaro; Julien Leduc; Andrzej Nowak
The continued progression of Moore's law has led to many-core platforms becoming easily accessible commodity equipment. The new opportunities that arose from this change have also brought new challenges: harnessing the raw computational potential of such a platform is not always a straightforward task. This paper describes practical experience from working with many-core systems at CERN openlab and the differences observed with respect to their predecessors. We provide the latest results for a set of parallelized HEP benchmarks running on several classes of many-core platforms.