W. Lavrijsen

Lawrence Berkeley National Laboratory

Publication

Featured research published by W. Lavrijsen.

International Conference on Software Engineering | 2016

Floating-point precision tuning using blame analysis

Cindy Rubio-González; Cuong Nguyen; Benjamin Mehne; Koushik Sen; James Demmel; William Kahan; Costin Iancu; W. Lavrijsen; David H. Bailey; David Hough

While tremendously useful, automated techniques for tuning the precision of floating-point programs face important scalability challenges. We present Blame Analysis, a novel dynamic approach that speeds up precision tuning. Blame Analysis performs floating-point instructions using different levels of accuracy for their operands. The analysis determines the precision of all operands such that a given precision is achieved in the final result of the program. Our evaluation on ten scientific programs shows that Blame Analysis is successful in lowering operand precision. As it executes the program only once, the analysis is particularly useful when targeting reductions in execution time. In such cases, the analysis needs to be combined with search-based tools such as Precimonious. Our experiments show that combining Blame Analysis with Precimonious leads to better results with a significant reduction in analysis time: the optimized programs execute faster (in three cases, we observe as high as 39.9% program speedup) and the combined analysis time is 9× faster on average, and up to 38× faster than Precimonious alone.

Journal of Physics: Conference Series | 2010

Recent developments in the LHCb software framework Gaudi

M. Clemencic; Hubert Degaudenzi; P. Mato; Sebastien Binet; W. Lavrijsen; C. Leggett; I. Belyaev

Ten years after its first version, the Gaudi software framework has undergone many changes and improvements, with a corresponding increase in the size of the code base. Those changes were almost always introduced while preserving backward compatibility and keeping changes to the framework itself to a minimum; obsolete code has been removed only rarely. After a release of Gaudi targeted at the data taking of 2008, it was decided to review the code of the framework with the aim of a general consolidation in view of the data taking of 2009. We also decided to take the occasion to introduce improvements that had never been implemented because of their large impact on the rest of the code, as well as changes to the framework needed to solve some intrinsic problems of the implementation that had never been made because they were considered too disruptive. In this contribution we describe the problems we addressed and the improvements we made to the framework during this review.

ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2015

Barrier elision for production parallel programs

Milind Chabbi; W. Lavrijsen; Wibe A. de Jong; Koushik Sen; John M. Mellor-Crummey; Costin Iancu

Large scientific code bases are often composed of several layers of runtime libraries, implemented in multiple programming languages. In such situations, programmers often choose conservative synchronization patterns, leading to suboptimal performance. In this paper, we present context-sensitive dynamic optimizations that elide barriers that are redundant during program execution. In our technique, we perform data race detection alongside the program to identify redundant barriers in their calling contexts; after an initial learning phase, we start eliding all future instances of barriers occurring in the same calling context. We present an automatic on-the-fly optimization and a multi-pass guided optimization. We apply our techniques to NWChem, a 6 million line computational chemistry code written in C/C++/Fortran that uses several runtime libraries such as Global Arrays, ComEx, DMAPP, and MPI. Our technique elides a surprisingly high fraction of barriers (as many as 63%) in production runs. This redundancy elimination translates to application speedups as high as 14% on 2048 cores. Our techniques also provided valuable insight about the application behavior, later used by NWChem developers. Overall, we demonstrate the value of holistic context-sensitive analyses that consider the domain science in conjunction with the associated runtime software stack.

IEEE International Conference on High Performance Computing Data and Analytics | 2018

Maximizing Communication Overlap with Dynamic Program Analysis

Emmanuelle Saillard; Koushik Sen; W. Lavrijsen; Costin Iancu

We present a dynamic program analysis approach to optimize communication overlap in scientific applications. Our tool instruments the code to generate a trace of the application's memory and synchronization behavior. An offline analysis determines the program's optimal points for maximal overlap when considering several programming constructs: non-blocking one-sided communication operations, non-blocking collectives, and bespoke synchronization patterns and operations. Feedback about possible transformations is presented to the user, and the tool can perform the directed transformations, which are supported by a lightweight runtime. The value of our approach comes from: 1) the ability to optimize across boundaries of software modules or libraries, while specializing for the intrinsics of the underlying communication runtime; and 2) providing upper bounds on the expected performance improvements after communication optimizations. We have reduced the time spent in communication by as much as 64% for several applications that were already aggressively optimized for overlap; this indicates that manual optimizations leave untapped performance. Although demonstrated mainly for the UPC programming language, the methodology can be easily adapted to any other communication and synchronization API.

International Parallel and Distributed Processing Symposium | 2017

Application Level Reordering of Remote Direct Memory Access Operations

W. Lavrijsen; Costin Iancu

We present methods for the effective application level reordering of non-blocking RDMA operations. We supplement out-of-order hardware delivery mechanisms with heuristics to account for the CPU side overhead of communication and for differences in network latency: a runtime scheduler takes into account message sizes, destination and concurrency and reorders operations to improve overall communication throughput. Results are validated on InfiniBand and Cray Aries networks, for SPMD and hybrid (SPMD+OpenMP) programming models. We show up to 5× potential speedup, with 30-50% more typical, for synthetic message patterns in microbenchmarks. We also obtain up to 33% improvement in the communication stages in application settings. While the design space is complex, the resulting scheduler is simple, both internally and at the application level interfaces. It also provides performance portability across networks and programming models. We believe these techniques can be easily retrofitted within any application or runtime framework that uses one-sided communication, e.g. using GASNet, MPI 3.0 RMA or low level APIs such as IBVerbs.

21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015) | 2015

Dual-use tools and systematics-aware analysis workflows in the ATLAS Run-2 analysis model

David Adams; P. Calafiura; Pierre-Antoine Delsart; M. Elsing; S. Farrell; Karsten Koeneke; A. Krasznahorkay; N. Krumnack; Eric Lancon; W. Lavrijsen; P. Laycock; Xiaowen Lei; S. Strandberg; Wouter Verkerke; I. Vivarelli; M. J. Woudstra

The ATLAS analysis model has been overhauled for the upcoming run of data collection in 2015 at 13 TeV. One key component of this upgrade was the Event Data Model (EDM), which now allows for greater flexibility in the choice of analysis software framework and provides powerful new features that can be exploited by analysis software tools. A second key component of the upgrade is the introduction of a dual-use tool technology, which provides abstract interfaces for analysis software tools to run in either the Athena framework or a ROOT-based framework. The tool interfaces, including a new interface for handling systematic uncertainties, have been standardized for the development of improved analysis workflows and consolidation of high-level analysis tools. This paper will cover the details of the dual-use tool functionality, the systematics interface, and how these features fit into a centrally supported analysis environment.

Lawrence Berkeley National Laboratory | 2004