Publication


Featured research published by Adam Betts.


Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) | 2012

GPUVerify: a verifier for GPU kernels

Adam Betts; Nathan Chong; Alastair F. Donaldson; Shaz Qadeer; Paul Thomson

We present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream kernel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational semantics for GPU programming termed synchronous, delayed visibility (SDV) semantics. The SDV semantics provides a precise definition of barrier divergence in GPU kernels and allows kernel verification to be reduced to analysis of a sequential program, thereby completely avoiding the need to reason about thread interleavings, and allowing existing modular techniques for program verification to be leveraged. We describe an efficient encoding for data race detection and propose a method for automatically inferring loop invariants required for verification. We have implemented these techniques as a practical verification tool, GPUVerify, which can be applied directly to OpenCL and CUDA source code. We evaluate GPUVerify with respect to a set of 163 kernels drawn from public and commercial sources. Our evaluation demonstrates that GPUVerify is capable of efficient, automatic verification of a large number of real-world kernels.
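
The class of bug targeted here can be illustrated with a small kernel. Below is a minimal CUDA sketch of our own (the kernel and array names are hypothetical, not taken from the paper): neighbouring threads read and write the same array element with no intervening barrier, which is exactly the kind of intra-group data race such a verifier reports, followed by a barrier-separated variant.

    // Hypothetical racy kernel: thread i reads A[i+1] while thread i+1
    // writes A[i+1], with no barrier in between, so the read may observe
    // either the old or the new value: a data race.
    __global__ void shift_left(int *A, int N) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i + 1 < N) {
            A[i] = A[i + 1];
        }
    }

    // A race-free variant separates the read and write phases with a
    // barrier (correct only within a single thread block).
    __global__ void shift_left_fixed(int *A, int N) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int tmp = (i + 1 < N) ? A[i + 1] : 0;
        __syncthreads();            // reached uniformly by every thread
        if (i + 1 < N) A[i] = tmp;
    }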


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) | 2014

Concurrency testing using schedule bounding: an empirical study

Paul Thomson; Alastair F. Donaldson; Adam Betts

We present the first independent empirical study on schedule bounding techniques for systematic concurrency testing (SCT). We have gathered 52 buggy concurrent software benchmarks, drawn from public code bases, which we call SCTBench. We applied a modified version of an existing concurrency testing tool to SCTBench to attempt to answer several research questions, including: How effective are the two main schedule bounding techniques, preemption bounding and delay bounding, at bug finding? What challenges are associated with applying SCT to existing code? How effective is schedule bounding compared to a naive random scheduler at finding bugs? Our findings confirm that delay bounding is superior to preemption bounding and that schedule bounding is more effective at finding bugs than unbounded depth-first search. The majority of bugs in SCTBench can be exposed using a small bound (1-3), supporting previous claims, but there is at least one benchmark that requires 5 preemptions. Surprisingly, we found that a naive random scheduler is at least as effective as schedule bounding for finding bugs. We have made SCTBench and our tools publicly available for reproducibility and use in future work.
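
As a concrete illustration of why small bounds suffice (our own example, not one of the SCTBench benchmarks), consider the sketch below: the assertion can only fail if the first thread is preempted between its last two statements, so a preemption bound of 1 is enough for a systematic tester to expose the bug, while a native run rarely hits the failing interleaving.

    #include <cassert>
    #include <thread>

    // Deliberately racy two-thread program (hypothetical example).
    // The assertion fails only if t1 is preempted between `a = 1` and
    // `x = 2`, letting t2 observe a == 1 while x is still 1. A
    // systematic concurrency tester with preemption bound 1 finds this;
    // stress testing under the OS scheduler is unlikely to.
    int x = 0, a = 0;

    void t1() { x = 1; a = 1; x = 2; }
    void t2() { if (a == 1) assert(x == 2); }

    int main() {
        std::thread u(t1), v(t2);
        u.join();
        v.join();
        return 0;
    }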


ACM Transactions on Programming Languages and Systems | 2015

The Design and Implementation of a Verification Technique for GPU Kernels

Adam Betts; Nathan Chong; Alastair F. Donaldson; Jeroen Ketema; Shaz Qadeer; Paul Thomson; John Wickerson

We present a technique for the formal verification of GPU kernels, addressing two classes of correctness properties: data races and barrier divergence. Our approach is founded on a novel formal operational semantics for GPU kernels termed synchronous, delayed visibility (SDV) semantics, which captures the execution of a GPU kernel by multiple groups of threads. The SDV semantics provides operational definitions for barrier divergence and for both inter- and intra-group data races. We build on the semantics to develop a method for reducing the task of verifying a massively parallel GPU kernel to that of verifying a sequential program. This completely avoids the need to reason about thread interleavings, and allows existing techniques for sequential program verification to be leveraged. We describe an efficient encoding of data race detection and propose a method for automatically inferring the loop invariants that are required for verification. We have implemented these techniques as a practical verification tool, GPUVerify, that can be applied directly to OpenCL and CUDA source code. We evaluate GPUVerify with respect to a set of 162 kernels drawn from public and commercial sources. Our evaluation demonstrates that GPUVerify is capable of efficient, automatic verification of a large number of real-world kernels.
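
Barrier divergence, the second property addressed, occurs when threads of the same group reach a barrier under differing control flow; the SDV semantics gives this situation a precise meaning. A minimal hypothetical CUDA example of the defect:

    // Hypothetical barrier divergence: only threads with an even index
    // execute the __syncthreads(), so the threads of a block disagree
    // on whether the barrier is reached. Verifiers in this style reject
    // the kernel; on real hardware the behaviour is undefined.
    __global__ void diverge(int *A) {
        if (threadIdx.x % 2 == 0) {
            __syncthreads();
        }
        A[threadIdx.x] = threadIdx.x;
    }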


International Conference on Parallel Architectures and Compilation Techniques (PACT) | 2015

PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming

Riyadh Baghdadi; Ulysse Beaugnon; Albert Cohen; Tobias Grosser; Michael Kruse; Chandan Reddy; Sven Verdoolaege; Adam Betts; Alastair F. Donaldson; Jeroen Ketema; Javed Absar; Sven Van Haastregt; Alexey Kravets; Anton Lokhmotov; Róbert Dávid; Elnar Hajiyev

Programming accelerators such as GPUs with low-level APIs and languages such as OpenCL and CUDA is difficult, error-prone, and not performance-portable. Automatic parallelization and domain specific languages (DSLs) have been proposed to hide complexity and regain performance portability. We present PENCIL, a rigorously-defined subset of GNU C99, enriched with additional language constructs, that enables compilers to exploit parallelism and produce highly optimized code when targeting accelerators. PENCIL aims to serve both as a portable implementation language for libraries, and as a target language for DSL compilers. We implemented a PENCIL-to-OpenCL backend using a state-of-the-art polyhedral compiler. The polyhedral compiler, extended to handle data-dependent control flow and non-affine array accesses, generates optimized OpenCL code. To demonstrate the potential and performance portability of PENCIL and the PENCIL-to-OpenCL compiler, we consider a number of image processing kernels, a set of benchmarks from the Rodinia and SHOC suites, and DSL embedding scenarios for linear algebra (BLAS) and signal processing radar applications (SpearDE), and present experimental results for four GPU platforms: AMD Radeon HD 5670 and R9 285, NVIDIA GTX 470, and ARM Mali-T604.
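
To give a flavour of the language (a sketch of ours: the construct names follow the PENCIL papers, but the exact syntax here should be treated as illustrative), the snippet below marks a loop with data-dependent array accesses as parallelizable and supplies a fact the polyhedral compiler cannot derive on its own:

    // Illustrative PENCIL-style C99. The `independent` directive asserts
    // that the loop iterations carry no dependences through idx, and
    // __pencil_assume communicates that all indices are in bounds,
    // letting the compiler generate parallel OpenCL code despite the
    // non-affine access A[idx[i]].
    void scatter(int n, const int idx[const restrict static n],
                 float A[const restrict static n],
                 const float B[const restrict static n]) {
    #pragma pencil independent
        for (int i = 0; i < n; ++i) {
            __pencil_assume(idx[i] >= 0 && idx[i] < n);
            A[idx[i]] = 2.0f * B[i];
        }
    }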


Computer Aided Verification (CAV) | 2014

Engineering a Static Verification Tool for GPU Kernels

Ethel Bardsley; Adam Betts; Nathan Chong; Peter Collingbourne; Pantazis Deligiannis; Alastair F. Donaldson; Jeroen Ketema; Daniel Liew; Shaz Qadeer

We report on practical experiences over the last 2.5 years related to the engineering of GPUVerify, a static verification tool for OpenCL and CUDA GPU kernels, plotting the progress of GPUVerify from a prototype to a fully functional and relatively efficient analysis tool. Our hope is that this experience report will serve the verification community by helping to inform future tooling efforts.


IEEE Transactions on Parallel and Distributed Systems | 2016

Acceleration of a Full-Scale Industrial CFD Application with OP2

I. Z. Reguly; Gihan R. Mudalige; Carlo Bertolli; Michael B. Giles; Adam Betts; Paul H. J. Kelly; David Radford

Hydra is a full-scale industrial CFD application used for the design of turbomachinery at Rolls-Royce plc, capable of performing complex simulations over highly detailed unstructured mesh geometries. Hydra presents major challenges in data organization and movement that need to be overcome for continued high performance on emerging platforms. We present research in achieving this goal through the OP2 domain-specific high-level framework, demonstrating the viability of such a high-level programming approach. OP2 targets the domain of unstructured mesh problems and enables execution on a range of back-end hardware platforms. We chart the conversion of Hydra to OP2 and map out the key difficulties encountered in the process. Specifically, we show how different parallel implementations can be achieved with an active library framework, even for a highly complicated industrial application, and how different optimizations targeting contrasting parallel architectures can be applied to the whole application, seamlessly, reducing developer effort and increasing code longevity. Performance results demonstrate not only that the runtime performance of the hand-tuned original code can be matched, but that it can be significantly improved upon on both conventional processor systems and many-core systems. Our results provide evidence of how high-level frameworks such as OP2 enable portability across a wide range of contrasting platforms, and of their significant utility in achieving high performance without the intervention of the application programmer.
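
The programming model can be sketched as follows (schematic, in the style of the OP2 C/C++ API as presented in the OP2 literature; the kernel, sets, and data names are invented rather than taken from Hydra): the user writes a per-element kernel once, and an op_par_loop describes how it is applied over a mesh set, from which OP2 generates CUDA, OpenMP, or MPI implementations.

    // Invented per-edge kernel: computes a flux from two node values and
    // accumulates increments back to the nodes.
    void res_kernel(const double *edge_w, const double *x1,
                    const double *x2, double *inc1, double *inc2) {
        double f = (*edge_w) * ((*x1) - (*x2));
        *inc1 += f;
        *inc2 -= f;
    }

    // Schematic parallel loop over the `edges` set. Direct data uses
    // OP_ID; indirect accesses to node data go through the edge-to-node
    // mapping. OP2 handles colouring/atomics for the OP_INC conflicts.
    op_par_loop(res_kernel, "res_kernel", edges,
                op_arg_dat(edge_weights, -1, OP_ID,        1, "double", OP_READ),
                op_arg_dat(node_x,        0, edge_to_node, 1, "double", OP_READ),
                op_arg_dat(node_x,        1, edge_to_node, 1, "double", OP_READ),
                op_arg_dat(node_inc,      0, edge_to_node, 1, "double", OP_INC),
                op_arg_dat(node_inc,      1, edge_to_node, 1, "double", OP_INC));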


Euromicro Conference on Real-Time Systems (ECRTS) | 2013

Estimating the WCET of GPU-Accelerated Applications Using Hybrid Analysis

Adam Betts; Alastair F. Donaldson

The massive parallelism offered by Graphics Processing Units (GPUs) is now routinely exploited to accelerate computationally intensive tasks in a wide variety of application domains. Efficient GPU programming in languages such as CUDA and OpenCL requires careful application of hand optimisations to exploit parallelism and locality while minimising synchronisation. The effectiveness of such optimisations can be highly dependent on workload and the structure of input data, making it difficult to assess performance in general by testing alone. To address this, we study the problem of estimating the Worst-Case Execution Time (WCET) of GPU-accelerated applications. We propose the use of hybrid WCET analysis, whereby execution times of small program segments are deduced from execution traces, and a calculation back-end derived from the Control Flow Graph (CFG) produces a WCET estimate. Standard techniques which construct a CFG from a binary cannot be applied directly to GPU code because they miss implicit execution paths that arise due to the way branches are implemented in hardware; we present a solution using standard compiler analysis. We further describe how to extend the basic hybrid WCET analysis of sequential code so that concurrent timing effects in the GPU execution model are incorporated. We have implemented our analysis as a tool built on top of the GPGPU-sim open source simulator. We evaluate our tool using a set of benchmarks drawn from the CUDA SDK: results show that effective modelling of concurrency is key to reducing pessimism in the WCET calculation.
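
The calculation back-end can be pictured, in simplified form, as a longest-path computation over the CFG using the worst segment times observed in the traces. The sketch below is ours and assumes a loop-free CFG with blocks numbered in topological order; the actual analysis additionally bounds loops and models GPU concurrency effects.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Simplified hybrid WCET back-end (illustrative, not the tool):
    // succs[b] lists the successors of basic block b, and
    // max_block_time[b] is the largest execution time observed for b in
    // the timestamped traces. Returns the cost of the longest path
    // starting at block 0.
    uint64_t wcet_estimate(const std::vector<std::vector<int>> &succs,
                           const std::vector<uint64_t> &max_block_time) {
        std::vector<uint64_t> longest(succs.size(), 0);
        longest[0] = max_block_time[0];
        uint64_t wcet = longest[0];
        for (std::size_t b = 0; b < succs.size(); ++b) {
            for (int s : succs[b]) {
                longest[s] = std::max(longest[s],
                                      longest[b] + max_block_time[s]);
                wcet = std::max(wcet, longest[s]);
            }
        }
        return wcet;
    }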


International Workshop on Worst-Case Execution Time Analysis (WCET) | 2010

Hybrid measurement-based WCET analysis at the source level using object-level traces

Adam Betts; Nicholas Merriam; Guillem Bernat

Hybrid measurement-based approaches to worst-case execution time (WCET) analysis combine measured execution times of small program segments using static analysis of the larger software structure. In order to make the necessary measurements, instrumentation code is added to generate a timestamped trace from the running program. The intrusive presence of this instrumentation code incurs a timing penalty, widely referred to as the probe effect. However, recent years have seen the emergence of trace capability at the hardware level, effectively opening the door to probe-free analysis. Relying on hardware support forces the WCET analysis to the object-code level, since that is all that is known by the hardware. A major disadvantage of this is that it is expensive for a typical software engineer to interpret the results, since most engineers are familiar with the source code but not the object code. Meaningful WCET analysis involves not just running a tool to obtain an overall WCET value but also understanding which sections of code consume most of the WCET in order that corrective actions, such as optimisation, can be applied if the WCET value is too large. The main contribution of this paper is a mechanism by which hybrid WCET analysis can still be performed at the source level when the timestamped trace has been collected at the object level by state-of-the-art hardware. This allows existing, commercial tools, such as RapiTime, to operate without the need for intrusive instrumentation and thus without the probe effect.
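
The core mapping step can be sketched as follows (our illustration, with invented data structures; this is not RapiTime's implementation): each hardware trace event carries an instruction address and a timestamp, and a table built from debug information maps object-code address ranges back to source-level blocks, so segment times are attributed at the source level without any intrusive instrumentation.

    #include <algorithm>
    #include <cstdint>
    #include <map>
    #include <utility>
    #include <vector>

    // Illustrative structures: a hardware trace event, and a table
    // mapping an object-code address range [start, end) to the id of
    // the source-level block it was compiled from.
    struct TraceEvent { uint64_t addr; uint64_t timestamp; };
    using AddrMap = std::map<uint64_t, std::pair<uint64_t, int>>;  // start -> (end, block)

    // Find the source block containing an address, or -1 if none.
    int source_block(const AddrMap &m, uint64_t addr) {
        auto it = m.upper_bound(addr);
        if (it == m.begin()) return -1;
        --it;
        return addr < it->second.first ? it->second.second : -1;
    }

    // Attribute the time between consecutive events to the source block
    // of the earlier event, keeping the per-block worst case.
    void attribute(const std::vector<TraceEvent> &trace, const AddrMap &m,
                   std::vector<uint64_t> &max_time) {
        for (std::size_t i = 0; i + 1 < trace.size(); ++i) {
            int b = source_block(m, trace[i].addr);
            if (b >= 0)
                max_time[b] = std::max(max_time[b],
                                       trace[i + 1].timestamp - trace[i].timestamp);
        }
    }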


Languages and Compilers for Parallel Computing (LCPC) | 2012

Compiler Optimizations for Industrial Unstructured Mesh CFD Applications on GPUs

Carlo Bertolli; Adam Betts; Nicolas Loriant; Gihan R. Mudalige; David Radford; David A. Ham; Michael B. Giles; Paul H. J. Kelly

Graphics Processing Units (GPUs) have shown acceleration factors over multicores for structured mesh-based Computational Fluid Dynamics (CFD). However, the value remains unclear for dynamic and irregular applications. Our motivating example is HYDRA, an unstructured mesh application used in production at Rolls-Royce for the simulation of turbomachinery components of jet engines. We describe three techniques for GPU optimization of unstructured mesh applications: a technique able to split a highly complex loop into simpler loops, kernel-specific alternative code synthesis, and configuration parameter tuning. Using these optimizations systematically on HYDRA improves the GPU performance relative to the multicore CPU. We show how these optimizations can be automated in a compiler, through user annotations. Performance analysis of a large number of complex loops enables us to study the relationship between optimizations and the resource requirements of loops, in terms of registers and shared memory, which directly affect loop performance.
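
The first technique, splitting a complex loop, can be illustrated with a hypothetical CUDA example of ours (real HYDRA loops are far larger): two independent phases fused in one kernel drive up register pressure and limit occupancy, whereas the split kernels are individually simpler, at the cost of an extra launch and an extra pass over the input.

    // Hypothetical fused kernel: both phases are independent per
    // element, but their combined register demand limits how many
    // threads can be resident at once.
    __global__ void fused(const double *a, double *b, double *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            b[i] = a[i] * a[i] + 1.0;   // phase 1
            c[i] = 2.0 * a[i] - 3.0;    // phase 2
        }
    }

    // After splitting: each kernel needs fewer registers and can run at
    // higher occupancy; the price is a second launch and re-reading a[].
    __global__ void phase1(const double *a, double *b, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) b[i] = a[i] * a[i] + 1.0;
    }

    __global__ void phase2(const double *a, double *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = 2.0 * a[i] - 3.0;
    }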


International Conference on Engineering of Complex Computer Systems (ICECCS) | 2011

WCET Analysis of Component-Based Systems Using Timing Traces

Adam Betts; Amine Marref

Construction of a Real-Time System (RTS) out of a number of pre-fabricated pieces of software, otherwise known as components, is a pervasive area of interest. Typically, only relocatable object code of the component is shipped to the customer, so that it can later be linked into the overall application. Source code is therefore withheld, and disassembling of the object code is normally disallowed to protect intellectual property. Both of these restrictions complicate, or even prevent, state-of-the-art Worst-Case Execution Time (WCET) analysis of the RTS, since most techniques depend on their availability in order to generate a complete program model. The alternative solution -- widespread in industrial circles -- is to record the largest end-to-end execution time of the RTS under functional testing, but this underestimates the actual WCET in the general case. This paper shows how to obtain a safer WCET estimate of an RTS composed of components using time-stamped traces of program execution. In effect, the data needed in the WCET computation (program model, execution times, execution bounds) are derived exclusively from parsing of the traces. Experiments indicate that, once simple coverage metrics have been obtained, the calculated WCET estimate bounds the actual WCET. Moreover, where instrumentation (which produces the time-stamped traces) is placed with respect to program structure has a significant bearing on the accuracy of the computed WCET estimate.
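
The central idea, that the program model, execution times, and execution bounds are all recovered from the traces alone, can be sketched as follows (our illustration, with invented structures): each trace is a sequence of timestamped instrumentation-point ids, and every consecutive pair of events contributes an edge of a transition graph together with its worst observed time and its execution count.

    #include <algorithm>
    #include <cstdint>
    #include <map>
    #include <utility>
    #include <vector>

    // Illustrative trace event: the id of an instrumentation point and
    // the time at which it was passed.
    struct Event { int point; uint64_t timestamp; };

    struct EdgeData {
        uint64_t max_time = 0;  // worst observed time along this edge
        uint64_t count = 0;     // observed executions, used to bound loops
    };

    // Recover a transition-graph model from one trace: no source or
    // object code is consulted, only the trace itself.
    void parse_trace(const std::vector<Event> &trace,
                     std::map<std::pair<int, int>, EdgeData> &model) {
        for (std::size_t i = 0; i + 1 < trace.size(); ++i) {
            EdgeData &e = model[{trace[i].point, trace[i + 1].point}];
            e.max_time = std::max<uint64_t>(
                e.max_time, trace[i + 1].timestamp - trace[i].timestamp);
            ++e.count;
        }
    }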

Collaboration


Dive into Adam Betts's collaborations.

Top Co-Authors

Nathan Chong

Imperial College London


Paul Thomson

Imperial College London
