J. Mark Bull
University of Edinburgh
Publications
Featured research published by J. Mark Bull.
ACM SIGARCH Computer Architecture News | 2001
J. Mark Bull; Darragh O'Neill
In this paper we present a set of extensions to an existing microbenchmark suite for OpenMP. The new benchmarks are targeted at directives introduced in the OpenMP 2.0 standard, as well as at the handling of thread-private data structures. Results are presented for a Sun HPC 6500 system, with an early-access release of an OpenMP 2.0-compliant compiler, and for an SGI Origin 3000 system.
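The measurement approach can be illustrated with a minimal sketch (not the suite's actual code): the overhead of a construct is estimated as the difference between the time to execute a fixed amount of work inside the construct and a sequential reference time for the same work. The delay() routine and the repetition counts below are illustrative assumptions.

    #include <stdio.h>
    #include <omp.h>

    #define INNER 1000
    #define OUTER 1000

    void delay(int n) {                  /* busy-work of known cost */
        volatile double a = 0.0;
        for (int i = 0; i < n; i++) a += i * 0.5;
    }

    int main(void) {
        double t0, ref, par;

        /* reference: the delay executed sequentially OUTER times */
        t0 = omp_get_wtime();
        for (int k = 0; k < OUTER; k++) delay(INNER);
        ref = omp_get_wtime() - t0;

        /* same per-thread work, but entering a parallel region each time */
        t0 = omp_get_wtime();
        for (int k = 0; k < OUTER; k++) {
            #pragma omp parallel
            delay(INNER);
        }
        par = omp_get_wtime() - t0;

        /* every thread does the full delay, so with zero overhead the two
           timings would match; the difference is the construct's cost */
        printf("parallel region overhead ~ %g us per entry\n",
               1e6 * (par - ref) / OUTER);
        return 0;
    }
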
International Workshop on OpenMP | 2012
J. Mark Bull; Fiona Reid; Nicola McDonnell
We present a set of extensions to an existing microbenchmark suite for OpenMP. The new benchmarks measure the overhead of the task construct introduced in the OpenMP 3.0 standard, and associated task synchronisation constructs. We present the results from a variety of compilers and hardware platforms, which demonstrate some significant differences in performance between different OpenMP implementations.
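In the same spirit, a hedged sketch of how a task-overhead measurement can be structured (names and counts are illustrative, not the suite's code): the same work is executed once directly and once packaged as explicit tasks, and anything above the ideal parallel time is attributed to task management.

    #include <stdio.h>
    #include <omp.h>

    #define NTASKS 100000
    #define DELAY  100

    void delay(int n) {                     /* busy-work of known cost */
        volatile double a = 0.0;
        for (int i = 0; i < n; i++) a += i * 0.5;
    }

    int main(void) {
        double t0, ref, par;

        /* reference: the same work executed directly, no tasking */
        t0 = omp_get_wtime();
        for (int i = 0; i < NTASKS; i++) delay(DELAY);
        ref = omp_get_wtime() - t0;

        /* the same work packaged as NTASKS explicit tasks */
        t0 = omp_get_wtime();
        #pragma omp parallel
        #pragma omp single                  /* one thread creates every task */
        {
            for (int i = 0; i < NTASKS; i++) {
                #pragma omp task
                delay(DELAY);
            }
            #pragma omp taskwait            /* include task completion in the timing */
        }
        par = omp_get_wtime() - t0;

        /* with perfect scaling the tasked version would take ref/nthreads;
           the excess over that is task creation/dispatch overhead */
        double ideal = ref / omp_get_max_threads();
        printf("per-task overhead ~ %g us\n", 1e6 * (par - ideal) / NTASKS);
        return 0;
    }
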
IEEE International Conference on High Performance Computing, Data, and Analytics | 2014
Iain Bethune; J. Mark Bull; Nicholas J. Dingle; Nicholas J. Higham
Ever-increasing core counts create the need to develop parallel algorithms that avoid closely coupled execution across all cores. We present performance analysis of several parallel asynchronous implementations of Jacobi’s method for solving systems of linear equations, using MPI, SHMEM and OpenMP. In particular we have solved systems of over 4 billion unknowns using up to 32,768 processes on a Cray XE6 supercomputer. We show that the precise implementation details of asynchronous algorithms can strongly affect the resulting performance and convergence behaviour of our solvers in unexpected ways, discuss how our specific implementations could be generalised to other classes of problem, and suggest how existing parallel programming models might be extended to allow asynchronous algorithms to be expressed more easily.
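The asynchronous idea can be sketched in a few lines of shared-memory code. This is an illustrative OpenMP-only toy on a small dense, diagonally dominant system, not the paper's MPI/SHMEM solvers: removing the barrier between sweeps lets threads run ahead and read a mix of old and new values.

    #include <stdio.h>
    #include <math.h>
    #include <omp.h>

    #define N      1000
    #define SWEEPS 200

    static double A[N][N], b[N], x[N];

    int main(void) {
        /* diagonally dominant test system whose exact solution is x[i] = 1 */
        for (int i = 0; i < N; i++) {
            double rowsum = 0.0;
            for (int j = 0; j < N; j++) {
                A[i][j] = (i == j) ? 2.0 * N : 1.0;
                rowsum += A[i][j];
            }
            b[i] = rowsum;
            x[i] = 0.0;
        }

        /* asynchronous Jacobi: 'nowait' removes the end-of-sweep barrier, so
           threads proceed independently and read whatever mix of old and new
           values of x is currently visible; updates are in place and
           deliberately racy (a production version would use atomic reads and
           writes of x to keep the behaviour well defined) */
        #pragma omp parallel
        for (int sweep = 0; sweep < SWEEPS; sweep++) {
            #pragma omp for nowait
            for (int i = 0; i < N; i++) {
                double s = 0.0;
                for (int j = 0; j < N; j++)
                    if (j != i) s += A[i][j] * x[j];
                x[i] = (b[i] - s) / A[i][i];
            }
        }

        double err = 0.0;
        for (int i = 0; i < N; i++) err = fmax(err, fabs(x[i] - 1.0));
        printf("max error after %d sweeps: %g\n", SWEEPS, err);
        return 0;
    }
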
International Journal of Parallel Programming | 2010
J. Mark Bull; James P. Enright; Xu Guo; C.M. Maynard; Fiona Reid
With the current prevalence of multi-core processors in HPC architectures, mixed-mode programming, using both MPI and OpenMP in the same application, is seen as an important technique for achieving high levels of scalability. As there are few standard benchmarks written in this paradigm, it is difficult to assess the likely performance of such programs. To help address this, we examine the performance of mixed-mode OpenMP/MPI on a number of popular HPC architectures, using a synthetic benchmark suite and two large-scale applications. We find performance characteristics which differ significantly between implementations, and which highlight possible areas for improvement, especially when multiple OpenMP threads communicate simultaneously via MPI.
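A minimal sketch of the mixed-mode pattern in question, assuming an MPI library that supports MPI_THREAD_MULTIPLE (the thread-pairing scheme via message tags is an illustrative choice, not the paper's benchmark code):

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv) {
        int provided, rank, size;

        /* request full thread support so that several OpenMP threads may
           call MPI concurrently -- the usage found most sensitive to
           implementation quality */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "MPI_THREAD_MULTIPLE not supported\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* each thread on rank 0 sends to the matching thread on rank 1;
           the thread id is used as the message tag to pair threads up
           (assumes two ranks running the same number of threads) */
        #pragma omp parallel
        {
            int tid = omp_get_thread_num();
            int buf = tid;
            if (rank == 0 && size > 1)
                MPI_Send(&buf, 1, MPI_INT, 1, tid, MPI_COMM_WORLD);
            else if (rank == 1)
                MPI_Recv(&buf, 1, MPI_INT, 0, tid, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }
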
International Workshop on OpenMP | 2009
J. Mark Bull; James P. Enright; Nadia Ameer
With the current prevalence of multi-core processors in HPC architectures, mixed-mode programming, using both MPI and OpenMP in the same application, is becoming increasingly important. However, no low-level synthetic benchmarks exist to test the performance of this programming model. We have designed and implemented a set of microbenchmarks for mixed-mode programming, including both point-to-point and collective communication patterns. These microbenchmarks have been run on a number of current HPC architectures: the results show some interesting performance differences between the architectures and highlight some possible inefficiencies in the implementation of MPI on multi-core systems.
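One of the point-to-point patterns can be sketched as a multi-threaded ping-pong; this is an illustrative reconstruction, not the benchmark suite itself, and it assumes exactly two ranks running equal numbers of threads:

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    #define NBYTES 1024
    #define REPS   1000

    int main(int argc, char **argv) {
        int provided, rank;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE)
            MPI_Abort(MPI_COMM_WORLD, 1);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int peer = 1 - rank;               /* assumes exactly two ranks */

        /* every OpenMP thread runs its own ping-pong with the matching
           thread on the other rank, using its thread id as the tag */
        #pragma omp parallel
        {
            int tid = omp_get_thread_num();
            char buf[NBYTES] = {0};
            double t0 = MPI_Wtime();
            for (int r = 0; r < REPS; r++) {
                if (rank == 0) {
                    MPI_Send(buf, NBYTES, MPI_CHAR, peer, tid, MPI_COMM_WORLD);
                    MPI_Recv(buf, NBYTES, MPI_CHAR, peer, tid, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                } else {
                    MPI_Recv(buf, NBYTES, MPI_CHAR, peer, tid, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    MPI_Send(buf, NBYTES, MPI_CHAR, peer, tid, MPI_COMM_WORLD);
                }
            }
            printf("rank %d thread %d: %.2f us per round trip\n",
                   rank, tid, 1e6 * (MPI_Wtime() - t0) / REPS);
        }

        MPI_Finalize();
        return 0;
    }
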
Parallel Computing | 2000
T. L. Freeman; David Hancock; J. Mark Bull; Rupert W. Ford
In earlier papers ([2], [3], [6]), feedback-guided loop scheduling algorithms have been shown to be very effective for certain loop scheduling problems which involve a sequential outer loop and a parallel inner loop, and for which the workload of the parallel loop changes only slowly from one execution to the next. In this paper the extension of these ideas to the case of nested parallel loops is investigated. We describe four feedback-guided algorithms for scheduling nested loops and evaluate the performance of the algorithms on a set of synthetic benchmarks.
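A one-dimensional sketch of the feedback-guided idea (a simplifying assumption: the cost of each old chunk is treated as uniform over its iterations, and the paper's four nested-loop algorithms are not reproduced here). After each outer iteration, measured per-chunk times are used to cut new chunk boundaries with equal predicted cost:

    #include <stdio.h>
    #include <omp.h>

    #define N      100000
    #define NOUTER 50
    #define MAXTHR 64

    double work(int i) {                 /* per-iteration cost varies with i */
        volatile double a = 0.0;
        for (int k = 0; k < 10 + (i % 100); k++) a += k;
        return a;
    }

    int main(void) {
        int nth = omp_get_max_threads();   /* assumed <= MAXTHR */
        int lo[MAXTHR], hi[MAXTHR];
        double tsec[MAXTHR];

        /* start from an equal partition of [0, N) */
        for (int t = 0; t < nth; t++) {
            lo[t] = (int)((long long)N * t / nth);
            hi[t] = (int)((long long)N * (t + 1) / nth);
        }

        for (int outer = 0; outer < NOUTER; outer++) {
            /* run the parallel loop with the current partition, timing
               each thread's chunk */
            #pragma omp parallel
            {
                int t = omp_get_thread_num();
                double t0 = omp_get_wtime();
                for (int i = lo[t]; i < hi[t]; i++) work(i);
                tsec[t] = omp_get_wtime() - t0;
            }

            /* feedback step: cut new boundaries so that every chunk has
               the same predicted time on the next outer iteration */
            double total = 0.0;
            for (int t = 0; t < nth; t++) total += tsec[t];
            double target = total / nth;

            int newlo[MAXTHR], newhi[MAXTHR];
            int oldt = 0, pos = 0;
            double left = tsec[0];          /* predicted time left in chunk oldt */
            for (int t = 0; t < nth; t++) {
                newlo[t] = pos;
                double need = target;
                while (oldt < nth - 1 && left <= need) {
                    need -= left;           /* consume the rest of chunk oldt */
                    pos = hi[oldt++];
                    left = tsec[oldt];
                }
                /* take a fraction of the current old chunk at its measured rate */
                double len  = (double)(hi[oldt] - lo[oldt]);
                double rate = (len > 0.0) ? tsec[oldt] / len : 0.0;
                pos += (rate > 0.0) ? (int)(need / rate) : 0;
                left -= need;
                if (pos > N) pos = N;
                newhi[t] = (t == nth - 1) ? N : pos;  /* last chunk absorbs rounding */
            }
            for (int t = 0; t < nth; t++) { lo[t] = newlo[t]; hi[t] = newhi[t]; }
        }

        printf("final chunk sizes:");
        for (int t = 0; t < nth; t++) printf(" %d", hi[t] - lo[t]);
        printf("\n");
        return 0;
    }
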
IEEE International Conference on High Performance Computing, Data, and Analytics | 2012
Mark Tucker; J. Mark Bull
Archive | 2011
Iain Bethune; J. Mark Bull; Nicholas J. Dingle; Nicholas J. Higham
High Performance Computational Finance | 2015
Mark Tucker; J. Mark Bull
arXiv: Computational Finance | 2014
Mark Tucker; J. Mark Bull