Jonathan M. D. Hill | Researchain

Publication

Featured research published by Jonathan M. D. Hill.

Scientific Programming | 1997

Questions and Answers about BSP

David B. Skillicorn; Jonathan M. D. Hill; William F. McColl

Bulk Synchronous Parallelism (BSP) is a parallel programming model that abstracts from low-level program structures in favour of supersteps. A superstep consists of a set of independent local computations, followed by a global communication phase and a barrier synchronisation. Structuring programs in this way enables their costs to be accurately determined from a few simple architectural parameters, namely the permeability of the communication network to uniformly-random traffic and the time to synchronise. Although permutation routing and barrier synchronisations are widely regarded as inherently expensive, this is not the case. As a result, the structure imposed by BSP does not reduce performance, while bringing considerable benefits for application building. This paper answers the most common questions we are asked about BSP and justifies its claim to be a major step forward in parallel programming.
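
As a rough illustration of how a superstep's cost follows from the two architectural parameters named in the abstract, the sketch below uses the form commonly given in the BSP literature, w + h·g + l, where w is the largest local computation, h the largest number of words any process sends or receives, g the network permeability and l the synchronisation time. The abstract itself does not state this formula; the names `Machine`, `superstep_cost` and `program_cost` and the numbers used are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Machine:
    """Assumed machine description: both values in units of one local operation."""
    g: float  # cost per word of delivering uniformly-random traffic
    l: float  # cost of one barrier synchronisation


def superstep_cost(work: Sequence[float], h: Sequence[float], m: Machine) -> float:
    """Cost of one superstep as w + h*g + l.

    work[i] is the local computation of process i; h[i] is the larger of the
    words sent and received by process i. The slowest process dominates each phase.
    """
    return max(work) + max(h) * m.g + m.l


def program_cost(supersteps: Sequence[Tuple[Sequence[float], Sequence[float]]],
                 m: Machine) -> float:
    """A program's cost is the sum of the costs of its supersteps."""
    return sum(superstep_cost(w, h, m) for w, h in supersteps)


if __name__ == "__main__":
    machine = Machine(g=2.0, l=100.0)  # illustrative values only
    steps = [([10, 30], [5, 4]), ([20, 20], [0, 0])]
    print(program_cost(steps, machine))
```

Because g and l are the only machine-dependent inputs, the same per-superstep counts can be re-costed for a different machine by changing those two numbers.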

Parallel Computing | 1998

BSPlib: The BSP programming library

Jonathan M. D. Hill; Bill McColl; Dan C. Stefanescu; Mark W. Goudreau; Kevin J. Lang; Satish Rao; Torsten Suel; Thanasis Tsantilas; Rob H. Bisseling

BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPlib in C, motivates the design of its basic operations, and gives examples of their use. The library enables programming in two distinct styles: direct remote memory access (DRMA) using put or get operations, and bulk synchronous message passing (BSMP). Currently, implementations of BSPlib exist for a variety of modern architectures, including massively parallel computers with distributed memory, shared memory multiprocessors, and networks of workstations. BSPlib has been used in several scientific and industrial applications; this paper briefly describes applications in benchmarking, Fast Fourier Transforms (FFTs), sorting, and molecular dynamics.

Future Generation Computer Systems | 1998

Lessons learned from implementing BSP

Jonathan M. D. Hill; David B. Skillicorn

We focus on two criticisms of bulk synchronous parallelism (BSP): that delaying communication until specific points in a program causes poor performance, and that frequent barrier synchronisations are too expensive for high-performance parallel computing. We show that these criticisms are misguided, not just about BSP but about parallel programming in general, because they are based on misconceptions about the origins of poor performance. The main implication for parallel programming is that higher levels of abstraction not only make software construction easier but also make high-performance implementation easier.

Euromicro Workshop on Parallel and Distributed Processing | 1998

Practical barrier synchronisation

Jonathan M. D. Hill; David B. Skillicorn

We investigate the performance of barrier synchronisation on both shared memory and distributed memory architectures, using a wide range of techniques. The performance results obtained show that distributed memory architectures behave predictably, although their performance for barrier synchronisation is relatively poor. For shared memory architectures, a much larger range of implementation techniques is available. We show that asymptotic analysis is useless, and a detailed understanding of the underlying hardware is required to design an effective barrier implementation. We show that a technique using cache coherence is more effective than semaphore or lock based techniques, and is competitive with specialised barrier synchronisation hardware.

European Conference on Parallel Processing | 1996

Theory, Practice, and a Tool for BSP Performance Prediction

Jonathan M. D. Hill; Paul I. Crumpton; David A. Burgess

The Bulk Synchronous Parallel (BSP) model provides a theoretical framework to accurately predict the execution time of parallel programs. In this paper we describe a BSP programming library that has been developed and contrast two approaches to analysing performance: (1) a pencil and paper method; (2) a profiling tool that analyses trace information generated during program execution. These approaches are evaluated on an industrial application code that solves fluid dynamics equations around a complex aircraft geometry on IBM SP2 and SGI Power Challenge machines. We show how the profiling tool can be used to explore the communication patterns of the CFD code and accurately predict the performance of the application on any parallel machine.

Parallel Computing | 2000

BSP clusters: high performance, reliable and very low cost

Stephen R. Donaldson; Jonathan M. D. Hill; David B. Skillicorn

We describe a transport protocol suitable for BSPlib programs running on a cluster of PCs connected by a 100 Mbps Ethernet switch. The protocol provides a reliable packet-delivery mechanism that uses global knowledge of a program's communication pattern to maximise switch performance. The performance is comparable to previous low-latency protocols on similar hardware, but the addition of reliability means that this protocol can be directly used by application software. For a modest budget of US$20 000 it is possible to build a machine that outperforms an IBM SP2 on all the NAS benchmarks (BT +80%, SP +70%, MG +9%, and LU +65% improvement), and an SGI Origin 2000 on half (BT +10%, SP −24%, MG +10%, and LU −28%). The protocol has a CPU overhead of 1.5 μs for packet download and 3.6 μs for upload. Small packets can be communicated through the switch in a pipelined fashion every 21 μs. Application-to-application one-way latency is 29 μs plus the latency of the switch. A raw link bandwidth of 93 Mbps is achieved for 1400-byte packets, and 50 Mbps for 128-byte packets. This scales to eight processors communicating at 91 Mbps per link, to give a sustained global bandwidth of 728 Mbps.

European Conference on Parallel Processing | 1998

Predictable Communication on Unpredictable Networks: Implementing BSP over TCP/IP

Stephen R. Donaldson; Jonathan M. D. Hill; David B. Skillicorn

The BSP cost model measures the cost of communication using a single architectural parameter, g, which measures permeability of the network to continuous traffic. Architectures, typically networks of workstations, pose particular problems for high-performance communication because it is hard to achieve high throughput, and even harder to do so predictably. Yet both of these are required for BSP to be effective. We present a technique for controlling applied communication load that achieves both. Traffic is presented to the communication network at a rate chosen to maximise throughput and minimise its variance. Performance improvements as large as a factor of two over MPI can be achieved.

European Conference on Parallel Processing | 1998

Process Migration and Fault Tolerance of BSPlib Programs Running on Networks of Workstations

Jonathan M. D. Hill; Stephen R. Donaldson; Tim Lanfear

This paper describes a system that enables parallel programs written using the BSPlib communications library to migrate processes among a network of workstations. Not only does the system provide fault tolerance of BSPlib jobs, but by utilising a load manager that maintains an approximation of the global load of the system, it is possible to continually schedule the migration of BSP processes onto the least loaded machines in a network. Results are provided for an industrial electromagnetics application that show that we can achieve similar throughput on a publicly available collection of workstations as a dedicated NOW.

Parallel Computing | 2002

Portable and architecture independent parallel performance tuning using BSP

Stephen A. Jarvis; Jonathan M. D. Hill; Constantinos J. Siniolakis; Vasil P. Vasilev

A call-graph profiling tool has been designed and implemented to analyse the efficiency of programs written in BSPlib. This tool highlights computation and communication imbalance in parallel programs, exposing portions of program code which are amenable to improvement. A unique feature of this profiler is that it uses the bulk synchronous parallel cost model, thus providing a mechanism for portable and architecture-independent parallel performance tuning. In order to test the capabilities of the model on a real-world example, the performance characteristics of an SQL query processing application are investigated on a number of different parallel architectures.

Concurrency and Computation: Practice and Experience | 1999

Predictable communication on unpredictable networks: implementing BSP over TCP/IP and UDP/IP

Stephen R. Donaldson; Jonathan M. D. Hill; David B. Skillicorn
