Publication


Featured research published by Stephen R. Donaldson.


parallel computing | 2000

BSP clusters: high performance, reliable and very low cost

Stephen R. Donaldson; Jonathan M. D. Hill; David B. Skillicorn

We describe a transport protocol suitable for BSPlib programs running on a cluster of PCs connected by a 100 Mbps Ethernet switch. The protocol provides a reliable packet-delivery mechanism that uses global knowledge of a program's communication pattern to maximise switch performance. The performance is comparable to previous low-latency protocols on similar hardware, but the addition of reliability means that this protocol can be directly used by application software. For a modest budget of US$20,000 it is possible to build a machine that outperforms an IBM SP2 on all the NAS benchmarks (BT +80%, SP +70%, MG +9%, and LU +65% improvement), and an SGI Origin 2000 on half (BT +10%, SP −24%, MG +10%, and LU −28%). The protocol has a CPU overhead of 1.5 μs for packet download and 3.6 μs for upload. Small packets can be communicated through the switch in a pipelined fashion every 21 μs. Application-to-application one-way latency is 29 μs plus the latency of the switch. A raw link bandwidth of 93 Mbps is achieved for 1400-byte packets, and 50 Mbps for 128-byte packets. This scales to eight processors communicating at 91 Mbps per link, to give a sustained global bandwidth of 728 Mbps.


european conference on parallel processing | 1998

Predictable Communication on Unpredictable Networks: Implementing BSP over TCP/IP

Stephen R. Donaldson; Jonathan M. D. Hill; David B. Skillicorn


european conference on parallel processing | 1998

Process Migration and Fault Tolerance of BSPlib Programs Running on Networks of Workstations

Jonathan M. D. Hill; Stephen R. Donaldson; Tim Lanfear

The BSP cost model measures the cost of communication using a single architectural parameter, g, which measures permeability of the network to continuous traffic. Architectures, typically networks of workstations, pose particular problems for high-performance communication because it is hard to achieve high throughput, and even harder to do so predictably. Yet both of these are required for BSP to be effective. We present a technique for controlling applied communication load that achieves both. Traffic is presented to the communication network at a rate chosen to maximise throughput and minimise its variance. Performance improvements as large as a factor of two over MPI can be achieved.


Concurrency and Computation: Practice and Experience | 1999

Predictable communication on unpredictable networks: implementing BSP over TCP/IP and UDP/IP

Stephen R. Donaldson; Jonathan M. D. Hill; David B. Skillicorn

This paper describes a system that enables parallel programs written using the BSPlib communications library to migrate processes among a network of workstations. Not only does the system provide fault tolerance for BSPlib jobs, but by utilising a load manager that maintains an approximation of the global load of the system, it is possible to continually schedule the migration of BSP processes onto the least loaded machines in a network. Results are provided for an industrial electromagnetics application, showing that a publicly available collection of workstations can achieve throughput similar to that of a dedicated network of workstations (NOW).


Proceedings. Third Working Conference on Massively Parallel Programming Models (Cat. No.97TB100228) | 1997

Portability of performance with the BSPLib communications library

Jonathan M. D. Hill; Stephen R. Donaldson; David B. Skillicorn

The BSP cost model measures the cost of communication using a single architectural parameter, g, which measures permeability of the network to continuous traffic. Architectures such as networks of workstations pose particular problems for high-performance communication because it is hard to achieve high communication throughput, and even harder to do so predictably. Yet both of these are required for BSP to be effective. We present a technique for controlling applied communication load that achieves both. Traffic is presented to the communication network at a rate chosen to maximise throughput and minimise its variance. Significant performance improvements can be achieved compared with unstructured communication over the same transport protocols, as used by MPI, for example.


international parallel processing symposium | 1999

Performance Results for a Reliable Low-Latency Cluster Communication Protocol

Stephen R. Donaldson; Jonathan M. D. Hill; David B. Skillicorn

The BSP cost model makes a new level of power available for designing parallel algorithms. First, it models the actual behaviour of today's parallel computers, and so can be used to choose appropriate algorithms without completely implementing them. Second, it becomes possible to characterise the range of architecture performance over which a particular algorithm is the best choice. This provides the foundations for developing software that is portable both at the source code level and in its expectation of performance. We illustrate this by comparing three possible implementations of broadcast, and show that a two-phase broadcast algorithm outperforms other techniques whenever the size of the data is large relative to the cost of synchronisation, and that broadcasting using trees is never a good technique (despite its continued popularity). We carry out a similar analysis for samplesort, and show that samplesort cannot perform well on networks of workstations unless the network bandwidth exceeds a certain threshold.
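The broadcast comparison in the abstract above can be illustrated with a minimal sketch. It assumes the usual BSP superstep cost of h*g + l (h words sent or received by the busiest process, g the per-word communication cost, l the synchronisation cost) and the common textbook approximations for the three broadcast variants. The function names and the sample parameter values are hypothetical and are not taken from the paper.

```python
def one_phase(n, p, g, l):
    # Root sends all n words directly to each of the other p-1 processes
    # in a single superstep.
    return (p - 1) * n * g + l

def tree(n, p, g, l):
    # Binary-tree broadcast: ceil(log2(p)) supersteps, each moving n words.
    # (p - 1).bit_length() equals ceil(log2(p)) for p >= 1.
    return (p - 1).bit_length() * (n * g + l)

def two_phase(n, p, g, l):
    # Scatter n/p words to each process, then all-gather the pieces.
    # Each phase moves roughly n words per process (approximation).
    return 2 * n * g + 2 * l

def best_broadcast(n, p, g, l):
    # Return the name of the cheapest variant under this cost model.
    costs = {
        "one_phase": one_phase(n, p, g, l),
        "tree": tree(n, p, g, l),
        "two_phase": two_phase(n, p, g, l),
    }
    return min(costs, key=costs.get)

# Illustrative (assumed) machine parameters: g = 2, l = 1000, p = 16.
print(best_broadcast(n=100_000, p=16, g=2, l=1000))
print(best_broadcast(n=1, p=16, g=2, l=1000))
```

Under these approximations the tree variant is never strictly cheaper than both alternatives, and the two-phase variant wins once n*g is large relative to l, which is consistent with the claims in the abstract.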


Future Generation Computer Systems | 1999

Communication performance optimisation requires minimising variance

Stephen R. Donaldson; Jonathan M. D. Hill; David B. Skillicorn

Existing low-latency protocols make unrealistically strong assumptions about reliability. This allows them to achieve impressive performance, but also prevents this performance being exploited by applications, which must then deal with reliability issues in the application code. We present results from a new protocol that provides error recovery, and whose performance is close to that of existing low-latency protocols. We achieve a CPU overhead of 1.5 μs for packet download and 3.6 μs for upload. Our results show that (a) executing a protocol in the kernel is not incompatible with high performance, and (b) complete control over the protocol stack enables (1) simple forms of flow control to be adopted, (2) proper bracketing of the unreliable portions of the interconnect thus minimising buffers held up for possible recovery, and (3) the sharing of buffer pools. The result is a protocol which performs well in the context of parallel computation and the loose coupling of processes in the workstations of a cluster.


international parallel processing symposium | 1999

Exploiting global structure for performance on clusters

Stephen R. Donaldson; Jonathan M. D. Hill; David B. Skillicorn

The cost of communication in message-passing systems can only be computed based on a large number of low-level details. Consequently, the only architectural measure they naturally suggest is a first-order one, latency. We show that a second-order property, the standard deviation of the delivery times, is also of interest. Most importantly, the average performance of a large communication system depends not only on the average performance of its components, but also on the standard deviation of these performances. In other words, building a high-performance system requires components that are themselves high-performance, but their performance must also have small variance. We illustrate this effect using distributions of the BSP g parameter. Lower bounds on the time per unit transfer of communication in large systems can be derived from data measured over single links.
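The dependence on standard deviation described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's method: it assumes that a communication phase finishes only when the slowest of p links has delivered, and that per-link delivery times are independent Gaussian samples. The function name and all numeric values are hypothetical.

```python
import random

def mean_phase_time(p, mean, stddev, trials=2000, seed=0):
    # Average, over many trials, of the slowest of p delivery times.
    # Assumption: delivery times are independent Gaussian samples.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(mean, stddev) for _ in range(p))
    return total / trials

# Same mean delivery time on every link, different spread.
for sigma in (0.0, 1.0, 10.0):
    print(sigma, mean_phase_time(p=64, mean=100.0, stddev=sigma))
```

With zero spread the phase time equals the mean link time. As the spread grows, or as more links take part, the expected maximum moves further above the mean, so components with identical average performance can still yield a slower overall system.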


international parallel processing symposium | 1999

BSP in CSP: Easy as ABC

Andrew Simpson; Jonathan M. D. Hill; Stephen R. Donaldson

Most parallel programming models for distributed-memory architectures are based on individual threads interacting via send and receive operations. We show that a more structured model, BSP, gains substantial performance improvements by exploiting the extra information implicit in its structure. In particular, each thread learns something about global state whenever it receives a message. This information can be used to modify its own behavior to improve collective use of the communication system. The programming model's semantics also provides implicit knowledge that can be exploited to increase performance. We show that these effects are useful at the application level by comparing the performance of BSP and MPI implementations of the NAS parallel benchmarks.


ieee international conference on high performance computing data and analytics | 1998

Communication Performance Optimisation Requires Minimising Variance

Stephen R. Donaldson; Jonathan M. D. Hill; David B. Skillicorn

In this paper we describe how the language of Communicating Sequential Processes (CSP) has been applied to the analysis of a transport layer protocol used in the implementation of the Bulk Synchronous Parallel model (BSP). The protocol is suited to the bulk transfer of data between a group of processes that communicate over an unreliable medium with fixed buffer capacities on both sender and receiver. This protocol is modelled using CSP, and verified using the refinement checker FDR2. This verification has been used to establish that the protocol is free from the potential for both deadlock and livelock, and also that it is fault-tolerant.
