Rupert W. Ford
University of Manchester
Publications
Featured research published by Rupert W. Ford.
european conference on parallel processing | 1996
Michael F. P. O'Boyle; Rupert W. Ford; Andy Nisbet
This paper presents new compiler analysis for the elimination of invalidation traffic in virtual shared memory, using a hybrid distributed invalidation coherence scheme. The invalidation and acknowledgement messages are removed; this reduces both network invalidation traffic and the latency of a write fault. It aggressively exploits the SPMD execution model and uses array section analysis to accurately determine only those instances when invalidation is necessary, thus avoiding the additional read misses of previous schemes. Equations determining precisely what data should be invalidated are presented and translated into a form amenable to compiler analysis. Preliminary experimental results on a 30 node prototype architecture demonstrate the performance attainable using this scheme.
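The key idea above is that array section analysis lets the compiler invalidate only the data that is actually shared, rather than whole arrays. A minimal sketch of that decision, with array sections modelled as half-open index intervals per processor (the representation and function names are illustrative, not the paper's algorithm):

```python
# Illustrative sketch: decide which processors must invalidate their cached
# copies by intersecting the sections they read with the sections others write.

def overlaps(a, b):
    """True if half-open intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] < b[1] and b[0] < a[1]

def invalidations(read_sections, write_sections):
    """Return (reader, writer) pairs for which an invalidation is required.

    read_sections / write_sections map processor id -> (lo, hi) interval of
    the shared array read or written in the current SPMD phase.
    """
    needed = []
    for reader, r_sec in read_sections.items():
        for writer, w_sec in write_sections.items():
            if reader != writer and overlaps(r_sec, w_sec):
                needed.append((reader, writer))
    return needed

# Example: 4 processors each read a contiguous block of indices 0..400;
# processor 2 writes [180, 220), which only overlaps processor 1's block,
# so only processor 1 needs to invalidate its copy.
reads = {p: (p * 100, (p + 1) * 100) for p in range(4)}
writes = {2: (180, 220)}
print(invalidations(reads, writes))   # [(1, 2)]
```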
Philosophical Transactions of the Royal Society A | 2005
Rafael Delgado-Buscalioni; Peter V. Coveney; Graham D. Riley; Rupert W. Ford
Over the past three years we have been developing a new approach for the modelling and simulation of complex fluids. This approach is based on a multiscale hybrid scheme, in which two or more contiguous subdomains are dynamically coupled together. One subdomain is described by molecular dynamics while the other is described by continuum fluid dynamics; such coupled models are of considerable importance for the study of fluid dynamics problems in which only a restricted aspect requires a fully molecular representation. Our model is representative of the generic set of coupled models whose algorithmic structure presents interesting opportunities for deployment on a range of architectures including computational grids. Here we describe the implementation of our HybridMD code within a coupling framework that facilitates flexible deployment on such architectures.
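A generic sketch of the coupling pattern described above, not the actual HybridMD implementation: a molecular-dynamics subdomain and a continuum subdomain advance independently and exchange interface state once per coupling step. The class and field names here are hypothetical stand-ins.

```python
# Toy hybrid coupling loop: continuum state drives the MD boundary, and the
# measured MD flux is fed back into the continuum solver each coupling cycle.

class ContinuumSolver:
    def __init__(self):
        self.boundary_velocity = 0.0
    def advance(self, dt):
        # placeholder: a real solver would integrate the continuum equations
        self.boundary_velocity += 0.01 * dt
    def impose_flux(self, flux):
        self.boundary_velocity += flux

class MDRegion:
    def __init__(self):
        self.mean_velocity = 0.0
    def advance(self, dt, wall_velocity):
        # placeholder: a real MD region would integrate particle motion,
        # constrained by the continuum velocity at the interface
        self.mean_velocity += 0.5 * (wall_velocity - self.mean_velocity)
    def measured_flux(self):
        return 0.1 * self.mean_velocity

continuum, md = ContinuumSolver(), MDRegion()
dt = 0.05
for step in range(10):                           # one coupling cycle per step
    continuum.advance(dt)                        # macroscopic update
    md.advance(dt, continuum.boundary_velocity)  # microscopic update
    continuum.impose_flux(md.measured_flux())    # feed the MD result back
```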
grid computing | 2005
Christopher W. Armstrong; Rupert W. Ford; John R. Gurd; Mikel Luján; Kenneth R. Mayes; Graham D. Riley
In recent years, there has been increasing interest in the development of computer simulations of complex biological systems, and of multi-physics and multi-scale physical phenomena. Applications have been developed that involve the coupling together of separate executable models of individual systems, where these models may have been developed in isolation. A lightweight yet general solution is required to the problems of linking coupled models and of handling the incompatibilities between interacting models that arise from their diverse origins and natures. Many such models require high-performance computers to provide acceptable execution times, and there is increasing interest in utilizing Grid technologies. However, Grid applications need the ability to cope with heterogeneous and dynamically changing execution environments, particularly where run-time changes can affect application performance. A general coupling framework (GCF) is described that allows the construction of flexible coupled models. This approach results in a component-based implementation of a coupled model application. A semi-formal presentation of GCF is given. Components under GCF are separately deployable and coupled by simple data flows, making them appropriate structures for dynamic execution platforms such as the Grid. The design and initial implementation of a performance control system (PERCO) is reviewed. PERCO acts by redeploying components, and is thus appropriate for controlling GCF coupled model applications. Redeployment decisions in PERCO require performance prediction capabilities. A proof-of-concept performance prediction algorithm is presented, based on the descriptions of GCF and PERCO.
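A minimal sketch of the "components coupled by simple data flows" idea mentioned above; the class and function names are illustrative and do not reflect the GCF API.

```python
# Two separately deployable components connected by a one-way data flow.
from queue import Queue

class Component:
    """A unit that reads its inputs, computes, and writes its outputs."""
    def __init__(self, name, compute):
        self.name, self.compute = name, compute
        self.inputs, self.outputs = [], []

    def step(self):
        values = [q.get() for q in self.inputs]   # block on upstream data
        result = self.compute(*values)
        for q in self.outputs:
            q.put(result)                          # push result downstream

def couple(producer, consumer):
    """Connect two components with a simple data flow."""
    flow = Queue()
    producer.outputs.append(flow)
    consumer.inputs.append(flow)

# Example: a hypothetical 'atmosphere' model feeding an 'ocean' model.
atmos = Component("atmosphere", lambda: 21.0)                       # produces a temperature
ocean = Component("ocean", lambda t: print(f"ocean sees {t} degC")) # consumes it
couple(atmos, ocean)
for _ in range(3):
    atmos.step()
    ocean.step()
```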
grid computing | 2004
Ken Mayes; Graham D. Riley; Rupert W. Ford; Mikel Luján; Len Freeman; Cliff Addison
A major method of constructing applications to run on a computational Grid is to assemble them from components - separately deployable units of computation of well-defined functionality. Performance steering is an adaptive process involving run-time adjustment of factors affecting the performance of an application. This paper presents a design for a system capable of steering, towards a minimum run-time, the performance of a component-based application executing in a distributed fashion on a computational Grid. The proposed performance steering system controls the performance of single applications, and the basic design seeks to separate application-level and component-level concerns. The existence of a middleware resource scheduler external to the performance steering system is assumed, and potential problems are discussed. A possible model of operation is given in terms of application and component execution phases. The need for performance prediction capability, and for repositories of application-specific and component-specific performance information, is discussed. An initial implementation is briefly described.
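To make the redeployment idea concrete, here is a toy sketch of the kind of decision a performance steering system might take: move a component to the resource with the lowest predicted execution time, but only when the predicted saving outweighs the cost of moving. The prediction table, cost, and threshold are invented for illustration and are not the paper's design.

```python
# Hypothetical steering decision based on per-component performance predictions.
predicted_time = {                     # seconds per phase, per resource
    "ocean":      {"clusterA": 40.0, "clusterB": 25.0},
    "atmosphere": {"clusterA": 30.0, "clusterB": 45.0},
}
current_placement = {"ocean": "clusterA", "atmosphere": "clusterA"}
REDEPLOY_COST = 10.0                   # one-off cost of moving a component

def steer(placement):
    """Return a new placement and the predicted net saving (0 if no move pays off)."""
    new_placement, saving = dict(placement), 0.0
    for comp, times in predicted_time.items():
        best = min(times, key=times.get)
        if best != placement[comp]:
            saving += times[placement[comp]] - times[best] - REDEPLOY_COST
            new_placement[comp] = best
    return (new_placement, saving) if saving > 0 else (placement, 0.0)

print(steer(current_placement))   # moves 'ocean' to clusterB for a net 5.0 s saving
```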
international conference on parallel architectures and compilation techniques | 1996
Michael F. P. O'Boyle; Andy Nisbet; Rupert W. Ford
This paper presents a new compiler algorithm to eliminate invalidation traffic in virtual shared memory using a hybrid distributed invalidation scheme. It aggressively exploits static scheduling and data layout to accurately determine only those instances when invalidation is necessary, thus avoiding the additional read misses of previous schemes. Equations determining precisely what data should be invalidated are presented and followed by the derivation of approximations amenable to compiler manipulation. Compiler-directed invalidation in the presence of arbitrary control-flow is described and the definition of a compiler algorithm is presented. Preliminary experimental results on three programs show that this analysis can drastically reduce the amount of invalidation traffic and write misses.
parallel computing | 2000
T. L. Freeman; David Hancock; J. Mark Bull; Rupert W. Ford
In earlier papers ([2], [3], [6]) feedback guided loop scheduling algorithms have been shown to be very effective for certain loop scheduling problems which involve a sequential outer loop and a parallel inner loop, and for which the workload of the parallel loop changes only slowly from one execution to the next. In this paper the extension of these ideas to the case of nested parallel loops is investigated. We describe four feedback guided algorithms for scheduling nested loops and evaluate the performance of the algorithms on a set of synthetic benchmarks.
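A sketch of the feedback-guided idea for a single parallel loop (the paper extends it to nested loops): the per-iteration times measured on the previous execution of the outer loop are used to re-partition the iteration space into contiguous chunks of roughly equal work for the next execution. The function name and partitioning heuristic below are illustrative, not the paper's algorithms.

```python
# Re-partition a parallel loop from measured per-iteration times.
def partition(iter_times, nthreads):
    """Return chunk boundaries [b0..bn] giving each thread ~equal measured work."""
    total = sum(iter_times)
    target = total / nthreads
    bounds, acc = [0], 0.0
    for i, t in enumerate(iter_times):
        acc += t
        if acc >= target * len(bounds) and len(bounds) < nthreads:
            bounds.append(i + 1)
    bounds.append(len(iter_times))
    return bounds

# Workload skewed towards the end of the loop: equal-sized chunks would give
# the last thread far more work, so the feedback shrinks the later chunks.
times = [1.0] * 50 + [4.0] * 50
print(partition(times, 4))   # [0, 54, 69, 85, 100]
```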
Journal of Parallel and Distributed Computing | 2003
Michael F. P. O'Boyle; Rupert W. Ford; Elena Stöhr
This paper develops and proves an exact distributed invalidation algorithm for programs with general array accesses, arbitrary parallelisation and migratory writes. We present an efficient constructive algorithm that globally combines locally gathered information to insert coherence calls in such a manner as to eliminate invalidation traffic without loss of locality, and places the minimal number of coherence calls. Experimental results across a range of benchmarks show that it outperforms hardware-based sequential and release consistency approaches and decreases application execution time by up to 12%. This is due to eliminating over 99% of the invalidation traffic in all benchmarks. This dramatic reduction in invalidation traffic reduces the total amount of network traffic by up to 28% and the number of network words transmitted by up to 19%.
ieee international conference on high performance computing data and analytics | 1996
Andy Nisbet; Rupert W. Ford
This paper introduces spinning-on-coherency (SOC), a technique for virtual shared memory (VSM) which enables latency-hiding of remote reads and the removal of related synchronisation points. Coherence-bits are hardware tags associated with addresses which record local access permissions (such as read, write, invalid). In SOC a user-thread spins on the particular coherence-bits associated with an address until the new data value is asynchronously propagated and the address becomes valid. Data-propagation occurs when another node issues an update after having written the new value. Performance improvements are demonstrated for two codes, representing the core communication found in Shallow (a well known numerical weather prediction benchmark) and CG (from the NAS Parallel Benchmarks). These are run on a 30 node prototype distributed memory architecture (EDS), with invalidation based sequentially consistent VSM. SOC is also applicable to other consistency models and directory schemes, whether in hardware or software, and complements other VSM optimisations. Currently such optimisation is performed by the programmer, but there is much scope for automating this process within a compiler.
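A toy software analogue of the spinning-on-coherency idea, assuming an invented per-address "valid" flag in place of hardware coherence-bits: the reader spins on the flag until a writer on another node propagates the new value and marks the address valid, instead of waiting at a separate synchronisation point.

```python
# Illustrative only: software stand-in for spinning on coherence state.
import threading, time

valid = {0x100: False}          # coherence state per address: False = invalid
memory = {0x100: None}          # local copy of the shared data

def reader():
    while not valid[0x100]:     # spin on the coherence state ...
        pass                    # ... rather than blocking at a barrier
    print("reader saw", memory[0x100])

def writer():
    time.sleep(0.01)            # writer finishes its computation later
    memory[0x100] = 42          # the update is propagated to the reader's copy
    valid[0x100] = True         # then the address is marked valid

t1, t2 = threading.Thread(target=reader), threading.Thread(target=writer)
t1.start(); t2.start()
t1.join(); t2.join()
```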
Springer Verlag, 1st ed., 2011 | 2012
Rupert W. Ford; Graham D. Riley; Reinhard Budich; René Redler
Collected articles in this series are dedicated to the development and use of software for earth system modelling and aim at bridging the gap between IT solutions and climate science. The particular topic covered in this volume addresses the process of configuring, building, and running earth system models. Earth system models are typically a collection of interacting computer codes (often called components) which together simulate the earth system. Each code component is written to model some physical process which forms part of the earth system (such as the ocean). This book is concerned with the source code version control of these code components, the configuration of these components into earth system models, the creation of executable(s) from the component source code and related libraries, and the running and monitoring of the resultant executables on the available hardware.
european conference on parallel processing | 2000
Rupert W. Ford; Michael F. P. O'Boyle; Elena Stöhr
This paper develops and proves an exact distributed invalidation algorithm for programs with compile-time-decidable control-flow. We present an efficient constructive algorithm that globally combines locally gathered information to insert coherence calls in such a manner as to eliminate all invalidation traffic without loss of locality, and places the minimal number of coherence calls. Experimental results show that it outperforms existing compiler-directed coherence techniques and hardware-based memory consistency.