Paul W. A. Stallard
University of Bristol
Publication
Featured research published by Paul W. A. Stallard.
international parallel processing symposium | 1994
Henk L. Muller; Paul W. A. Stallard; David H. D. Warren; Sanjay Raina
A parallel transputer-based emulator has been developed to evaluate the Data Diffusion Machine (DDM), a highly parallel virtual shared memory architecture. The emulator provides performance results of a hardware implementation of the DDM using a calibrated virtual clock. Unlike the virtual clock of a simulator, the emulator clock is bound to a fixed fraction of real time, so individual processors may time actions independently without the need for a global clock value. Each component of the emulator is artificially slowed down, so that the balance of the speeds of all components reflects the balance of the expected hardware implementation. The calibrated emulator runs an order of magnitude faster than a simulator (the application program is executed directly and there is no overhead for the maintenance of event lists) and, more importantly, the emulator is inherently parallel. This results in a peak emulation speed of 27 MIPS when simulating a machine with 81 leaf nodes on a 121-node transputer system.
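The calibration idea in this abstract can be illustrated with a small sketch. The class and function names below are hypothetical, not the emulator's actual code: emulated time is derived locally as a fixed fraction of real time, and each component is slowed by a factor chosen so that component speed ratios match the intended hardware balance.

```python
# Hypothetical sketch of a calibrated virtual clock: emulated time is a
# fixed fraction of elapsed real time, so each processor can read its own
# emulated time independently -- no global clock value is needed.

class CalibratedClock:
    """Emulated time = elapsed real time * calibration factor."""

    def __init__(self, calibration, start_real=0.0):
        self.calibration = calibration   # emulated units per real unit
        self.start_real = start_real

    def now(self, real_time):
        """Emulated time computed purely from local real time."""
        return (real_time - self.start_real) * self.calibration


def delay_factor(host_speed, hw_speed, scale):
    """Artificial slow-down applied to one emulator component so that,
    under a global scale factor, the relative speeds of all components
    match the expected hardware implementation (illustrative model)."""
    return host_speed / (hw_speed * scale)


clock = CalibratedClock(calibration=0.1)   # 10 real units per emulated unit
print(clock.now(50.0))                     # 5.0 emulated time units
print(delay_factor(100.0, 10.0, 2.0))      # slow this component by 5x
```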
high-performance computer architecture | 1996
Henk L. Muller; Paul W. A. Stallard; David H. D. Warren
In this paper we investigate the combination of multitasking and multithreading in a (virtual) shared memory parallel machine running a number of parallel applications. In particular, we investigate whether it is better to run related or unrelated threads on each node to achieve the best system throughput and to complete a mix of applications as quickly as possible. The experiments provide results for a range of mixes of applications. One of our benchmarks has a clear preference to place its threads across the whole machine, while the others have a slight preference to run their threads on smaller partitions of the machine. The differences are mostly slight, suggesting that the system scheduler has considerable flexibility in thread placement without jeopardising performance.
international conference on parallel processing | 1996
Henk L. Muller; Paul W. A. Stallard; David H. D. Warren
The Data Diffusion Machine is a scalable virtual shared memory architecture. A hierarchical network is used to ensure that all data can be located in a time bounded by O(log p), where p is the number of processors. The DDM hierarchy requires a high degree of connectivity between clusters of nodes, which can be provided with point-to-point links. For large machines the wiring will be complex. We discuss the implementation of such networks, and develop three alternative implementations. The base level performance of each alternative has been measured on an emulator of the DDM. The final solution collapses the physical hierarchy, and we show that this does not affect the performance, while clearly simplifying the design. It demonstrates that with the use of crossbar routers we can make a cheap, scalable and high performance implementation of the DDM.
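The O(log p) location bound described here follows from the hierarchy: each level's directory records whether its subtree holds an item, so a lookup climbs at most the height of the tree. A minimal sketch under assumed names (`levels_to_locate`, nodes as dicts) follows; it is not the DDM protocol itself.

```python
# Hypothetical sketch: climb from a leaf until some directory level covers
# the requested item. For p leaf processors arranged in a balanced tree,
# the number of levels climbed is bounded by the tree height, O(log p).

def levels_to_locate(item, node):
    """Return how many levels we climb before a directory covers `item`."""
    steps = 0
    while item not in node["dir"]:
        node = node["parent"]   # move one level toward the root
        steps += 1
    return steps

# Two leaves under one root; the root's directory covers item "x".
root = {"dir": {"x"}, "parent": None}
leaf = {"dir": set(), "parent": root}
print(levels_to_locate("x", leaf))   # 1: one level climbed
```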
parallel computing | 2003
Jorge Buenabad-Chávez; Henk L. Muller; Paul W. A. Stallard; David H. D. Warren
Data diffusion architectures (also known as cache only memory architectures) provide a shared address space on top of distributed memory. Their distinctive feature is that data diffuses, or migrates and replicates, in main memory according to whichever processors are using the data. This requires an associative organisation of main memory, which decouples each address and its data item from any physical location. A data item can thus be placed and replicated where it is needed. Also, the physical address space does not have to be fixed and contiguous. It can be any set of addresses within the address range of the processors, possibly varying over time, provided it is smaller than the size of main memory. This flexibility is similar to that of a virtual address space, and offers new possibilities to organise a virtual memory system. We present an analysis of possible organisations of virtual memory on such architectures, and propose two main alternatives: traditional virtual memory (TVM) is organised around a fixed and contiguous physical address space using a traditional mapping; associative memory virtual memory (AMVM) is organised around a variable and non-contiguous physical address space using a simpler mapping. To evaluate TVM and AMVM, we extended a multiprocessor emulation of a data diffusion architecture to include part of the Mach operating system virtual memory. This extension implements TVM; a slightly modified version implements AMVM. On applications tested, AMVM shows a marginal performance gain over TVM. We argue that AMVM will offer greater advantages with higher degrees of parallelism or larger data sets.
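The TVM/AMVM contrast can be sketched in a few lines. The class names and methods below are illustrative assumptions, not the Mach extension described in the paper: TVM maps virtual pages into a fixed contiguous set of frames, while AMVM uses the address itself as a tag in associative memory, so the occupied address set need not be fixed or contiguous.

```python
# Hypothetical sketch contrasting the two virtual memory organisations.

class TVM:
    """Traditional mapping: virtual page -> frame in a fixed, contiguous
    physical address space."""

    def __init__(self, num_frames):
        self.page_table = {}                  # virtual page -> frame
        self.free = list(range(num_frames))   # contiguous frame numbers

    def translate(self, vpage):
        if vpage not in self.page_table:
            self.page_table[vpage] = self.free.pop(0)   # demand allocate
        return self.page_table[vpage]


class AMVM:
    """Associative organisation: the address is the tag, so any set of
    addresses may be resident, as long as it fits in main memory."""

    def __init__(self, capacity):
        self.memory = {}          # tag (address) -> data item
        self.capacity = capacity

    def store(self, vaddr, data):
        # Resident set must stay within main memory capacity.
        assert len(self.memory) < self.capacity or vaddr in self.memory
        self.memory[vaddr] = data

    def load(self, vaddr):
        return self.memory.get(vaddr)


tvm = TVM(num_frames=4)
print(tvm.translate(0x7FFF))      # first free frame: 0
am = AMVM(capacity=4)
am.store(0x7FFF, "payload")       # non-contiguous addresses are fine
print(am.load(0x7FFF))            # payload
```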
euromicro workshop on parallel and distributed processing | 1996
Henk L. Muller; Paul W. A. Stallard; David H. D. Warren
In designing a virtual shared memory architecture, an important consideration is whether the main memory should be conventional or (set-) associative. This is the main distinction between so-called CC-NUMA and COMA architectures. We investigate the consequences on price and performance of different choices of component in the memory hierarchy, assuming a main memory which is either conventional or set-associative. We use analytic models driven by accurate miss ratios determined from actual parallel executions of a range of realistic benchmarks. We make cost assumptions based on published figures for different types of storage components. Our results show that, for many programs, CC-NUMA machines need a large coherent cache, often equalling the size of the main memory, in order to achieve good price-performance. As a consequence, optimal CC-NUMA and COMA configurations tend to need set-associative memories of similar size, and show rather little difference in price-performance. Optimal COMA configurations tend to be more general purpose, as one can find a configuration that is nearly optimal in price-performance for all applications that we used, while optimal CC-NUMA configurations tend to be more application specific.
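An analytic model of this kind can be sketched very simply. The functions below are a back-of-envelope illustration under assumed numbers, not the paper's actual model: execution time follows from miss ratio and access latencies, and price-performance is taken as price times time (lower is better).

```python
# Hypothetical miss-ratio-driven price-performance model.

def exec_time(accesses, miss_ratio, hit_cost, miss_cost):
    """Total time for a run, given per-access hit and miss latencies."""
    return accesses * ((1 - miss_ratio) * hit_cost + miss_ratio * miss_cost)

def price_performance(price, time):
    """Price * execution time: lower means better value for money."""
    return price * time

# Illustration: a larger coherent cache costs more but cuts the miss ratio.
small = price_performance(100, exec_time(1_000_000, 0.05, 1, 100))
large = price_performance(180, exec_time(1_000_000, 0.01, 1, 100))
print(small > large)   # True: here the larger cache wins on price-performance
```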
parallel computing | 2004
Jorge Buenabad-Chávez; Henk L. Muller; Paul W. A. Stallard; David H. D. Warren
Data diffusion architectures (also known as cache only memory architectures) provide a shared address space using physically distributed main memory that is associative. The associative nature of main memory decouples each address and its data item from any physical location, allowing data items to diffuse, or migrate and replicate, in any node of main memory according to use. Hence remote accesses tend to become local accesses, making the distributed organisation of main memory transparent to software. However, for data to diffuse effectively with reasonable performance, a fraction of main memory must be reserved as diffusion space, to allow for data replication and freedom of data migration. At any moment the amount of distinct data resident in main memory must be less than the capacity of main memory. Otherwise data will keep moving around the interconnect medium and memory nodes, possibly continually displacing data in frequent use by the processors, resulting in poor performance. We present an analysis of the issues in the provision of diffusion space using empirical data from a realistic environment. Our experimental platform is a multiprocessor emulation of a data diffusion architecture that includes the virtual memory component of the Mach operating system. For flexibility in the provision of diffusion space in the context of set-associative memory, our results suggest the need for a simple interaction between virtual memory software and the data diffusing hardware.
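The diffusion-space constraint stated here reduces to a simple inequality. The function below is an illustrative sketch with an assumed name, not the paper's mechanism: after reserving a fraction of main memory for replication and migration headroom, the distinct working set must still fit, or items are continually displaced.

```python
# Hypothetical check of the diffusion-space constraint: distinct resident
# data must fit in main memory after a fraction is reserved as diffusion
# space, otherwise data keeps being displaced around the machine.

def fits_after_reserve(distinct_items, capacity, reserve_fraction):
    """True if the distinct working set fits once diffusion space
    (reserve_fraction of capacity) is set aside."""
    usable = capacity * (1 - reserve_fraction)
    return distinct_items <= usable

print(fits_after_reserve(700, 1000, 0.25))   # True: 700 <= 750
print(fits_after_reserve(800, 1000, 0.25))   # False: 800 > 750, would thrash
```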
Archive | 1993
Henk L. Muller; Paul W. A. Stallard; David H. D. Warren
Archive | 1994
Henk L. Muller; Paul W. A. Stallard; David H. D. Warren
international conference on parallel processing | 1995
Henk L. Muller; Paul W. A. Stallard; David H. D. Warren
international conference on parallel processing | 1995
Henk L. Muller; Paul W. A. Stallard; David H. D. Warren