David H. D. Warren
University of Bristol
Publication
Featured research published by David H. D. Warren.
New Generation Computing | 1990
Ewing L. Lusk; Ralph Butler; Terrence Disz; Robert Olson; Ross Overbeek; Rick Stevens; David H. D. Warren; Alan Calderwood; Péter Szeredi; Seif Haridi; Per Brand; Mats Carlsson; Andrzej Ciepielewski; Bogumil Hausman
Aurora is a prototype or-parallel implementation of the full Prolog language for shared-memory multiprocessors, developed as part of an informal research collaboration known as the “Gigalips Project”. It currently runs on Sequent and Encore machines. It has been constructed by adapting Sicstus Prolog, a fast, portable, sequential Prolog system. The techniques for constructing a portable multiprocessor version follow those pioneered in a predecessor system, ANL-WAM. The SRI model was adopted as the means to extend the Sicstus Prolog engine for or-parallel operation. We describe the design and main implementation features of the current Aurora system, and present some experimental results. For a range of benchmarks, Aurora on a 20-processor Sequent Symmetry is 4 to 7 times faster than Quintus Prolog on a Sun 3/75. Good performance is also reported on some large-scale Prolog applications.
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 1991
Vítor Santos Costa; David H. D. Warren; Rong Yang
Andorra-I is an experimental parallel Prolog system that transparently exploits both dependent and-parallelism and or-parallelism. It constitutes the first implementation of the Basic Andorra model, a parallel execution model for logic programs in which determinate goals are executed before other goals. This model, besides combining two of the most important forms of implicit parallelism in logic programs, also provides a form of implicit coroutining. This means that Andorra-I not only supports standard Prolog but also provides the capabilities of flat committed-choice languages such as Parlog and GHC. We give an overview of the main problems in the implementation of Andorra-I: the design of its engine and of the preprocessor that generates code to recognise determinate goals. We then present performance data for our implementation. This data shows that Andorra-I, an interpreter, has a single-processor performance similar to the comparable sequential system, C-Prolog, while on multiple processors Andorra-I is able to obtain good speedups from both and-parallelism and or-parallelism. In suitable cases, the speedup obtained from exploiting both forms of parallelism combined is better than that obtainable from exploiting either kind alone. This work was supported by ESPRIT project 2471 (“PEPMA”).
International Conference on Parallel Architectures and Languages Europe | 1991
Anthony Joseph Beaumont; S. Muthu Raman; Péter Szeredi; David H. D. Warren
International Parallel Processing Symposium | 1994
Henk L. Muller; Paul W. A. Stallard; David H. D. Warren; Sanjay Raina
Aurora is a prototype or-parallel implementation of the full Prolog language for shared memory multiprocessors, based on the SRI model of execution. It consists of a Prolog engine based on SICStus Prolog and several alternative schedulers. The task of the schedulers is to share the work available in the Prolog search tree
Hawaii International Conference on System Sciences | 1992
Sanjay Raina; David H. D. Warren
A parallel transputer-based emulator has been developed to evaluate the Data Diffusion Machine (DDM), a highly parallel virtual shared memory architecture. The emulator provides performance results of a hardware implementation of the DDM using a calibrated virtual clock. Unlike the virtual clock of a simulator, the emulator clock is bound to a fixed fraction of real time, so individual processors may time actions independently without the need for a global clock value. Each component of the emulator is artificially slowed down, so that the balance of the speeds of all components reflects the balance of the expected hardware implementation. The calibrated emulator runs an order of magnitude faster than a simulator (the application program is executed directly and there is no overhead for the maintenance of event lists) and, more importantly, the emulator is inherently parallel. This results in a peak emulation speed of 27 MIPS when simulating a machine with 81 leaf nodes on a 121-node transputer system.
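The calibration idea in this abstract (slow every component down until the ratios of their speeds match the expected hardware, then read virtual time as a fixed fraction of real time) can be illustrated with a little arithmetic. The following is only a sketch under stated assumptions: the component names, the timings, and the functions `calibrate` and `virtual_time` are hypothetical and are not taken from the DDM emulator.

```python
from typing import Dict, Tuple


def calibrate(native_us: Dict[str, float],
              target_us: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return (scale, padding) so that every emulated component ends up
    exactly `scale` times slower than its target hardware counterpart.
    All timings are hypothetical, in microseconds per operation."""
    # The component that is slowest relative to its target fixes the common scale.
    scale = max(native_us[name] / target_us[name] for name in target_us)
    # Every component is padded with artificial delay up to scale * target time.
    padding = {name: scale * target_us[name] - native_us[name]
               for name in target_us}
    return scale, padding


def virtual_time(real_elapsed_us: float, scale: float) -> float:
    """Virtual clock reading: a fixed fraction (1 / scale) of real time."""
    return real_elapsed_us / scale


# Made-up costs: what the emulator takes natively vs. what hardware would take.
native = {"cpu": 10.0, "bus": 30.0, "memory": 8.0}
target = {"cpu": 1.0, "bus": 2.0, "memory": 4.0}
scale, padding = calibrate(native, target)
```

With these made-up numbers the bus is the limiting component, so it receives no padding and the other components are delayed to keep the balance. Because each component converts its own real time with the same factor, no global clock value has to be exchanged, which is the property the abstract highlights.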
High-Performance Computer Architecture | 1996
Henk L. Muller; Paul W. A. Stallard; David H. D. Warren
The authors present a multiprocessor emulator designed to evaluate a scalable shared virtual memory architecture called the Data Diffusion Machine (DDM). The DDM is characterised by the lack of any fixed home location for data, with the virtual address being completely decoupled from the physical location of a datum. The authors describe the design of the emulator for the DDM and its transputer-based implementation. The emulator provides a flexible platform for evaluating the architecture and enables one to study the overall behaviour of the machine while running real, large shared-memory applications. They present a profile of traffic observed at the controllers in the DDM hierarchy while running a variety of real shared-memory applications.
New Generation Computing | 1995
Vítor Santos Costa; David H. D. Warren; Rong Yang
In this paper we investigate the combination of multitasking and multithreading in a (virtual) shared memory parallel machine running a number of parallel applications. In particular, we investigate whether it is better to run related threads, or unrelated threads on each node to achieve the best system throughput and to complete a mix of applications as quickly as possible. The experiments provide results for a range of mixes of applications. One of our benchmarks has a clear preference to place its threads across the whole machine, while the others have a slight preference to run their threads on smaller partitions of the machine. The differences are mostly slight, suggesting that the system scheduler has considerable flexibility in thread placement without jeopardising performance.
International Conference on Parallel Processing | 1996
Henk L. Muller; Paul W. A. Stallard; David H. D. Warren
Andorra-I is an experimental parallel Prolog system which transparently exploits both dependent and-parallelism and or-parallelism. One of the main components of Andorra-I is its preprocessor. In order to obtain efficient execution of programs in Andorra-I, the preprocessor includes a compiler for Andorra-I. The compiler includes a determinacy analyser and a clause compiler, and generates code for a specialised abstract machine. In this paper we discuss the main issues in the Andorra-I compiler, presenting its abstract instruction set and describing the algorithms used in its implementation.
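To give a feel for what recognising determinate goals involves, here is a toy sketch. It assumes that a goal counts as determinate when at most one clause can match it, and it approximates matching by comparing only the predicate name and a constant first argument. The types `Clause` and `Goal` and the functions below are hypothetical illustrations; the abstract does not describe the actual analyser or instruction set of Andorra-I.

```python
from typing import List, Optional, Tuple

# (predicate name, first-argument constant, or None for an unbound variable)
Clause = Tuple[str, Optional[str]]
Goal = Tuple[str, Optional[str]]


def matching_clauses(goal: Goal, program: List[Clause]) -> List[Clause]:
    """Clauses whose head could match the goal on name and first argument."""
    name, arg = goal
    return [c for c in program
            if c[0] == name and (arg is None or c[1] is None or c[1] == arg)]


def is_determinate(goal: Goal, program: List[Clause]) -> bool:
    """Assumed definition: at most one candidate clause remains."""
    return len(matching_clauses(goal, program)) <= 1


def select_goal(goals: List[Goal], program: List[Clause]) -> Goal:
    """Prefer the first determinate goal; otherwise fall back to the leftmost."""
    for goal in goals:
        if is_determinate(goal, program):
            return goal
    return goals[0]


# Hypothetical program: two colour/1 facts and one size/1 fact.
program = [("colour", "red"), ("colour", "green"), ("size", "small")]
goals = [("colour", None), ("size", None)]
chosen = select_goal(goals, program)
```

Here `colour(X)` has two candidate clauses and `size(Y)` has one, so the sketch picks `size(Y)` first even though it is not the leftmost goal. A real determinacy analyser would reason about more than the first argument.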
Parallel Computing | 2003
Jorge Buenabad-Chávez; Henk L. Muller; Paul W. A. Stallard; David H. D. Warren
The Data Diffusion Machine is a scalable virtual shared memory architecture. A hierarchical network is used to ensure that all data can be located in a time bounded by O(log p), where p is the number of processors. The DDM hierarchy requires a high degree of connectivity between clusters of nodes, which can be provided with point-to-point links. For large machines the wiring will be complex. We discuss the implementation of such networks, and develop three alternative implementations. The base level performance of each alternative has been measured on an emulator of the DDM. The final solution collapses the physical hierarchy, and we show that this does not affect the performance, while clearly simplifying the design. It demonstrates that with the use of crossbar routers we can make a cheap, scalable and high performance implementation of the DDM.
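The O(log p) bound can be made concrete with a small model. The sketch below assumes a balanced tree with a uniform fan-out, leaves numbered from 0 to p - 1, and a request that climbs to the lowest common ancestor of requester and holder before descending. These are simplifying assumptions for illustration only; the function names are hypothetical and this is not the DDM protocol itself.

```python
def hierarchy_levels(p: int, fanout: int) -> int:
    """Directory levels needed above p leaf nodes in a balanced tree."""
    levels, span = 0, 1
    while span < p:
        span *= fanout
        levels += 1
    return levels


def lookup_hops(requester: int, holder: int, fanout: int) -> int:
    """Hops to climb from the requester to the lowest common ancestor
    and descend to the holder (0 if the datum is already local)."""
    up = 0
    a, b = requester, holder
    while a != b:
        # Integer division maps a node index to its parent's index.
        a //= fanout
        b //= fanout
        up += 1
    return 2 * up


levels = hierarchy_levels(81, 3)   # hypothetical 81 leaves with fan-out 3
worst = lookup_hops(0, 80, 3)      # leaves at opposite ends of the tree
near = lookup_hops(0, 1, 3)        # leaves under the same parent
```

In this model no lookup needs more than twice the number of levels, and the number of levels grows logarithmically with p, which is where a bound of this shape comes from.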
Euromicro Workshop on Parallel and Distributed Processing | 1996
Henk L. Muller; Paul W. A. Stallard; David H. D. Warren
Data diffusion architectures (also known as cache only memory architectures) provide a shared address space on top of distributed memory. Their distinctive feature is that data diffuses, or migrates and replicates, in main memory according to whichever processors are using the data. This requires an associative organisation of main memory, which decouples each address and its data item from any physical location. A data item can thus be placed and replicated where it is needed. Also, the physical address space does not have to be fixed and contiguous. It can be any set of addresses within the address range of the processors, possibly varying over time, provided it is smaller than the size of main memory. This flexibility is similar to that of a virtual address space, and offers new possibilities to organise a virtual memory system. We present an analysis of possible organisations of virtual memory on such architectures, and propose two main alternatives: traditional virtual memory (TVM) is organised around a fixed and contiguous physical address space using a traditional mapping; associative memory virtual memory (AMVM) is organised around a variable and non-contiguous physical address space using a simpler mapping. To evaluate TVM and AMVM, we extended a multiprocessor emulation of a data diffusion architecture to include part of the Mach operating system virtual memory. This extension implements TVM; a slightly modified version implements AMVM. On applications tested, AMVM shows a marginal performance gain over TVM. We argue that AMVM will offer greater advantages with higher degrees of parallelism or larger data sets.
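The way data "migrates and replicates" in an associative memory can be sketched with one address-keyed store per node. This is a minimal illustration, not the architecture described in the paper: the class `Machine`, its methods, and the write-invalidate policy are assumptions chosen to keep the example short.

```python
from typing import Dict, List


class Machine:
    """Toy model: each node's memory is an associative map from address to
    value, so an item has no fixed home and addresses need not be contiguous."""

    def __init__(self, node_names: List[str]) -> None:
        self.stores: Dict[str, Dict[int, int]] = {n: {} for n in node_names}

    def holders(self, address: int) -> List[str]:
        """Names of the nodes that currently hold a copy of the item."""
        return sorted(n for n, s in self.stores.items() if address in s)

    def read(self, node: str, address: int) -> int:
        """Read locally if possible; otherwise replicate from another holder."""
        local = self.stores[node]
        if address not in local:
            owners = self.holders(address)
            if not owners:
                raise KeyError(address)
            local[address] = self.stores[owners[0]][address]  # replicate
        return local[address]

    def write(self, node: str, address: int, value: int) -> None:
        """Assumed write-invalidate policy: drop all copies, keep one at the writer."""
        for store in self.stores.values():
            store.pop(address, None)
        self.stores[node][address] = value  # the item migrates to the writer


m = Machine(["a", "b", "c"])
m.write("a", 0x1000, 7)
first_read = m.read("b", 0x1000)       # replicates the item onto node b
after_read = m.holders(0x1000)
m.write("c", 0x1000, 9)                # invalidates the copies on a and b
after_write = m.holders(0x1000)
```

Because the stores are keyed by address rather than indexed by position, the set of addresses in use can be sparse and can change over time, which is the flexibility the abstract compares to a virtual address space.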