David B. Skillicorn
Queen's University
Publication
Featured research published by David B. Skillicorn.
ACM Computing Surveys | 1998
David B. Skillicorn; Domenico Talia
We survey parallel programming models and languages using six criteria to assess their suitability for realistic portable parallel programming. We argue that an ideal model should be easy to program, should have a software development methodology, should be architecture-independent, should be easy to understand, should guarantee performance, and should provide accurate information about the cost of programs. These criteria reflect our belief that developments in parallelism must be driven by a parallel software industry based on portability and efficiency. We consider programming models in six categories, depending on the level of abstraction they provide. Those that are very abstract conceal even the presence of parallelism at the software level. Such models make software easy to build and port, but efficient and predictable performance is usually hard to achieve. At the other end of the spectrum, low-level models make all of the messy issues of parallel programming explicit (how many threads, how to place them, how to express communication, and how to schedule communication), so that software is hard to build and not very portable, but is usually efficient. Most recent models are near the center of this spectrum, exploring the best tradeoffs between expressiveness and performance. A few models have achieved both abstractness and efficiency. Both kinds of models raise the possibility of parallelism as part of the mainstream of computing.
Scientific Programming | 1997
David B. Skillicorn; Jonathan M. D. Hill; William F. McColl
Bulk Synchronous Parallelism (BSP) is a parallel programming model that abstracts from low-level program structures in favour of supersteps. A superstep consists of a set of independent local computations, followed by a global communication phase and a barrier synchronisation. Structuring programs in this way enables their costs to be accurately determined from a few simple architectural parameters, namely the permeability of the communication network to uniformly-random traffic and the time to synchronise. Although permutation routing and barrier synchronisations are widely regarded as inherently expensive, this is not the case. As a result, the structure imposed by BSP does not reduce performance, while bringing considerable benefits for application building. This paper answers the most common questions we are asked about BSP and justifies its claim to be a major step forward in parallel programming.
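The cost claim in this abstract can be illustrated with the usual textbook form of the BSP cost model, in which one superstep is charged w + h·g + l. This is a minimal sketch and is not taken from the paper: the names Superstep, superstep_cost and program_cost are hypothetical, g stands for the network permeability parameter, l for the time to synchronise, and h is assumed to be the largest number of words any one processor sends or receives.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Superstep:
    """One superstep, described per processor (hypothetical structure)."""
    work: Sequence[float]    # local computation time on each processor
    sent: Sequence[int]      # words sent by each processor
    received: Sequence[int]  # words received by each processor


def superstep_cost(step: Superstep, g: float, l: float) -> float:
    """Charge w + h*g + l for one superstep (assumed textbook form)."""
    w = max(step.work)  # the slowest processor bounds the computation phase
    # h is the largest fan-in or fan-out over all processors
    h = max(max(s, r) for s, r in zip(step.sent, step.received))
    return w + h * g + l


def program_cost(steps: Sequence[Superstep], g: float, l: float) -> float:
    """A program's cost is the sum of the costs of its supersteps."""
    return sum(superstep_cost(step, g, l) for step in steps)


example = [
    Superstep(work=[100, 120], sent=[10, 5], received=[5, 10]),
    Superstep(work=[80, 60], sent=[0, 20], received=[20, 0]),
]
print(program_cost(example, g=4.0, l=50.0))
```

Because only g and l describe the machine, the same program description can be re-costed for a different architecture by changing those two numbers.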
Future Generation Computer Systems | 1998
Jonathan M. D. Hill; David B. Skillicorn
We focus on two criticisms of bulk synchronous parallelism (BSP): that delaying communication until specific points in a program causes poor performance, and that frequent barrier synchronisations are too expensive for high-performance parallel computing. We show that these criticisms are misguided, not just about BSP but about parallel programming in general, because they are based on misconceptions about the origins of poor performance. The main implication for parallel programming is that higher levels of abstraction not only make software construction easier but also make high-performance implementation easier.
Euromicro Workshop on Parallel and Distributed Processing | 1998
Jonathan M. D. Hill; David B. Skillicorn
We investigate the performance of barrier synchronisation on both shared memory and distributed memory architectures, using a wide range of techniques. The performance results obtained show that distributed memory architectures behave predictably, although their performance for barrier synchronisation is relatively poor. For shared memory architectures, a much larger range of implementation techniques is available. We show that asymptotic analysis is useless, and a detailed understanding of the underlying hardware is required to design an effective barrier implementation. We show that a technique using cache coherence is more effective than semaphore or lock based techniques, and is competitive with specialised barrier synchronisation hardware.
Journal of Parallel and Distributed Computing | 1996
David B. Skillicorn
Trees are a useful data type, but they are not routinely included in parallel programming systems, in part because their irregular structure makes partitioning and scheduling difficult. We present a method for algebraically constructing implementations of tree skeletons, high-level homomorphic operations that execute in parallel. Many computations on binary trees can be performed in O(log n) parallel time using n processors, even taking account of communication costs. We extend these results to trees with arbitrary and variable degree. Then we show that it is possible to implement a distributed version of homomorphisms on binary trees, taking O(n/p + log^2 p) parallel time on p < n processors, for trees of any skew and taking full account of communication costs. Under slightly stronger restrictions on the underlying functions, this can be improved to O(n/p + log p). Furthermore, the technique for deriving distributed versions is algebraic, allowing the automatic generation of code for SPMD and data-parallel architectures.
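To make the term "homomorphism on binary trees" concrete, the sketch below shows a sequential reference version: the result for a node depends only on the node's value and the results for its two subtrees. It is an illustration under assumptions, not the paper's method. The names Leaf, Node and tree_hom are hypothetical, and the code makes no attempt at the parallel or distributed schedules whose costs the abstract states.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar, Union

R = TypeVar("R")


@dataclass(frozen=True)
class Leaf:
    value: int


@dataclass(frozen=True)
class Node:
    left: "Tree"
    value: int
    right: "Tree"


Tree = Union[Leaf, Node]


def tree_hom(leaf_fn: Callable[[int], R],
             node_fn: Callable[[R, int, R], R],
             tree: Tree) -> R:
    """Sequential reference: combine subtree results bottom-up."""
    if isinstance(tree, Leaf):
        return leaf_fn(tree.value)
    left = tree_hom(leaf_fn, node_fn, tree.left)
    right = tree_hom(leaf_fn, node_fn, tree.right)
    return node_fn(left, tree.value, right)


def tree_sum(tree: Tree) -> int:
    """Sum of all values, expressed as a homomorphism."""
    return tree_hom(lambda v: v, lambda l, v, r: l + v + r, tree)


def tree_height(tree: Tree) -> int:
    """Height in nodes, expressed as a homomorphism."""
    return tree_hom(lambda _v: 1, lambda l, _v, r: 1 + max(l, r), tree)


sample = Node(Node(Leaf(1), 2, Leaf(3)), 4, Leaf(5))
print(tree_sum(sample), tree_height(sample))
```

Both operations differ only in the two functions passed to tree_hom, which is what lets a single skeleton implementation serve many tree computations.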
Cluster Computing and the Grid | 2002
David B. Skillicorn
We examine plausible motivations for both using and building computational grids. We find two reasons to use such grids: the existence of a workload in which tasks have deadlines, but the load varies over time; and the existence of an upper limit on cost-effective parallel systems, forcing replication when greater degrees of parallelism are required. We speculate that there may be scope for public grids, in which protecting the integrity of information is not guaranteed, but that there is much larger potential for virtual private grids within organizations. In both cases, the form of markets, execution planning, and pricing is likely to be different from the frictionless markets predicted in the literature.
Parallel Computing | 2000
Stephen R. Donaldson; Jonathan M. D. Hill; David B. Skillicorn
We describe a transport protocol suitable for BSPlib programs running on a cluster of PCs connected by a 100 Mbps Ethernet switch. The protocol provides a reliable packet-delivery mechanism that uses global knowledge of a program's communication pattern to maximise switch performance. The performance is comparable to previous low-latency protocols on similar hardware, but the addition of reliability means that this protocol can be directly used by application software. For a modest budget of US$20 000 it is possible to build a machine that outperforms an IBM SP2 on all the NAS benchmarks (BT +80%, SP +70%, MG +9%, and LU +65% improvement), and an SGI Origin 2000 on half (BT +10%, SP −24%, MG +10%, and LU −28%). The protocol has a CPU overhead of 1.5 μs for packet download and 3.6 μs for upload. Small packets can be communicated through the switch in a pipelined fashion every 21 μs. Application-to-application one-way latency is 29 μs plus the latency of the switch. A raw link bandwidth of 93 Mbps is achieved for 1400-byte packets, and 50 Mbps for 128-byte packets. This scales to eight processors communicating at 91 Mbps per link, to give a sustained global bandwidth of 728 Mbps.
The Computer Journal | 1996
David B. Skillicorn
Hypermedia technology provides both an opportunity for universities to provide a better learning experience for their students, and a way to cope with funding reductions. Second-generation hypermedia systems make it cost-effective to develop and deliver multimedia courseware, while permitting learning to occur within a community. We illustrate by describing our experiences developing and offering a hypermedia course in computer architecture, in which lectures were replaced by on-line courseware, using the Hyper-G system.
European Conference on Parallel Processing | 1998
Stephen R. Donaldson; Jonathan M. D. Hill; David B. Skillicorn
Concurrency and Computation: Practice and Experience | 1999
Stephen R. Donaldson; Jonathan M. D. Hill; David B. Skillicorn
The BSP cost model measures the cost of communication using a single architectural parameter, g, which measures permeability of the network to continuous traffic. Architectures, typically networks of workstations, pose particular problems for high-performance communication because it is hard to achieve high throughput, and even harder to do so predictably. Yet both of these are required for BSP to be effective. We present a technique for controlling applied communication load that achieves both. Traffic is presented to the communication network at a rate chosen to maximise throughput and minimise its variance. Performance improvements as large as a factor of two over MPI can be achieved.