Publication


Featured research published by Murray Cole.


parallel computing | 2004

Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming

Murray Cole

Skeleton and pattern based parallel programming promise significant benefits but remain absent from mainstream practice. We consider why this situation has arisen and propose a number of design principles which may help to redress it. We sketch the eSkel library, which represents a concrete attempt to apply these principles. eSkel is based on C and MPI, thereby embedding its skeletons in a conceptually familiar framework. We present an application of eSkel and analyse it as a response to our manifesto.


international conference on computational science | 2006

Combining measurement and stochastic modelling to enhance scheduling decisions for a parallel mean value analysis algorithm

Gagarine Yaikhom; Murray Cole; Stephen Gilmore

In this paper we apply the high-level modelling language PEPA to the performance analysis of a parallel program with a pipeline skeleton which computes the Mean Value Analysis (MVA) algorithm for queueing networks.
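
For readers unfamiliar with Mean Value Analysis, the following is a minimal sketch of the standard sequential MVA recurrence for a closed, single-class queueing network. It is not the parallel pipeline implementation or the PEPA model from the paper. The function name, the parameter names, and the example demands are assumptions made for illustration.

```python
def mean_value_analysis(demands, customers, think_time=0.0):
    """Exact MVA for a closed, single-class network of queueing stations.

    demands[k] is the total service demand at station k (illustrative input).
    Returns (throughput, list of mean queue lengths per station).
    """
    queue = [0.0] * len(demands)
    throughput = 0.0
    for n in range(1, customers + 1):
        # Arrival theorem: an arriving customer sees the network
        # as it would be with n - 1 customers.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        throughput = n / (think_time + sum(residence))
        # Little's law applied to each station.
        queue = [throughput * r for r in residence]
    return throughput, queue


if __name__ == "__main__":
    x, q = mean_value_analysis([1.0, 1.0], customers=2)
    print(x, q)
```

Each iteration depends only on the queue lengths from the previous population size, which is the kind of stage-to-stage dependency a pipeline can exploit.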


european conference on parallel processing | 2005

Flexible skeletal programming with eSkel

Anne Benoit; Murray Cole; Stephen Gilmore; Jane Hillston

We present an overview of eSkel, a library for skeletal parallel programming. eSkel aims to maximise the conceptual flexibility afforded by its component skeletons and to facilitate dynamic selection of skeleton compositions. We present simple examples which illustrate these properties, and discuss the implementation challenges which the model poses.


high performance embedded architectures and compilers | 2013

PARTANS: An autotuning framework for stencil computation on multi-GPU systems

Thibaut Lutz; Christian Fensch; Murray Cole

GPGPUs are a powerful and energy-efficient solution for many problems. For higher performance or larger problems, it is necessary to distribute the problem across multiple GPUs, increasing the already high programming complexity. In this article, we focus on abstracting the complexity of multi-GPU programming for stencil computation. We show that the best strategy depends not only on the stencil operator, problem size, and GPU, but also on the PCI express layout. This adds nonuniform characteristics to a seemingly homogeneous setup, causing up to 23% performance loss. We address this issue with an autotuner that optimizes the distribution across multiple GPUs.
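
As a rough illustration of what distributing a stencil involves, the sketch below applies a 3-point averaging stencil to a 1D grid, first as a whole and then split into contiguous chunks that each carry one halo cell per side. It runs in plain Python on the CPU and is not PARTANS code. The function names, the stencil operator, and the partitioning rule are all assumptions chosen for brevity.

```python
def stencil_step(cells):
    """One 3-point averaging step; the two end cells are held fixed."""
    out = list(cells)
    for i in range(1, len(cells) - 1):
        out[i] = (cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
    return out


def partitioned_step(cells, parts):
    """Same step with the grid split into `parts` contiguous chunks.

    Each chunk is padded with one halo cell per side where a neighbour
    exists, standing in for the data a device would fetch from its peers.
    """
    n = len(cells)
    bounds = [n * p // parts for p in range(parts + 1)]
    out = list(cells)
    for lo, hi in zip(bounds, bounds[1:]):
        h_lo, h_hi = max(lo - 1, 0), min(hi + 1, n)
        updated = stencil_step(cells[h_lo:h_hi])
        # Keep only the cells this chunk owns; halo cells are discarded.
        out[lo:hi] = updated[lo - h_lo:hi - h_lo]
    return out


if __name__ == "__main__":
    grid = [0.0, 3.0, 6.0, 9.0, 0.0, 3.0, 12.0, 0.0]
    print(partitioned_step(grid, 3) == stencil_step(grid))
```

The halo copy is the step whose cost depends on how devices are connected, which is where the interconnect layout mentioned in the abstract would come into play.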


Parallel Processing Letters | 2002

The Integration of Task and Data Parallel Skeletons

Herbert Kuchen; Murray Cole

We describe a skeletal parallel programming library which integrates task and data parallel constructs within an API for C++. Traditional skeletal requirements for higher orderness and polymorphism are achieved through exploitation of operator overloading and templates, while the underlying parallelism is provided by MPI. We present a case study describing two algorithms for the travelling salesman problem.


european conference on parallel processing | 1997

A Monadic Calculus for Parallel Costing of a Functional Language of Arrays

C. Barry Jay; Murray Cole; M. Sekanina; Paul Steckler

VEC is a higher-order functional language of nested arrays, which includes a general folding operation. Static computation of the shape of its programs is used to support a compositional cost calculus based on a cost monad. This, in turn, is based on a cost algebra, whose operations may be customized to handle different cost regimes, especially for parallel programming. We present examples based on sequential costing and on the PRAM model of parallel computation. The latter has been implemented in Haskell, and applied to some linear algebra examples.


ieee international conference on high performance computing, data, and analytics | 2011

A machine learning-based approach for thread mapping on transactional memory applications

Márcio Castro; Luís Fabrício Wanderley Góes; Christiane Pousa Ribeiro; Murray Cole; Marcelo Cintra; Jean-François Méhaut

Thread mapping has been extensively used as a technique to efficiently exploit memory hierarchy on modern chip-multiprocessors. It places threads on cores in order to amortize memory latency and/or to reduce memory contention. However, efficient thread mapping relies upon matching application behavior with system characteristics. In particular, Software Transactional Memory (STM) applications introduce another dimension due to their runtime system support. Existing STM systems implement several conflict detection and resolution mechanisms, which leads STM applications to behave differently for each combination of these mechanisms. In this paper we propose a machine learning-based approach to automatically infer a suitable thread mapping strategy for transactional memory applications. First, we profile several STM applications from the STAMP benchmark suite considering application, STM system and platform features to build a set of input instances. Then, such data feeds a machine learning algorithm, which produces a decision tree able to predict the most suitable thread mapping strategy for new unobserved instances. Results show that our approach improves performance by up to 18.46% compared to the worst case and by up to 6.37% over the Linux default thread mapping strategy.


international conference on computational science | 2005

Two fundamental concepts in skeletal parallel programming

Anne Benoit; Murray Cole

We define the concepts of nesting mode and interaction mode as they arise in the description of skeletal parallel programming systems. We suggest that these new concepts encapsulate fundamental design issues and may play a useful role in defining and distinguishing between the capabilities of competing systems. We present the decisions taken in our own Edinburgh Skeleton Library eSkel, and review the approaches chosen by a selection of other skeleton libraries.


international parallel and distributed processing symposium | 1995

Implementing the hierarchical PRAM on the 2D mesh: analyses and experiments

George Chochia; Murray Cole; Todd Heywood

We investigate aspects of the performance of the EREW instance of the Hierarchical PRAM (H-PRAM) model, a recursively partitionable PRAM, on the 2D mesh architecture via analysis and simulation experiments. Since one of the ideas behind the H-PRAM is to systematically exploit locality in order to negate the need for expensive communication hardware and thus promote cost-effective scalability, our design decisions are based on minimizing implementation costs. The Peano indexing scheme is used as a simple and natural means of allowing the dynamic, recursive partitioning of the mesh into arbitrarily-sized sub-meshes, as required by the H-PRAM. We show that for any sub-mesh the ratio of the largest Manhattan distance between two nodes of the sub-mesh to that of the square mesh with an identical number of processors is at most 3/2, demonstrating the locality preserving properties of the Peano scheme for arbitrary partitions of the mesh. We provide matching analytical and experimental evidence that the routing required to efficiently implement the H-PRAM with this scheme can be implemented cheaply and effectively.


cluster computing and the grid | 2005

Enhancing the effective utilisation of grid clusters by exploiting on-line performability analysis

Anne Benoit; Murray Cole; Stephen Gilmore; Jane Hillston

In grid applications the heterogeneity and potential failures of the computing infrastructure pose significant challenges to efficient scheduling. Performance models have been shown to be useful in providing predictions on which schedules can be based (N. Furmento et al., 2002) and most such techniques can also take account of failures and degraded service. However, when several alternative schedules are to be compared it is vital that the analysis of the models does not become so costly as to outweigh the potential gain of choosing the best schedule. Moreover, it is vital that the modelling approach can scale to match the size and complexity of realistic applications. In this paper, we present a novel method of modelling job execution on grid compute clusters. As before, we use performance evaluation process algebra (PEPA) (J. Hillston, 1996) as the system description formalism, capturing both workload and computing fabric. The novel feature is that we make a continuous approximation of the state space underlying the PEPA model and represent it as a set of ordinary differential equations (ODEs) for solution, rather than a continuous time, but discrete state space, Markov chain.

Collaboration


Dive into Murray Cole's collaborations.

Top Co-Authors

Anne Benoit

University of Edinburgh
