Nick Barrow-Williams | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Nick Barrow-Williams is active.

Explore More

Publication

Featured researches published by Nick Barrow-Williams.

ieee international symposium on workload characterization | 2009

A communication characterisation of Splash-2 and Parsec

Nick Barrow-Williams; Christian Fensch; Simon W. Moore

Recent benchmark suite releases such as Parsec specifically utilise the tightly coupled cores available in chip-multiprocessors to allow the use of newer, high performance, models of parallelisation. However, these techniques introduce additional irregularity and complexity to data sharing and are entirely dependent on efficient communication performance between processors. This paper thoroughly examines the crucial communication and sharing behaviour of these future applications. The infrastructure used allows both accurate and comprehensive program analysis, employing a full Linux OS running on a simulated 32-core x86 machine. Experiments use full program runs, with communication classified at both core and thread granularities. Migratory, read-only and producer-consumer sharing patterns are observed and their behaviour characterised. The temporal and spatial characteristics of communication are presented for the full collection of Splash-2 and Parsec benchmarks. Our results aim to support the design of future communication systems for CMPs, encompassing coherence protocols, network-on-chip and thread mapping.

international conference on parallel architectures and compilation techniques | 2010

Proximity coherence for chip multiprocessors

Nick Barrow-Williams; Christian Fensch; Simon W. Moore

Many-core architectures provide an efficient way of harnessing the increasing numbers of transistors available in modern fabrication processes. While they are similar to multi-node systems, they exhibit different communication latency and storage characteristics, providing new design opportunities that were previously not feasible. Traditional cache coherence protocols, although often used in many-core designs, have been developed in the context of multi-node systems. As such, they seldom take advantage of the new possibilities that many-core architectures offer. We propose Proximity Coherence, a scheme in which L1 load misses are optimistically forwarded to nearby caches via new dedicated links rather than always being indirected via a directory structure. Such an optimization is made possible by the comparable cost of local cache accesses with the use of on-chip network resources. Coherency is maintained using lightweight graph structures embedded in the L1 caches. We compare our Proximity Coherence protocol to an existing directory-based MESI protocol using full-system simulations of a 32 core system. Our extension lowers the latency of L1 cache load misses by up to 32% while reducing the bytes transferred on the global on-chip interconnect by up to 19% for a range of parallel benchmarks. Employing Proximity Coherence provides execution time improvements of up to 13%, reduces cache hierarchy energy consumption by up to 30% and delivers a more efficient solution to the challenge of coherence in chip multiprocessors.

IEEE Transactions on Computers | 2013

Designing a Physical Locality Aware Coherence Protocol for Chip-Multiprocessors

Christian Fensch; Nick Barrow-Williams; Robert D. Mullins; Simon W. Moore

Many-core architectures provide an efficient way of harnessing the growing numbers of transistors available. However, energy and latency costs of communication increasingly limit the parallel programs running on these platforms. Existing designs provide a functional communication layer, but not necessarily the most efficient solution. Due to power limitations, efficiency is now a primary concern that motivates us to look again at cache coherence. First, we analyze the communication behavior of parallel applications. The observed sharing patterns reveal considerable locality of shared data accesses between threads with consecutive IDs. This pattern corresponds to strong physical locality between adjacent cores in a chip-multiprocessor (CMP). This paper explores the design of Proximity Coherence: a novel scheme in which L1 load misses are optimistically forwarded to nearby caches via new dedicated links. We exploit these patterns and improve the efficiency of communication. The results show that careful analysis leads to the design of a more efficient coherence protocol. The protocol reduces the latency of load misses by up to 33 percent (17 percent, on average), improving overall execution time by up to 13 percent. Furthermore, it also reduces network-on-chip traffic by 19 percent and energy consumption by up to 30 percent.

Archive | 2014