John Goodacre | Researchain



Publication

Featured research published by John Goodacre.

design, automation, and test in europe | 2016

ECOSCALE: Reconfigurable computing and runtime system for future exascale systems

Iakovos Mavroidis; Ioannis Papaefstathiou; Luciano Lavagno; Dimitrios S. Nikolopoulos; Dirk Koch; John Goodacre; Ioannis Sourdis; Vasileios Papaefstathiou; Marcello Coppola; Manuel Palomino

In order to reach exascale performance, current HPC systems need to be improved. Simple hardware scaling is not a feasible solution due to the increasing utility costs and power consumption limitations. Apart from improvements in implementation technology, what is needed is to refine the HPC application development flow as well as the system architecture of future HPC systems. ECOSCALE tackles these challenges by proposing a scalable programming environment and architecture, aiming to substantially reduce energy consumption as well as data traffic and latency. ECOSCALE introduces a novel heterogeneous energy-efficient hierarchical architecture, as well as a hybrid many-core+OpenCL programming environment and runtime system. The ECOSCALE approach is hierarchical and is expected to scale well by partitioning the physical system into multiple independent Workers (i.e. compute nodes). Workers are interconnected in a tree-like fashion and define a contiguous global address space that can be viewed either as a set of partitions in a Partitioned Global Address Space (PGAS), or as a set of nodes hierarchically interconnected via an MPI protocol. To further increase energy efficiency, as well as to provide resilience, the Workers employ reconfigurable accelerators mapped into the virtual address space utilizing a dual stage System Memory Management Unit with coherent memory access. The architecture supports shared partitioned reconfigurable resources accessed by any Worker in a PGAS partition, as well as automated hardware synthesis of these resources from an OpenCL-based programming model.

design, automation, and test in europe | 2013

From embedded multi-core SoCs to scale-out processors

Marcello Coppola; Babak Falsafi; John Goodacre; George Kornaros

Information technology is now an indispensable pillar of a modern day society. CMOS technologies, which lay the foundation for all digital platforms, however, are experiencing a major inflection point due to a slowdown in voltage scaling. The net result is that power is emerging as the key design constraint for all platforms from embedded systems to datacenters. This tutorial presents emerging design paradigms from embedded multicore SoCs to server processors for scale-out datacenters based on mobile cores.

design, automation, and test in europe | 2016

EUROSERVER: Share-anything scale-out micro-server design

Manolis Marazakis; John Goodacre; Didier Fuin; Paul M. Carpenter; John Thomson; Emil Matus; Antimo Bruno; Per Stenström; Jérôme Martin; Yves Durand; Isabelle Dor

This paper provides a snapshot summary of the trends in the area of micro-server development and their application in the broader enterprise and cloud markets. Focusing on the technology aspects, we provide an understanding of these trends and specifically the differentiation and uniqueness of the approach being adopted by the EUROSERVER FP7 project. The unique technical contributions of EUROSERVER range from the fundamental system compute unit design architecture, through to the implementation approach both at the chiplet nanotechnological integration, and the everything-close physical form factor. Furthermore, we offer optimizations at the virtualisation layer to exploit the unique hardware features, and other framework optimizations, including exploiting the hardware capabilities at the run-time system and application layers.

automation, robotics and control systems | 2018

A CAM-Free Exascalable HPC Router for Low-Energy Communications

Caroline Concatto; Jose Antonio Pascual; Javier Navaridas; Joshua Lant; Andrew Attwood; Mikel Luján; John Goodacre

Power consumption is the main hurdle in the race for designing Exascale-capable computing systems which would require deploying millions of computing elements. While this problem is being addressed by designing increasingly more power-efficient processing subsystems, little effort has been put into reducing the power consumption of the interconnection network. This is precisely the objective of this work, in which we study the benefits, in terms of both area and power, of avoiding costly and power-hungry CAM-based routing tables deep-rooted in all current networking technologies. We present our custom-made, FPGA-based router based on a simple, arithmetic routing engine which is shown to be much more power- and area-efficient than even a relatively small 2K-entry routing table which requires as much area and one order of magnitude more power than our router.

virtual execution environments | 2017

HyperMAMBO-X64: Using Virtualization to Support High-Performance Transparent Binary Translation

Amanieu d'Antras; Cosmin Gorgovan; Jim D. Garside; John Goodacre; Mikel Luján

Current computer architectures (ARM, MIPS, PowerPC, SPARC, x86) have evolved from a 32-bit architecture to a 64-bit one. Computer architects often consider whether it could be possible to eliminate hardware support for a subset of the instruction set so as to reduce hardware complexity, which could improve performance, reduce power usage and accelerate processor development. This paper considers the scenario where we want to eliminate 32-bit hardware support from the ARMv8 architecture. Dynamic binary translation can be used for this purpose and generally comes in one of two forms: application-level translators that translate a single user mode process on top of a native operating system, and system-level translators that translate an entire operating system and all its processes. Application-level translators can have good performance but are not totally transparent; system-level translators may be 100% compatible but performance suffers. HyperMAMBO-X64 uses a new approach that gets the best of both worlds, being able to run the translator as an application under the hypervisor but still react to the behavior of guest operating systems. It works with complete transparency with regard to the virtualized system whilst delivering performance close to that provided by hardware execution. A key factor in the low overhead of HyperMAMBO-X64 is its deep integration with the virtualization and memory management features of ARMv8. These are exploited to support caching of translations across multiple address spaces while ensuring that translated code remains consistent with the source instructions it is based on. We show how these attributes are achieved without sacrificing either performance or accuracy.

congress on evolutionary computation | 2017

Designing an exascale interconnect using multi-objective optimization

Jose Antonio Pascual; Joshua Lant; Andrew Attwood; Caroline Concatto; Javier Navaridas; Mikel Luján; John Goodacre

Exascale performance will be delivered by systems composed of millions of interconnected computing cores. The way these computing elements are connected with each other (network topology) has a strong impact on many performance characteristics. In this work we propose a multi-objective optimization-based framework to explore possible network topologies to be implemented in the EU-funded ExaNeSt project. The modular design of this system's interconnect provides great flexibility to design topologies optimized for specific performance targets such as communications locality, fault tolerance or energy consumption. The generation procedure of the topologies is formulated as a three-objective optimization problem (minimizing some topological characteristics) where solutions are searched using evolutionary techniques. The analysis of the results, carried out using simulation, shows that the topologies meet the required performance objectives. In addition, a comparison with a well-known topology reveals that the generated solutions can provide better topological characteristics and also higher performance for parallel applications.

digital systems design | 2017

Paving the Way Towards a Highly Energy-Efficient and Highly Integrated Compute Node for the Exascale Revolution: The ExaNoDe Approach

Alvise Rigo; Christian Pinto; Kevin Pouget; Daniel Raho; Denis Dutoit; Pierre-Yves Martinez; Chris Doran; Luca Benini; Iakovos Mavroidis; Manolis Marazakis; Valeria Bartsch; Guy Lonsdale; Antoniu Pop; John Goodacre; Annaik Colliot; Paul M. Carpenter; Petar Radojković; Dirk Pleiter; Dominique Drouin; Benoît Dupont de Dinechin

Power consumption and high compute density are the key factors to be considered when building a compute node for the upcoming Exascale revolution. Current architectural design and manufacturing technologies are not able to provide the requested level of density and power efficiency to realise an operational Exascale machine. A disruptive change in the hardware design and integration process is needed in order to cope with the requirements of this forthcoming computing target. This paper presents the ExaNoDe H2020 research project aiming to design a highly energy efficient and highly integrated heterogeneous compute node targeting Exascale level computing, mixing low-power processors, heterogeneous co-processors and using advanced hardware integration technologies with the novel UNIMEM Global Address Space memory system.

Concurrency and Computation: Practice and Experience | 2018

On the effects of allocation strategies for exascale computing systems with distributed storage and unified interconnects

Jose Antonio Pascual; Joshua Lant; Caroline Concatto; Andrew Attwood; Javier Navaridas; Mikel Luján; John Goodacre

The convergence between computing‐ and data‐centric workloads and platforms is imposing new challenges on how to best use the resources of modern computing systems. In this paper, we investigate alternatives for the storage subsystem of a novel exascale‐capable system with special emphasis on how allocation strategies would affect the overall performance. We consider several aspects of data‐aware allocation such as the effect of spatial and temporal locality, the affinity of data to storage sources, and the network‐level traffic prioritization for different types of flows. In our experimental set‐up, temporal locality can have a substantial effect on application runtime (up to a 10% reduction), whereas spatial locality can be even more significant (up to one order of magnitude faster with perfect locality). The use of structured access patterns to the data and the allocation of bandwidth at the network level can also have a significant impact (up to 20% and 17% reduction of runtime, respectively). These results suggest that scheduling policies exposing data‐locality information can be essential for the appropriate utilization of future large‐scale systems. Finally, we found that the distributed storage system we are implementing can outperform traditional SAN architectures, even with a much smaller (in terms of I/O servers) back‐end.

Concurrency and Computation: Practice and Experience | 2018

Enabling shared memory communication in networks of MPSoCs

Joshua Lant; Caroline Concatto; Andrew Attwood; Jose Antonio Pascual; Mike Ashworth; Javier Navaridas; Mikel Luján; John Goodacre

Ongoing transistor scaling and the growing complexity of embedded system designs have led to the rise of MPSoCs (Multi‐Processor System‐on‐Chip), combining multiple hard‐core CPUs and accelerators (FPGA, GPU) on the same physical die. These devices are of great interest to the supercomputing community, who are increasingly reliant on heterogeneity to achieve power and performance goals in these closing stages of the race to exascale. In this paper, we present a network interface architecture and networking infrastructure, designed to sit inside the FPGA fabric of a cutting‐edge MPSoC device, enabling networks of these devices to communicate within both a distributed and shared memory context, with reduced need for costly software networking system calls. We will present our implementation and prototype system and discuss the main design decisions relevant to the use of the Xilinx Zynq Ultrascale+, a state‐of‐the‐art MPSoC, and the challenges to be overcome given the device's limitations and constraints. We demonstrate the working prototype system connecting two MPSoCs, with communication between processor and remote memory region and accelerator. We then discuss the limitations of the current implementation and highlight areas of improvement to make this solution production‐ready.

Computing in Science and Engineering | 2017

Innovating the Delivery of Server Technology with Kaleao KMAX

John Goodacre

This article introduces a new computer system architecture and how its benefits are tailored toward the cloud, fog, and ICT computing infrastructure. This new technology delivers a compute performance density 10 times higher than existing solutions while consuming only a quarter of the power and costing a third of the initial capital outlay for that performance. The capabilities of KMAX offer an attractive alternative to the PC platform, while opening new compute, storage, and networking capabilities.
