Jim D. Garside | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Jim D. Garside is active.

Explore More

Publication

Featured researches published by Jim D. Garside.

virtual execution environments | 2017

HyperMAMBO-X64: Using Virtualization to Support High-Performance Transparent Binary Translation

Amanieu d'Antras; Cosmin Gorgovan; Jim D. Garside; John Goodacre; Mikel Luján

Current computer architectures --- ARM, MIPS, PowerPC, SPARC, x86 --- have evolved from a 32-bit architecture to a 64-bit one. Computer architects often consider whether it could be possible to eliminate hardware support for a subset of the instruction set as to reduce hardware complexity, which could improve performance, reduce power usage and accelerate processor development. This paper considers the scenario where we want to eliminate 32-bit hardware support from the ARMv8 architecture. Dynamic binary translation can be used for this purpose and generally comes in one of two forms: application-level translators that translate a single user mode process on top of a native operating system, and system-level translators that translate an entire operating system and all its processes. Application-level translators can have good performance but is not totally transparent; system-level translators may be 100% compatible but performance suffers. HyperMAMBO-X64 uses a new approach that gets the best of both worlds, being able to run the translator as an application under the hypervisor but still react to the behavior of guest operating systems. It works with complete transparency with regards to the virtualized system whilst delivering performance close to that provided by hardware execution. A key factor in the low overhead of HyperMAMBO-X64 is its deep integration with the virtualization and memory management features of ARMv8. These are exploited to support caching of translations across multiple address spaces while ensuring that translated code remains consistent with the source instructions it is based on. We show how these attributes are achieved without sacrificing either performance or accuracy.

programming language design and implementation | 2017

Low overhead dynamic binary translation on ARM

Amanieu d'Antras; Cosmin Gorgovan; Jim D. Garside; Mikel Luján

The ARMv8 architecture introduced AArch64, a 64-bit execution mode with a new instruction set, while retaining binary compatibility with previous versions of the ARM architecture through AArch32, a 32-bit execution mode. Most hardware implementations of ARMv8 processors support both AArch32 and AArch64, which comes at a cost in hardware complexity. We present MAMBO-X64, a dynamic binary translator for Linux which executes 32-bit ARM binaries using only the AArch64 instruction set. We have evaluated the performance of MAMBO-X64 on three existing ARMv8 processors which support both AArch32 and AArch64 instruction sets. The performance was measured by comparing the running time of 32-bit benchmarks running under MAMBO-X64 with the same benchmark running natively. On SPEC CPU2006, we achieve a geometric mean overhead of less than 7.5% on in-order Cortex-A53 processors and a performance improvement of 1% on out-of-order X-Gene 1 processors. MAMBO-X64 achieves such low overhead by novel optimizations to map AArch32 floating-point registers to AArch64 registers dynamically, handle overflowing address calculations efficiently, generate traces that harness hardware return address prediction, and handle operating system signals accurately.

international symposium on circuits and systems | 2017

Dynamic voltage and frequency scaling for neuromorphic many-core systems

Sebastian Höppner; Yexin Yan; Bernhard Vogginger; Andreas Dixius; Johannes Partzsch; Felix Neumarker; Stephan Hartmann; Stefan Schiefer; Stefan Scholze; Georg Ellguth; Love Cederstroem; Matthias Eberlein; Christian Mayr; Steve Temple; Luis A. Plana; Jim D. Garside; Simon Davison; David R. Lester; Steve B. Furber

We present a dynamic voltage and frequency scaling technique within SoCs for per-core power management: the architecture allows for individual, self triggered performance-level scaling of the processing elements (PEs) within less than 100ns. This technique enables each core to adjust its local supply voltage and frequency depending on its current computational load. A test chip has been implemented in 28nm CMOS technology, as prototype of the SpiNNaker2 neuromorphic many core system, containing 4 PEs which are operational within the range of 1.1V down to 0.7V at frequencies from 666MHz down to 100MHz; the effectiveness of the power management technique is demonstrated using a standard benchmark from the application domain. The particular domain area of this application specific processor is real-time neuromorphics. Using a standard benchmark — the synfire chain — we show that the total power consumption can be reduced by 45%, with 85% baseline power reduction and a 30% reduction of energy per neuron and synapse computation, all while maintaining biological real-time operation.

field programmable logic and applications | 2017

Asynchronous interface FIFO design on FPGA for high-throughput NRZ synchronisation

Gengting Liu; Jim D. Garside; Stephen B. Furber; Luis A. Plana; Dirk Koch

Networks-on-chip (NoCs) have become a new chip design paradigm as the size of transistors continues to shrink. Globally-asynchronous locally-synchronous (GALS) on-chip networks are proposed for solving issues such as large clock tree distribution and signal delay variations. More interestingly, for the GALS networks using m-of-n delay-insensitive interconnect, the asynchronous interconnect not only can be used for on-chip interconnection, but also provides a simple, direct and power-saving solution for off-chip interconnection. This paper presents an asynchronous interface FIFO design to improve throughput over asynchronous inter-chip links using 2-of-7 Non-Return-to-Zero (NRZ) encoding in an existing many-core system. The proposed design is suitable for implementation on commodity FPGAs without using the limited global clock buffer resources, but involves using the FPGA to implement asynchronous circuits. The interface FIFO is constructed from the transition detectors themselves rather than by employing a separate buffer in the more conventional fashion. The proposed solution has been demonstrated in an existing system and is suitable for adaptation to other asynchronous m-of-n NRZ coding protocols for high-throughput communication.

international symposium on circuits and systems | 2017

Live demonstration: Dynamic voltage and frequency scaling for neuromorphic many-core systems

Sebastian Höppner; Yexin Yan; Bernhard Vogginger; Andreas Dixius; Johannes Partzsch; Prateek Joshi; Felix Neumarker; Stephan Hartmann; Stefan Schiefer; Stefan Scholze; Georg Ellguth; Love Cederstroem; Matthias Eberlein; Christian Mayr; Steve Temple; Luis A. Plana; Jim D. Garside; Simon Davison; David R. Lester; Steve B. Furber

We present a dynamic voltage and frequency scaling technique within SoCs for per-core power management: the architecture allows for individual, self triggered performance-level scaling of the processing elements (PEs) within less than 100ns. This technique enables each core to adjust its local supply voltage and frequency depending on its current computational load. A test chip has been implemented in 28nm CMOS technology, as prototype of the SpiNNaker2 neuromorphic many core system, containing 4 PEs which are operational within the range of 1.1V down to 0.7V at frequencies from 666MHz down to 100MHz; The particular domain area of this application specific processor is real-time neuromorphics. Using a standard benchmark — the synfire chain — we show that the total power consumption can be reduced by 45%, with 85% baseline power reduction and a 30% reduction of energy per neuron and synapse computation, all while maintaining biological real-time operation.

international conference on application of concurrency to system design | 2013