Robert G. Dimond | Researchain

Publication

Featured research published by Robert G. Dimond.

Field-Programmable Logic and Applications | 2005

CUSTARD - a customisable threaded FPGA soft processor and tools

Robert G. Dimond; Oskar Mencer; Wayne Luk

We propose CUSTARD - customisable threaded architecture - a soft processor design space that combines support for multiple hardware threads and automatically generated custom instructions. Multiple threads incur low additional hardware cost and allow fine-grained concurrency without multiple processor cores or software overhead. Custom instructions, generated for a specific application, accelerate frequently performed computations by implementing them as dedicated hardware. In this paper we present a flexible processor and compiler generation system, FPGA implementations of CUSTARD and performance/area results for media and cryptography benchmarks.

Design, Automation, and Test in Europe | 2007

Optimizing instruction-set extensible processors under data bandwidth constraints

Kubilay Atasu; Robert G. Dimond; Oskar Mencer; Wayne Luk; Can C. Özturan; G. Dündar

The authors present a methodology for generating optimized architectures for data-bandwidth-constrained extensible processors. They describe a scalable integer linear programming (ILP) formulation that extracts the most profitable set of instruction-set extensions given the available data bandwidth and transfer latency. Unlike previous approaches, the formulation differentiates between the number of inputs and outputs for instruction-set extensions and the number of register file ports. This differentiation makes the approach applicable to architectures that include architecturally visible state registers and dedicated data transfer channels. The authors support a comprehensive design space exploration to characterize the area/performance trade-offs for various applications. They evaluate the approach using actual ASIC implementations to demonstrate that the automatically customized processors meet timing within the target silicon area. For an embedded processor with only two register read ports and one register write port, the authors obtain a speed-up of up to 4.3 times with extensions incurring only a 35% area overhead.

IEEE Transactions on Parallel and Distributed Systems | 2013

Finite-Difference Wave Propagation Modeling on Special-Purpose Dataflow Machines

Oliver Pell; Jacob A. Bower; Robert G. Dimond; Oskar Mencer; Michael J. Flynn

Modeling wave propagation through the earth is an important application in geoscience. We present a framework for wave propagation modeling on special-purpose hardware, which dramatically improves the application performance compared to conventional CPUs. We utilize custom hardware platforms consisting of a mix of x86 CPUs and dataflow engines connected by high-bandwidth communication links. Application programmers describe their algorithms in a domain-specific language using Java syntax, with special dataflow semantics overlaid on top of the Java language. The application-specific dataflow engines run at hundreds of MHz with massive parallelism and deliver high performance/Watt, up to 30 times more energy efficient than conventional CPUs. The power efficiency of this approach suggests that dataflow computing may have a key role to play in the improvements in power efficiency necessary to reach exascale computing.

International Symposium on Parallel and Distributed Computing | 2008

Finding Speedup in Parallel Processors

Michael J. Flynn; Robert G. Dimond; Oskar Mencer; Oliver Pell

While the recent focus of architects and programmers has been on multicore, the alternative of a processor node plus an array-oriented accelerator has some significant advantages, especially in compute-intensive static applications. We propose an acceleration methodology based on FPGA arrays (although in principle it could be GPU- or Cell-based). The methodology uses a comprehensive application analysis supported by high-performance FPGA hardware. The analysis provides a dataflow graph of the application, which is replicated in SIMD for multiple data strips until limited by the pin bandwidth, then pipelined (MISD) until circuit limited. An oil exploration application shows the possibility of a speedup of over 300x over an Intel Xeon.

Field-Programmable Custom Computing Machines | 2006

Combining Instruction Coding and Scheduling to Optimize Energy in System-on-FPGA

Robert G. Dimond; Oskar Mencer; Wayne Luk

In this paper, we investigate a combination of two techniques - instruction coding and instruction re-ordering - for optimizing energy in embedded processor control. We present the first practical hardware implementation incorporating both approaches as part of a novel flow for automatic power-optimization of an FPGA soft processor. Our infrastructure generates customized processors and associated software, to enable power optimizations to be evaluated on multiple architectures and FPGA platforms. We evaluate using both software estimates of power and actual measurements from both low-cost and high-performance FPGAs. We generate over 150 optimized processor designs for two FPGA platforms, two processor architectures and six different benchmarks at four different clock rates and achieve consistent measured dynamic power reduction of up to 74%, without performance cost. Our results are applicable beyond processor optimization, quantifying the benefits of practical switching reduction and highlighting non-obvious pitfalls and complexities in dynamic power optimization.

Design, Automation, and Test in Europe | 2004

Customisable EPIC processor: architecture and tools

W. W. S. Chu; Robert G. Dimond; S. Perrott; S. P. Seng; Wayne Luk

This paper describes a customisable architecture and the associated tools for a prototype EPIC (explicitly parallel instruction computing) processor. Possible customisations include varying the number of registers and functional units, which are specified at compile-time. This facilitates the exploration of performance/area trade-offs for different EPIC implementations. We describe the tools for this EPIC processor, which include a compiler and an assembler based on the Trimaran framework. Various pipelined EPIC designs have been implemented using field programmable gate arrays (FPGAs); the one with 4 ALUs at 41.8 MHz can run a DCT application 5 times faster than the StrongARM SA-110 processor at 100 MHz.

Design, Automation, and Test in Europe | 2006

Automating Processor Customisation: Optimised Memory Access and Resource Sharing

Robert G. Dimond; Oskar Mencer; Wayne Luk

We propose a novel methodology to generate application-specific instruction processors (ASIPs) including custom instructions. Our implementation balances performance and area requirements by making custom instructions reusable across similar pieces of code. In addition to arithmetic and logic operations, table look-ups within custom instructions reduce costly accesses to global memory. We present synthesis and cycle-accurate simulation results for six embedded benchmarks running on customised processors. Reusable custom instructions achieve an average 319% speedup with only 5% additional area. The maximum speedup of 501% for the advanced encryption standard (AES) requires only 3.6% additional area.

IEE Proceedings - Computers and Digital Techniques | 2006