Malcolm Dwyer | Researchain


Publication

Featured research published by Malcolm Dwyer.

international parallel and distributed processing symposium | 2006

FPGA implementation of a license plate recognition SoC using automatically generated streaming accelerators

Nikolaos Bellas; Sek M. Chai; Malcolm Dwyer; Dan Linzmeier

Modern FPGA platforms provide the hardware and software infrastructure for building a bus-based system on chip (SoC) that meets the application's requirements. The designer can customize the hardware by selecting from a large number of pre-defined peripherals and fixed IP functions and by providing new hardware, typically expressed in RTL. Hardware accelerators that provide application-specific extensions to the computational capabilities of a system are an efficient mechanism to enhance performance and reduce power dissipation. What is missing is an integrated approach to identify the computationally critical parts of the application and to create accelerators starting from a high-level representation with minimal design effort. In this paper, we present an automation methodology and a tool that generates accelerators. We apply the methodology to an FPGA-based license plate recognition (LPR) system used in law enforcement. The accelerators process streaming data and support a programming model that can naturally express a large number of embedded applications, resulting in efficient hardware implementations. We show that we can achieve an overall LPR application speedup of 1.2× to 2.6×, thus enabling real-time functionality under realistic road scenes.

field-programmable custom computing machines | 2009

Real-Time Fisheye Lens Distortion Correction Using Automatically Generated Streaming Accelerators

Nikolaos Bellas; Sek M. Chai; Malcolm Dwyer; Dan Linzmeier

Fisheye lenses are often used in scientific or virtual reality applications to enlarge the field of view of a conventional camera. Fisheye lens distortion correction is an image processing application that transforms distorted fisheye images back to a natural-looking perspective space. This application is characterized by non-linear streaming memory access patterns that make main memory bandwidth a key performance limiter. We have developed a fisheye lens distortion correction system on a custom board that includes a Xilinx Virtex-4 FPGA. We express the application in a high-level streaming language, and we utilize Proteus, an architectural synthesis tool, to quickly explore the design space and generate the streaming accelerator best suited to our cost and performance constraints. This paper shows that appropriate ESL tools enable rapid prototyping and design of real-life, performance-critical and cost-sensitive systems with complex memory access patterns and hardware-software interaction mechanisms.
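The non-linear access pattern mentioned in the abstract can be illustrated with a minimal software sketch. It assumes an equidistant fisheye model (distorted radius = f × angle) and nearest-neighbour sampling; the abstract does not state which lens model or interpolation the authors used, and the names `fisheye_source` and `correct_fisheye` are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[int]]  # rows of grayscale samples


def fisheye_source(u: float, v: float, cx: float, cy: float, f: float) -> Tuple[float, float]:
    """Map an output (perspective) pixel to its source position in the fisheye image.

    Assumes an equidistant lens: r_fisheye = f * theta, r_perspective = f * tan(theta).
    """
    dx, dy = u - cx, v - cy
    r_u = math.hypot(dx, dy)          # radius in the corrected image
    if r_u == 0.0:
        return (cx, cy)               # the optical centre maps to itself
    r_d = f * math.atan(r_u / f)      # radius in the distorted image
    scale = r_d / r_u
    return (cx + dx * scale, cy + dy * scale)


def correct_fisheye(image: Image, f: float) -> Image:
    """Inverse-map every output pixel; source reads follow a non-linear pattern."""
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    out = [[0] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            sx, sy = fisheye_source(u, v, cx, cy, f)
            xi, yi = int(round(sx)), int(round(sy))
            if 0 <= xi < w and 0 <= yi < h:
                out[v][u] = image[yi][xi]
    return out
```

Because consecutive output pixels read from source addresses that are not evenly spaced, a hardware implementation cannot rely on simple linear bursts, which is consistent with the bandwidth concern raised in the abstract.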

field-programmable custom computing machines | 2006

Template-Based Generation of Streaming Accelerators from a High Level Presentation

Nikolaos Bellas; Sek M. Chai; Malcolm Dwyer; Dan Linzmeier

A design methodology and prototype tool to automate the design and architectural exploration of hardware accelerators are described in this paper. In comparison to other approaches, we utilize a well-engineered template to enable fast convergence to an area- and speed-efficient design. We show how this methodology is used for an application set with various architectural configurations.

international parallel and distributed processing symposium | 2007

An Architectural Framework for Automated Streaming Kernel Selection

Nikolaos Bellas; Sek M. Chai; Malcolm Dwyer; Dan Linzmeier

Hardware accelerators are increasingly used to extend the computational capabilities of baseline scalar processors to meet the growing performance and power requirements of embedded applications. The challenge to the designer is the extensive human effort required to identify the appropriate kernels to be mapped to gates and to implement a network of accelerators to execute the kernels. In this paper, we present a methodology to automate the selection of streaming kernels in a reconfigurable platform based on the characteristics of the application. The methodology is based on a flow graph that describes the streaming computations and communications. The flow graph is used to efficiently identify the most profitable subset of streaming kernels that optimize performance without exceeding the available area of the reconfigurable fabric.
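Choosing the most profitable subset of kernels under an area limit resembles a 0/1 knapsack problem. The sketch below is only an illustration of that idea, not the authors' flow-graph method: it ignores communication between kernels, and the `Kernel` fields, the gain estimates, and the function name `select_kernels` are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Kernel(NamedTuple):
    name: str
    area: int     # hypothetical area cost, e.g. in FPGA slices
    gain: float   # hypothetical estimate of cycles saved when accelerated


def select_kernels(kernels: List[Kernel], area_budget: int) -> Tuple[float, List[str]]:
    """0/1 knapsack: maximise total gain with total area <= area_budget."""
    # best[a] holds (gain, chosen kernel names) achievable with at most `a` area units.
    best: List[Tuple[float, List[str]]] = [(0.0, [])] * (area_budget + 1)
    for k in kernels:
        # Iterate downwards so that each kernel is selected at most once.
        for a in range(area_budget, k.area - 1, -1):
            candidate = best[a - k.area][0] + k.gain
            if candidate > best[a][0]:
                best[a] = (candidate, best[a - k.area][1] + [k.name])
    return best[area_budget]


if __name__ == "__main__":
    candidates = [
        Kernel("sobel", 4, 10.0),
        Kernel("median", 3, 7.0),
        Kernel("threshold", 2, 4.0),
        Kernel("histogram", 5, 9.0),
    ]
    print(select_kernels(candidates, area_budget=7))
```

A real selection pass would also have to account for the data exchanged between kernels, which is the role the abstract assigns to the flow graph.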

ACM Sigarch Computer Architecture News | 2007

Mapping streaming architectures on reconfigurable platforms

Nikolaos Bellas; Sek M. Chai; Malcolm Dwyer; Dan Linzmeier

Hardware accelerators, used as application-specific extensions to the computational capabilities of a system, are efficient mechanisms to enhance performance and reduce power dissipation in a system on chip (SoC). These accelerators execute the computationally critical part of the application and offload computations from the scalar processors. In this paper, we present a design automation tool that generates accelerators based on a given application kernel. The accelerators process streaming data and support a programming model that can naturally express a large number of embedded applications and that results in efficient and fast hardware implementations. We demonstrate the applicability of the tool for architectural space exploration for a number of media applications, with results on area, throughput, and clock speeds.

computer vision and pattern recognition | 2006

Reconfigurable Streaming Architectures for Embedded Smart Cameras

Sek M. Chai; Nikolaos Bellas; Greg Kujawa; Tom Ziomek; Linda M. Dawson; Tony Scaminaci; Malcolm Dwyer; Dan Linzmeier

Smart cameras using FPGAs require an automation method to simplify the design process and to ensure both computation and memory performance are met. Reconfigurable logic allows exploration of different hardware accelerators and memory-hierarchy configurations based on application needs. This paper presents a streaming architecture template that is generated from high level program descriptions. A smart camera development platform, the software architecture, and demonstration template are also described.

international conference on multimedia and expo | 2003

A programmable, high performance vector array unit used for real-time motion estimation

Nikolaos Bellas; Malcolm Dwyer

The MPEG-4 and H.263 video standards are enabling technologies for the proliferation of wireless multimedia applications in 3G systems. For video encoding, the motion estimation (ME) stage is typically the most demanding in terms of performance and bandwidth requirements, and is usually implemented in dedicated hardware, especially in systems with stringent power requirements. This approach, however, cannot exploit algorithmic advances in motion estimation, and requires major hardware redesign when specifications or standards change. This paper describes the architecture of a programmable motion estimation unit that is used as part of a larger wireless video encoding system. An instruction set architecture (ISA) allows the development of various ME algorithms in software without the need to redesign portions of the chip.
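To show why motion estimation is so demanding, here is a minimal software sketch of full-search block matching with a sum-of-absolute-differences (SAD) cost. This is a common textbook ME algorithm chosen for illustration; the abstract does not say which algorithms the unit runs, and the function names and parameters are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of luma samples


def sad(cur: Frame, ref: Frame, cx: int, cy: int, rx: int, ry: int, n: int) -> int:
    """Sum of absolute differences between an n x n block in cur and one in ref."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur: Frame, ref: Frame, cx: int, cy: int, n: int,
                search_range: int) -> Tuple[Tuple[int, int], int]:
    """Return the motion vector (dx, dy) with the lowest SAD, and that SAD."""
    height, width = len(ref), len(ref[0])
    best_mv = (0, 0)
    best_cost = sad(cur, ref, cx, cy, cx, cy, n)  # start from zero displacement
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Each block costs up to (2 × search_range + 1)² SAD evaluations of n² pixel differences, which is the kind of regular, data-parallel workload a vector array unit targets.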

field-programmable logic and applications | 2009

Proteus: An architectural synthesis tool based on the stream programming paradigm

Nikolaos Bellas; Sek M. Chai; Malcolm Dwyer; Dan Linzmeier; Abelardo López-Lagunas

The problem of automatically generating hardware modules from a high level representation of an application has been at the forefront of EDA research in the last few years. Such an EDA methodology would potentially enable the large pool of software engineers and algorithm IP experts without architectural and hardware expertise to design and implement platform systems, thus dramatically reducing time to market. This paper makes the argument that such a methodology requires a programming model beyond the sequential semantics of languages like C/C++. We argue in favor of the streaming programming model in which computation and data communication are explicitly separated and optimized. Our architectural synthesis tool, Proteus, processes stream programs that partition the application into a series of streaming kernels that operate on streams of data elements. Proteus produces efficient hardware accelerators that provide orders of magnitude higher throughput than a software implementation, at an area cost very close to manual HDL implementation.
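The separation of computation from data communication can be sketched in software with Python generators. This is only an analogy for the streaming model, not Proteus's input language: the kernels `window_sum` and `threshold` and the `pipeline` wiring are hypothetical examples.

```python
from collections import deque
from typing import Iterable, Iterator


def window_sum(stream: Iterable[int], taps: int) -> Iterator[int]:
    """Kernel: emit the sum of the last `taps` samples once the window is full."""
    window: deque = deque(maxlen=taps)
    for sample in stream:
        window.append(sample)
        if len(window) == taps:
            yield sum(window)


def threshold(stream: Iterable[int], limit: int) -> Iterator[int]:
    """Kernel: emit 1 for values at or above `limit`, otherwise 0."""
    for value in stream:
        yield 1 if value >= limit else 0


def pipeline(source: Iterable[int]) -> Iterator[int]:
    """Communication: wire the kernels together; no kernel knows its neighbours."""
    return threshold(window_sum(source, taps=3), limit=10)


if __name__ == "__main__":
    print(list(pipeline([1, 2, 3, 4, 5, 6])))  # [0, 0, 1, 1]
```

Each kernel only describes what it does to one stream element at a time, while the wiring is stated separately, so a tool is free to decide how data moves between them.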

Archive | 2001