Mark Milward | Researchain

Publication

Featured research published by Mark Milward.

IEEE Transactions on Very Large Scale Integration Systems | 2008

The Reconfigurable Instruction Cell Array

Sami Khawam; Ioannis Nousias; Mark Milward; Ying Yi; Mark Muir; Tughrul Arslan

This paper presents a novel instruction cell-based reconfigurable computing architecture for low-power applications, hereafter referred to as the reconfigurable instruction cell array (RICA). For the development of the RICA, a top-down software driven approach was taken, which proved to be one of the key design decisions for a flexible, easy to program, low-power architecture. These features make RICA an architecture that inherently meets the main design requirements of modern low-power devices. Results show that it consumes considerably less power than leading VLIW and low-power digital signal processors while maintaining their throughput performance.

Design, Automation, and Test in Europe | 2006

System-level Scheduling on Instruction Cell Based Reconfigurable Systems

Ying Yi; Ioannis Nousias; Mark Milward; Sami Khawam; Tughrul Arslan; Iain Lindsay

This paper presents a new operation chaining reconfigurable scheduling algorithm (CRS) based on list scheduling that maximizes the instruction level parallelism available in distributed high performance instruction cell based reconfigurable systems. Unlike other typical scheduling methods, it considers the placement and routing effect, register assignment, and an advanced operation chaining compilation technique to generate higher performance scheduled code. The effectiveness of this approach is demonstrated here using a recently developed industrial distributed reconfigurable instruction cell based architecture [Lee,2003]. The results show that schedules using this approach achieve equivalent throughput to VLIW architectures but at much lower power consumption.

IEEE International SOC Conference | 2006

H.264 Decoder Implementation on a Dynamically Reconfigurable Instruction Cell Based Architecture

Adam Major; Ying Yi; Ioannis Nousias; Mark Milward; Sami Khawam; Tughrul Arslan

This paper presents a new baseline profile compliant H.264 decoder implementation specifically tailored for an ANSI-C programmable, dynamically reconfigurable, instruction cell based architecture which has been developed. We use the FFmpeg libavcodec library as the basis for our decoder and identify the most processor intensive functions. These functions are tailored in a novel framework incorporating established software techniques alongside several architecture specific transforms. Initial results demonstrate that our reconfigurable architecture based decoder provides a significant performance boost with power figures below those of a microcontroller such as ARM.

IEEE Transactions on Very Large Scale Integration Systems | 2008

Code Compression and Decompression for Coarse-Grain Reconfigurable Architectures

Nazish Aslam; Mark Milward; Ahmet T. Erdogan; Tughrul Arslan

This paper presents a code compression and on-the-fly decompression scheme suitable for coarse-grain reconfigurable technologies. These systems pose further challenges by having an order of magnitude higher memory requirement due to much wider instruction words than typical VLIW/TTA architectures. Current compression schemes are evaluated. A highly efficient and novel dictionary-based lossless compression technique is implemented and compared against a previous implementation for a reconfigurable system. This paper looks at several conflicting design parameters, such as the compression ratio, silicon area, latency, and power consumption. Compression ratios in the range of 0.32 to 0.44 are recorded with the proposed scheme for a given set of test programs. With these test programs, a 60% overall silicon area saving is achieved, even after the decompressor hardware overhead is taken into account. The proposed technique may be applied to any architecture that shares characteristics with the example reconfigurable architecture targeted in this paper.

International Parallel and Distributed Processing Symposium | 2007

Code Compression and Decompression for Instruction Cell Based Reconfigurable Systems

Nazish Aslam; Mark Milward; Ioannis Nousias; Tughrul Arslan; Ahmet T. Erdogan

Code compression has been applied to embedded systems to minimize the silicon area utilized for program memories and to lower the power consumption. More recently, it has become a necessity for multiple-issue architectures, such as VLIW and TTA, to permit a viable realization of these designs. In this paper, a code compression and decompression scheme is presented for newly emerging reconfigurable technologies, which pose further challenges by having an order of magnitude higher memory requirement due to much wider instruction words than typical VLIW/TTA architectures. Two dictionary-based lossless compression schemes are implemented and compared for an example reconfigurable system. This paper looks at several conflicting design parameters, such as the compression ratio, silicon area, and speed. Test programs for a 2D DCT, minimum error, WiMAX, and H.264 have been evaluated, with compression ratios in the range of 41% to 62% recorded with the best scheme.

IEEE Transactions on Parallel and Distributed Systems | 2004

Design and implementation of a lossless parallel high-speed data compression system

Mark Milward; Jose Luis Nunez; David Mulvaney

Logic density increases have made feasible the implementation of multiprocessor systems able to meet the intensive data processing demands of highly concurrent systems. We describe the research and hardware implementation of a high-performance parallel multicompressor chip. A detailed investigation into the performance of alternative input and output routing strategies for realistic data sets demonstrates that the design of parallel compression devices involves important trade-offs that affect compression performance, latency, and throughput. The most promising approach is implemented in FPGA hardware and is shown to provide a scalable compression solution at throughputs able to cope with the demands of modern high-bandwidth applications.

Field-Programmable Custom Computing Machines | 2007

Code Compressor and Decompressor for Ultra Large Instruction Width Coarse-Grain Reconfigurable Systems

Nazish Aslam; Mark Milward; Ioannis Nousias; Tughrul Arslan; Ahmet T. Erdogan

This paper presents a code compression and on-the-fly decompression scheme suitable for coarse-grain reconfigurable technologies. A novel unit-grouping dictionary based compression technique utilizing special control bits to increase the effective storage capacity of the dictionaries is implemented and compared against an existing suitable technique for an example reconfigurable system. Compression ratios in the range of 40%-59% are recorded with the new scheme.

Asia and South Pacific Design Automation Conference | 2005

Automatic synthesis and scheduling of multirate DSP algorithms

Ying Yi; Mark Milward; Sami Khawam; Ioannis Nousias; Tughrul Arslan

To date, most high-level synthesis systems do not automatically solve present design problems, such as those related to timing associated with the physical implementation of multirate DSP architectures, while others do not efficiently trade off algorithm area and speed for such architectures. An automatic synthesis methodology based on retiming techniques together with folding transformations is presented in this paper in order to solve timing problems associated with the implementation of multirate DSP algorithms. We demonstrate that techniques for modeling computational unit latencies, which can influence parameterisations of a multirate DSP IP core, can lead to highly efficient solutions. This is illustrated using a polyphase IIR IDCT example. Using the folding transformation, the control circuit for a hardware sharing multirate DSP is also presented.

Field-Programmable Logic and Applications | 2007

H.264/AVC In-Loop De-Blocking Filter Targeting a Dynamically Reconfigurable Instruction Cell Based Architecture

Ioannis Nousias; Sami Khawam; Mark Milward; Ying Yi; Mark Muir; Tughrul Arslan

We present a new de-blocking filter module fully optimised for use on a recently introduced dynamically reconfigurable, instruction cell based architecture. The module consists of a novel combination of standard software transforms alongside architecture specific techniques and aims to reduce reconfiguration overheads and increase utilisation of resources. Our proposed filter outperforms the standard FFmpeg based filter code on the target architecture by a factor of 4.5.

Field-Programmable Logic and Applications | 2007

A Multi Objective GA Based Physical Placement Algorithm for Heterogeneous Dynamically Reconfigurable Arrays

Ioannis Nousias; Sami Khawam; Mark Milward; Mark Muir; Tughrul Arslan

This paper presents the preliminary results of a physical placement algorithm for heterogeneous Dynamically Reconfigurable Arrays (DRA), based on a multi-objective, multi-threaded GA. The algorithm deals with the spatial and temporal nature of the configurations used in DRAs, in an attempt to find a suitable layout for a wide range of applications, since general applicability is a key criterion for DRAs.
