Mark Milward
University of Edinburgh
Publications
Featured research published by Mark Milward.
IEEE Transactions on Very Large Scale Integration Systems | 2008
Sami Khawam; Ioannis Nousias; Mark Milward; Ying Yi; Mark Muir; Tughrul Arslan
This paper presents a novel instruction-cell-based reconfigurable computing architecture for low-power applications, hereafter referred to as the reconfigurable instruction cell array (RICA). The RICA was developed using a top-down, software-driven approach, which proved to be one of the key design decisions in achieving a flexible, easy-to-program, low-power architecture. These features make RICA an architecture that inherently addresses the main design requirements of modern low-power devices. Results show that it consumes considerably less power than leading VLIW and low-power digital signal processors while maintaining their throughput performance.
Design, Automation and Test in Europe | 2006
Ying Yi; Ioannis Nousias; Mark Milward; Sami Khawam; Tughrul Arslan; Iain Lindsay
This paper presents a new operation-chaining reconfigurable scheduling algorithm (CRS), based on list scheduling, that maximizes the instruction-level parallelism available in distributed, high-performance, instruction-cell-based reconfigurable systems. Unlike typical scheduling methods, it takes into account placement and routing effects, register assignment, and advanced operation-chaining compilation techniques to generate higher-performance scheduled code. The effectiveness of this approach is demonstrated using a recently developed industrial distributed reconfigurable instruction-cell-based architecture [Lee, 2003]. The results show that schedules produced with this approach achieve throughput equivalent to that of VLIW architectures at much lower power consumption.
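The abstract describes list scheduling extended with operation chaining. As a rough illustration only (this is not the CRS algorithm, which also models placement, routing and register assignment), the sketch below is a resource-constrained list scheduler that chains dependent operations into the same clock cycle as long as their accumulated combinational delay fits within the clock period. The function name, the input dictionaries, the longest-path priority and the integer delay units are all assumptions.

```python
from functools import lru_cache


def list_schedule_chained(deps, delay, kind, resources, clock):
    """Hypothetical sketch: resource-constrained list scheduling with chaining.

    deps:      op -> list of predecessor ops (dataflow DAG; every op is a key)
    delay:     op -> combinational delay, in the same units as `clock`
    kind:      op -> type of instruction cell that executes the op
    resources: cell type -> number of cells of that type usable per cycle
    clock:     clock period
    Returns op -> (cycle, finish time of the op within that cycle).
    """
    # Every op must fit in one cycle and have at least one cell to run on.
    assert all(delay[op] <= clock and resources[kind[op]] > 0 for op in deps)
    succs = {op: [] for op in deps}
    for op, preds in deps.items():
        for p in preds:
            succs[p].append(op)

    @lru_cache(maxsize=None)
    def priority(op):
        # Longest delay path from op to a sink: critical-path ops go first.
        return delay[op] + max((priority(s) for s in succs[op]), default=0)

    sched = {}
    cycle = 0
    while len(sched) < len(deps):
        used = {k: 0 for k in resources}
        placed = True
        while placed:  # keep chaining ops into this cycle until nothing fits
            placed = False
            ready = [op for op in deps if op not in sched
                     and all(p in sched for p in deps[op])]
            for op in sorted(ready, key=priority, reverse=True):
                # A chained op starts when its same-cycle predecessors finish;
                # results from earlier cycles are available at time 0.
                start = max((sched[p][1] for p in deps[op] if sched[p][0] == cycle),
                            default=0)
                if used[kind[op]] < resources[kind[op]] and start + delay[op] <= clock:
                    sched[op] = (cycle, start + delay[op])
                    used[kind[op]] += 1
                    placed = True
        cycle += 1
    return sched
```

For example, with two adders, one multiplier, a clock period of 10, and delays of 4 (add) and 5 (multiply), two independent additions and the multiply that consumes them all chain into cycle 0, while a further addition on the multiply's result would overrun the clock and moves to cycle 1.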
IEEE International SOC Conference | 2006
Adam Major; Ying Yi; Ioannis Nousias; Mark Milward; Sami Khawam; Tughrul Arslan
This paper presents a new baseline-profile-compliant H.264 decoder implementation specifically tailored for a newly developed ANSI-C programmable, dynamically reconfigurable, instruction-cell-based architecture. We use the FFmpeg libavcodec library as the basis for our decoder and identify its most processor-intensive functions. These functions are tailored within a novel framework that combines established software techniques with several architecture-specific transforms. Initial results demonstrate that the decoder running on our reconfigurable architecture provides a significant performance boost, with power figures below those of a microcontroller such as an ARM.
IEEE Transactions on Very Large Scale Integration Systems | 2008
Nazish Aslam; Mark Milward; Ahmet T. Erdogan; Tughrul Arslan
This paper presents a code compression and on-the-fly decompression scheme suitable for coarse-grain reconfigurable technologies. These systems pose further challenges: their much wider instruction words give them memory requirements an order of magnitude higher than those of typical VLIW/TTA architectures. Existing compression schemes are evaluated, and a highly efficient, novel dictionary-based lossless compression technique is implemented and compared against a previous implementation for a reconfigurable system. The paper examines several conflicting design parameters, such as compression ratio, silicon area, latency, and power consumption. Compression ratios in the range of 0.32 to 0.44 are recorded with the proposed scheme for a given set of test programs. For these test programs, an overall silicon area saving of 60% is achieved, even after the decompressor hardware overhead is taken into account. The proposed technique may be applied to any architecture that shares the characteristics of the example reconfigurable architecture targeted in this paper.
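To make the idea of dictionary-based lossless code compression concrete, here is a minimal baseline sketch. It is not the paper's proposed technique: it assumes a single dictionary holding the most frequent whole instruction words, a one-bit flag per word to mark a dictionary hit, and a compression ratio defined as compressed size (program plus dictionary) divided by original size, so lower is better. The function names and bit layout are hypothetical.

```python
from collections import Counter


def dict_compress(words, index_bits):
    """Hypothetical sketch: dictionary-compress fixed-width instruction words (ints)."""
    # The 2**index_bits most frequent words become dictionary entries.
    dictionary = [w for w, _ in Counter(words).most_common(1 << index_bits)]
    lookup = {w: i for i, w in enumerate(dictionary)}
    # Each output entry is (hit, payload): a dictionary index on a hit,
    # otherwise the raw uncompressed word.
    stream = [(True, lookup[w]) if w in lookup else (False, w) for w in words]
    return dictionary, stream


def dict_decompress(dictionary, stream):
    """Lossless inverse of dict_compress."""
    return [dictionary[payload] if hit else payload for hit, payload in stream]


def compression_ratio(dictionary, stream, word_bits, index_bits):
    """Assumed definition: (program + dictionary bits) / original bits."""
    # One flag bit per word, followed by either an index or the raw word.
    program_bits = sum(1 + (index_bits if hit else word_bits) for hit, _ in stream)
    dictionary_bits = len(dictionary) * word_bits
    return (program_bits + dictionary_bits) / (len(stream) * word_bits)
```

The dictionary storage is counted against the ratio because, in hardware, it occupies on-chip memory alongside the compressed program; a scheme that ignored it would overstate the silicon area saving.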
International Parallel and Distributed Processing Symposium | 2007
Nazish Aslam; Mark Milward; Ioannis Nousias; Tughrul Arslan; Ahmet T. Erdogan
Code compression has been applied to embedded systems to minimize the silicon area used by program memories and to lower power consumption. More recently, it has become essential for making multiple-issue architectures such as VLIW and TTA viable to implement. In this paper, a code compression and decompression scheme is presented for newly emerging reconfigurable technologies, which pose further challenges: their much wider instruction words give them memory requirements an order of magnitude higher than those of typical VLIW/TTA architectures. Two dictionary-based lossless compression schemes are implemented and compared for an example reconfigurable system. The paper examines several conflicting design parameters, such as compression ratio, silicon area, and speed. Test programs for a 2D DCT, minimum error, WiMAX, and H.264 have been evaluated, with compression ratios in the range of 41% to 62% recorded for the best scheme.
IEEE Transactions on Parallel and Distributed Systems | 2004
Mark Milward; Jose Luis Nunez; David Mulvaney
Increases in logic density have made it feasible to implement multiprocessor systems that can meet the intensive data-processing demands of highly concurrent systems. We describe the research and hardware implementation of a high-performance parallel multicompressor chip. A detailed investigation into the performance of alternative input and output routing strategies on realistic data sets demonstrates that the design of parallel compression devices involves important trade-offs affecting compression performance, latency, and throughput. The most promising approach is implemented in FPGA hardware and is shown to provide a scalable compression solution with throughputs able to meet the demands of modern high-bandwidth applications.
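As a software analogy only (the paper describes an FPGA hardware design, and Python's zlib merely stands in for whatever compressor engine the chip uses), the sketch below illustrates one simple input/output routing strategy: the input stream is cut into fixed-size blocks that are dispatched to parallel compressor workers, and the outputs are reassembled in input order with a length header per block. The function names, block size and worker count are assumptions.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def parallel_compress(data: bytes, block_size: int = 4096, workers: int = 4) -> bytes:
    """Hypothetical sketch: block-parallel compression with in-order output."""
    # Input routing: cut the stream into fixed-size blocks, one task per block.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, acting as in-order output routing.
        compressed = list(pool.map(zlib.compress, blocks))
    # Prefix each block with its 4-byte big-endian length so it can be split later.
    return b"".join(struct.pack(">I", len(c)) + c for c in compressed)


def parallel_decompress(stream: bytes) -> bytes:
    """Inverse of parallel_compress: split on length headers, decompress in order."""
    out, pos = [], 0
    while pos < len(stream):
        (n,) = struct.unpack_from(">I", stream, pos)
        pos += 4
        out.append(zlib.decompress(stream[pos:pos + n]))
        pos += n
    return b"".join(out)
```

Even this toy version exposes a trade-off of the kind the abstract mentions: smaller blocks give more parallelism and lower per-block latency, but each compressor sees less history, which tends to reduce the compression ratio, while the per-block headers add overhead.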
Field-Programmable Custom Computing Machines | 2007
Nazish Aslam; Mark Milward; Ioannis Nousias; Tughrul Arslan; Ahmet T. Erdogan
This paper presents a code compression and on-the-fly decompression scheme suitable for coarse-grain reconfigurable technologies. A novel unit-grouping, dictionary-based compression technique, which uses special control bits to increase the effective storage capacity of the dictionaries, is implemented and compared against an existing suitable technique for an example reconfigurable system. Compression ratios in the range of 40% to 59% are recorded with the new scheme.
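The abstract does not spell out the scheme, so the sketch below only illustrates the general idea behind grouping by unit, under stated assumptions: each wide instruction word is split into fixed-width fields, one per functional unit, and each field gets its own small dictionary. The special control bits mentioned in the abstract are not modelled, and the field widths, index width and function names are hypothetical.

```python
from collections import Counter


def split_fields(word, field_bits):
    """Hypothetical: split an instruction word (int) into per-unit fields, LSB first."""
    fields = []
    for bits in field_bits:
        fields.append(word & ((1 << bits) - 1))
        word >>= bits
    return fields


def join_fields(fields, field_bits):
    """Reassemble an instruction word from its per-unit fields."""
    word, shift = 0, 0
    for value, bits in zip(fields, field_bits):
        word |= value << shift
        shift += bits
    return word


def group_compress(words, field_bits, index_bits):
    """Compress each unit's field with its own dictionary of 2**index_bits entries."""
    columns = list(zip(*(split_fields(w, field_bits) for w in words)))
    dicts = [[v for v, _ in Counter(col).most_common(1 << index_bits)] for col in columns]
    lookups = [{v: i for i, v in enumerate(d)} for d in dicts]
    # Per word: one (hit, payload) pair per field, as in whole-word dictionary coding.
    stream = [[(True, lk[f]) if f in lk else (False, f)
               for f, lk in zip(split_fields(w, field_bits), lookups)]
              for w in words]
    return dicts, stream


def group_decompress(dicts, stream, field_bits):
    """Lossless inverse of group_compress."""
    return [join_fields([d[p] if hit else p for (hit, p), d in zip(entry, dicts)],
                        field_bits)
            for entry in stream]
```

The motivation for grouping is that individual unit fields repeat far more often than complete wide words. For the five 16-bit words 0x1234, 0x1235, 0x5634, 0x1234 and 0xFFFF split into 4-, 4- and 8-bit fields, two-entry per-field dictionaries cover 13 of the 15 fields, while a two-entry whole-word dictionary would cover only 3 of the 5 words.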
Asia and South Pacific Design Automation Conference | 2005
Ying Yi; Mark Milward; Sami Khawam; Ioannis Nousias; Tughrul Arslan
To date, most high-level synthesis systems do not automatically solve current design problems, such as the timing issues associated with the physical implementation of multirate DSP architectures, while others do not efficiently trade off area against speed for such architectures. This paper presents an automatic synthesis methodology that combines retiming techniques with folding transformations to solve the timing problems associated with implementing multirate DSP algorithms. We demonstrate that techniques for modeling computational-unit latencies, which can influence the parameterisation of a multirate DSP IP core, can lead to highly efficient solutions. This is illustrated using a polyphase IIR IDCT example. Using the folding transformation, the control circuit for a hardware-sharing multirate DSP architecture is also presented.
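The methodology combines retiming with folding. As background, and not as the paper's synthesis flow, the standard folding equation gives the number of delays each data-flow-graph edge U → V needs in the folded (hardware-shared) architecture: D_F(U → V) = N·w(e) − P_U + v − u, where N is the folding factor, w(e) the number of delays on the edge, P_U the pipeline depth of the unit executing U, and u, v the folding orders (time slots) of U and V. A negative result means the graph must be retimed before it can be folded. The sketch below assumes a simple edge-list representation; the function names are hypothetical.

```python
def folded_delays(edges, N, fold_order, pipeline):
    """Hypothetical sketch of the folding equation D_F(U -> V) = N*w(e) - P_U + v - u.

    edges:      list of (U, V, w), where w is the number of delays on edge U -> V
    N:          folding factor (operations time-multiplexed onto one hardware unit)
    fold_order: node -> folding order, i.e. the time slot 0..N-1 it executes in
    pipeline:   node -> number of pipeline stages P of the unit executing it
    """
    return {(u, v): N * w - pipeline[u] + fold_order[v] - fold_order[u]
            for u, v, w in edges}


def is_foldable(delays):
    """Folding is realisable only if no folded edge needs a negative delay;
    otherwise the data-flow graph has to be retimed first."""
    return all(d >= 0 for d in delays.values())
```

For a two-node loop A → B (one delay), B → A (no delay) with N = 2, A in slot 0, B in slot 1 and single-stage units, the folded delays are 2 and −2, so the loop cannot be folded as is. Retiming node A by one (moving the delay onto B → A) gives folded delays of 0 on both edges, which is realisable.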
Field-Programmable Logic and Applications | 2007
Ioannis Nousias; Sami Khawam; Mark Milward; Ying Yi; Mark Muir; Tughrul Arslan
We present a new de-blocking filter module fully optimised for a recently introduced dynamically reconfigurable, instruction-cell-based architecture. The module combines standard software transforms with architecture-specific techniques in a novel way, aiming to reduce reconfiguration overheads and increase resource utilisation. On the target architecture, our proposed filter outperforms the standard FFmpeg-based filter code by a factor of 4.5.
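For context, the sketch below is a plain reference version of the H.264 normal-strength (bS < 4) luma edge filter applied to a single line of samples across a block edge, written from the standard's filtering equations and assuming 8-bit samples. It is not the paper's optimised module; looking up alpha, beta and tc0 from the standard's QP-indexed tables is left out, and the function names are hypothetical.

```python
def clip3(lo, hi, x):
    """Clamp x to the range [lo, hi], as Clip3 in the H.264 standard."""
    return max(lo, min(hi, x))


def filter_luma_line(p, q, alpha, beta, tc0):
    """Hypothetical sketch: bS < 4 H.264 luma filter for one line of 8-bit samples.

    p = [p0, p1, p2] are samples moving away from the edge on one side,
    q = [q0, q1, q2] on the other; alpha, beta, tc0 come from the standard's tables.
    Returns the filtered (p, q) lists.
    """
    p0, p1, p2 = p
    q0, q1, q2 = q
    # Large steps across the edge are treated as real image edges: no filtering.
    if not (abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta):
        return list(p), list(q)
    ap, aq = abs(p2 - p0), abs(q2 - q0)
    tc = tc0 + (ap < beta) + (aq < beta)
    # Python's >> on negative ints is an arithmetic shift, matching the standard.
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    new_p = [clip3(0, 255, p0 + delta), p1, p2]
    new_q = [clip3(0, 255, q0 - delta), q1, q2]
    # p1/q1 are also adjusted when their side of the edge is smooth enough.
    if ap < beta:
        new_p[1] = p1 + clip3(-tc0, tc0, (p2 + ((p0 + q0 + 1) >> 1) - (p1 << 1)) >> 1)
    if aq < beta:
        new_q[1] = q1 + clip3(-tc0, tc0, (q2 + ((p0 + q0 + 1) >> 1) - (q1 << 1)) >> 1)
    return new_p, new_q
```

For example, a flat step of 60 | 70 with alpha = 20, beta = 5 and tc0 = 2 is smoothed to 60, 62, 64 | 66, 68, 70, while a step of 10 | 200 exceeds alpha and is left untouched. The per-sample branching and table lookups in this form are typical of the kind of code that architecture-specific transforms aim to restructure.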
Field-Programmable Logic and Applications | 2007
Ioannis Nousias; Sami Khawam; Mark Milward; Mark Muir; Tughrul Arslan
This paper presents preliminary results of a physical placement algorithm for heterogeneous Dynamically Reconfigurable Arrays (DRAs), based on a multi-objective, multi-threaded genetic algorithm (GA). The algorithm handles both the spatial and temporal nature of the configurations used in DRAs, aiming to find a layout suitable for a wide range of applications, since general applicability is a key criterion for DRAs.
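A heavily simplified sketch of GA-based placement follows, assuming a homogeneous grid of cells, a netlist given as pairs of connected operations, and total Manhattan wire length as the only objective. The paper's algorithm is multi-objective and multi-threaded and handles heterogeneous cells and temporal configurations, none of which is modelled here; the function names, population size and generation count are hypothetical.

```python
import random


def wirelength(placement, edges, cols):
    """Total Manhattan wire length; placement[op] is a cell index on the grid."""
    total = 0
    for a, b in edges:
        ra, ca = divmod(placement[a], cols)
        rb, cb = divmod(placement[b], cols)
        total += abs(ra - rb) + abs(ca - cb)
    return total


def ga_place(n_ops, edges, rows, cols, pop_size=30, generations=200, seed=0):
    """Hypothetical single-objective GA placing n_ops operations on distinct cells."""
    rng = random.Random(seed)
    cells = rows * cols
    assert 2 <= pop_size and n_ops <= cells

    def fitness(ind):
        return wirelength(ind, edges, cols)

    def tournament(population):
        a, b = rng.sample(population, 2)  # binary tournament selection
        return min(a, b, key=fitness)

    def crossover(a, b):
        # Take each op's cell from a random parent, repairing collisions
        # so that every op still occupies a distinct cell.
        child, used = [], set()
        for cell_a, cell_b in zip(a, b):
            first, second = (cell_a, cell_b) if rng.random() < 0.5 else (cell_b, cell_a)
            pick = first if first not in used else second
            if pick in used:
                pick = rng.choice([c for c in range(cells) if c not in used])
            child.append(pick)
            used.add(pick)
        return child

    def mutate(ind):
        # Move one op to a random cell, swapping with any op already there.
        child = ind[:]
        i, target = rng.randrange(n_ops), rng.randrange(cells)
        if target in child:
            j = child.index(target)
            child[i], child[j] = child[j], child[i]
        else:
            child[i] = target
        return child

    population = [rng.sample(range(cells), n_ops) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        next_gen = population[: max(1, pop_size // 5)]  # elitism: keep the best 20%
        while len(next_gen) < pop_size:
            parents = tournament(population), tournament(population)
            next_gen.append(mutate(crossover(*parents)))
        population = next_gen
    return min(population, key=fitness)
```

Extending this toward the abstract's setting would mean replacing the single fitness value with several objectives (for example, a Pareto ranking) and evaluating the population across threads; both are beyond this sketch.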