Sami Khawam
University of Edinburgh
Publication
Featured research published by Sami Khawam.
IEEE Transactions on Very Large Scale Integration Systems | 2008
Sami Khawam; Ioannis Nousias; Mark Milward; Ying Yi; Mark Muir; Tughrul Arslan
This paper presents a novel instruction cell-based reconfigurable computing architecture for low-power applications, hereafter referred to as the reconfigurable instruction cell array (RICA). A top-down, software-driven approach was taken in developing RICA, and this proved to be one of the key design decisions for a flexible, easy-to-program, low-power architecture. These features make RICA an architecture that inherently meets the main design requirements of modern low-power devices. Results show that it consumes considerably less power than leading VLIW and low-power digital signal processors while maintaining their throughput performance.
adaptive hardware and systems | 2006
Balal Ahmad; Ahmet T. Erdogan; Sami Khawam
This paper describes our dynamically reconfigurable network-on-chip (NoC) architecture, proposed for reconfigurable multiprocessor system-on-chip (MPSoC) designs as a solution to increased communication needs, with low silicon cost, quality of service and network scalability in mind. The novelty of the proposed NoC lies in the fact that it dynamically configures its routing, switching and data packet size at run time in response to the changing communication requirements of the system, thus aiming to provide low latency, low power and high data throughput. Simulation results and a prototype implementation have shown its efficiency under different traffic conditions at a negligible area overhead.
design, automation, and test in europe | 2006
Ying Yi; Ioannis Nousias; Mark Milward; Sami Khawam; Tughrul Arslan; Iain Lindsay
This paper presents a new operation chaining reconfigurable scheduling algorithm (CRS), based on list scheduling, that maximizes the instruction-level parallelism available in distributed high-performance instruction cell based reconfigurable systems. Unlike other typical scheduling methods, it considers placement and routing effects, register assignment and an advanced operation chaining compilation technique to generate higher-performance scheduled code. The effectiveness of this approach is demonstrated using a recently developed industrial distributed reconfigurable instruction cell based architecture [Lee,2003]. The results show that schedules produced with this approach achieve throughput equivalent to VLIW architectures at much lower power consumption.
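For readers unfamiliar with list scheduling, the baseline that CRS is described as building on, the following is a minimal sketch of the plain technique only. It does not model placement, routing, register assignment or operation chaining, and it is not the authors' algorithm. The function name `list_schedule`, the critical-path priority and the assumption that each unit can issue one operation per cycle are illustrative choices, not taken from the paper.

```python
from functools import lru_cache


def list_schedule(deps, latency, num_units):
    """Plain list scheduling (illustrative only, not the CRS algorithm).

    deps:      dict mapping each operation to the set of operations it depends on
    latency:   dict mapping each operation to its latency in cycles
    num_units: number of operations that may be issued per cycle (assumption)
    Returns a dict mapping each operation to its start cycle.
    """
    # Build successor lists so priorities can be computed from the sinks upward.
    succs = {op: [] for op in deps}
    for op, preds in deps.items():
        for p in preds:
            succs[p].append(op)

    @lru_cache(maxsize=None)
    def priority(op):
        # Length of the longest latency-weighted path from op to any sink.
        return latency[op] + max((priority(s) for s in succs[op]), default=0)

    start = {}
    cycle = 0
    while len(start) < len(deps):
        # An operation is ready once all of its predecessors have finished.
        ready = [
            op for op in deps
            if op not in start
            and all(p in start and start[p] + latency[p] <= cycle
                    for p in deps[op])
        ]
        # Highest priority first; the name breaks ties deterministically.
        ready.sort(key=lambda op: (-priority(op), op))
        for op in ready[:num_units]:
            start[op] = cycle
        cycle += 1
    return start


if __name__ == "__main__":
    deps = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}
    latency = {"a": 1, "b": 1, "c": 2, "d": 1}
    print(list_schedule(deps, latency, num_units=2))
```

The input is assumed to be an acyclic dependency graph; a cycle would leave some operation permanently unready.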
international parallel and distributed processing symposium | 2004
Sami Khawam; Tughrul Arslan; Fred Westall
Summary form only given. Domain-specific reconfigurable arrays are embedded arrays optimized for one domain of applications providing performance improvements over generic embedded field programmable gate arrays (FPGAs). An embedded reconfigurable array that targets distributed arithmetic (DA) implementations is presented. DA includes calculations that are commonly found in multimedia applications, such as filtering and discrete cosine transform (DCT). Two benchmark DCT circuits are implemented on the array, on conventional FPGAs and on hardwired cores. The performance measured shows considerable improvements in area, power consumption and timing when comparing the presented array with FPGAs. Experimental results are provided which demonstrate the suitability of our architecture in low-power system-on-chip platforms targeting portable mobile devices.
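As background on distributed arithmetic, the sketch below shows the general idea in software: an inner product with fixed coefficients is computed bit-serially using a precomputed lookup table in place of multipliers. It assumes unsigned integer inputs and integer coefficients, and the names `build_lut` and `da_inner_product` are hypothetical. It illustrates the technique in general, not the array described in the paper.

```python
def build_lut(coeffs):
    """Precompute every partial sum of the coefficients.

    Entry `addr` holds the sum of coeffs[i] for each bit i set in `addr`.
    """
    k = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << k)
    ]


def da_inner_product(coeffs, xs, bits=8):
    """Compute sum(c * x) without multiplications (unsigned inputs assumed).

    coeffs: fixed integer coefficients
    xs:     unsigned integer inputs, each smaller than 2**bits
    """
    lut = build_lut(coeffs)
    acc = 0
    for b in range(bits):
        # Gather bit b of every input into one lookup-table address.
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> b) & 1) << i
        # Weight the looked-up partial sum by the bit position and accumulate.
        acc += lut[addr] << b
    return acc


if __name__ == "__main__":
    coeffs = [3, -1, 4, 2]
    xs = [10, 200, 7, 255]
    print(da_inner_product(coeffs, xs), sum(c * x for c, x in zip(coeffs, xs)))
```

The table has one entry per combination of input bits, so its size grows as 2 to the power of the number of coefficients. Practical designs usually split long inner products across several smaller tables.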
symposium on cloud computing | 2006
Adam Major; Ying Yi; Ioannis Nousias; Mark Milward; Sami Khawam; Tughrul Arslan
This paper presents a new baseline profile compliant H.264 decoder implementation specifically tailored for an ANSI-C programmable, dynamically reconfigurable, instruction cell based architecture that has been developed. We use the ffmpeg libavcodec library as the basis for our decoder and identify the most processor-intensive functions. These functions are tailored in a novel framework incorporating established software techniques alongside several architecture-specific transforms. Initial results demonstrate that our reconfigurable architecture based decoder provides a significant performance boost with power figures below those of a microcontroller such as ARM.
design, automation, and test in europe | 2004
Sami Khawam; Sajid Baloch; Arjun Pai; Imran Ahmed; Nizamettin Aydin; Tughrul Arslan; Fred Westall
Mobile video processing as defined in standards like MPEG-4 and H.263 contains a number of time-consuming computations that cannot be efficiently executed on current hardware architectures. The authors recently introduced a reconfigurable SoC platform that permits a low-power, high-throughput and flexible implementation of the motion estimation and DCT algorithms. The computations are done using domain-specific reconfigurable arrays that have demonstrated up to 75% reduction in power consumption when compared to a generic FPGA architecture, which makes them suitable for portable devices. This paper presents and compares different configurations of the arrays for efficiently implementing DCT and motion estimation algorithms. A number of algorithms are mapped onto the various reconfigurable fabrics, demonstrating the flexibility of the new reconfigurable SoC architecture and its ability to support a number of implementations having different performance characteristics.
asia and south pacific design automation conference | 2005
Adeoye Olugbon; Sami Khawam; Tughrul Arslan; Ioannis Nousias; Iain Lindsay
We propose a system-on-chip (SoC) architecture for reconfigurable applications based on the AMBA high-speed bus (AHB). The architecture features multiple low-area flyby DMA blocks for transferring configuration data. Furthermore, the architecture eliminates the energy-consuming instructions used in comparable commercial reconfigurable SoCs. The flyby DMA blocks use up to 98% fewer gates than general-purpose DMA controllers. They also achieve flyby throughput, which halves the number of clock cycles that conventional DMA needs for a data transfer. We also demonstrate parallel processing, which contributes to the improved system performance of the proposed architecture over its commercial counterparts.
asia and south pacific design automation conference | 2005
Zhenyu Liu; Tughrul Arslan; Sami Khawam; Iain Lindsay
The use of synthesizable reconfigurable cores in system-on-chip (SoC) designs is becoming increasingly common. Such domain-specific cores are used for their flexibility, powerful functionality and low power consumption. A reconfigurable finite state machine (FSM) is always required for control purposes in a reconfigurable SoC. This paper presents a novel unbalanced unsymmetrical reconfigurable architecture for generic FSMs; compared with commercial FPGA devices, the new architecture yields an area reduction of 43% and a power consumption decrease of 82%.
field-programmable technology | 2004
Cheng Zhan; Sami Khawam; Tughrul Arslan
This work presents a novel embedded reconfigurable fabric targeting efficient implementation of the Viterbi decoder within a system-on-chip device. The proposed reconfigurable fabric can support constraint lengths ranging from 3 to 9, and code rates in the range 1/2-1/3. Our results demonstrate that this novel architecture has superior throughput and power consumption characteristics when compared to generic DSPs and FPGAs, respectively.
adaptive hardware and systems | 2006
Wing On Fung; Tughrul Arslan; Sami Khawam
Domain-specific reconfigurable arrays have been shown to provide an efficient trade-off between the flexibility of FPGAs and the performance of ASIC circuits. Nonetheless, the design of these heterogeneous arrays is a labour-intensive process. Furthermore, a manually created array architecture may not be fully optimised, which limits its performance. This paper presents a placement technique for mapping logic elements into heterogeneous reconfigurable arrays. At its core, it implements a genetic algorithm, which is used to reduce the span of all the interconnections as well as the critical delay. It therefore minimises the amount of routing resource required in the architecture. The algorithm was tested on two arrays implementing DCT and speech coding. The resulting architecture achieves a level of optimisation close to that of an expert designer in a fraction of the time.