Ioannis Nousias
University of Edinburgh
Publication
Featured research published by Ioannis Nousias.
IEEE Transactions on Very Large Scale Integration Systems | 2008
Sami Khawam; Ioannis Nousias; Mark Milward; Ying Yi; Mark Muir; Tughrul Arslan
This paper presents a novel instruction cell-based reconfigurable computing architecture for low-power applications, hereafter referred to as the reconfigurable instruction cell array (RICA). For the development of the RICA, a top-down software-driven approach was taken, which proved to be one of the key design decisions for a flexible, easy-to-program, low-power architecture. These features make RICA an architecture that inherently meets the main design requirements of modern low-power devices. Results show that it consumes considerably less power than leading VLIW and low-power digital signal processors while still maintaining their throughput performance.
design, automation, and test in europe | 2006
Ying Yi; Ioannis Nousias; Mark Milward; Sami Khawam; Tughrul Arslan; Iain Lindsay
This paper presents a new operation chaining reconfigurable scheduling algorithm (CRS) based on list scheduling that maximizes the instruction level parallelism available in distributed high performance instruction cell based reconfigurable systems. Unlike other typical scheduling methods, it considers the placement and routing effect, register assignment and an advanced operation chaining compilation technique to generate higher performance scheduled code. The effectiveness of this approach is demonstrated here using a recently developed industrial distributed reconfigurable instruction cell based architecture [Lee,2003]. The results show that schedules using this approach achieve equivalent throughput to VLIW architectures but at much lower power consumption.
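As a rough illustration of the general idea behind list scheduling with operation chaining (not the CRS algorithm itself, which the abstract says also accounts for placement, routing and register assignment), the sketch below packs dependent operations into the same step whenever their accumulated delay still fits in one clock period. The function name `chained_list_schedule`, its parameters, and the delay and resource model are all assumptions made for illustration.

```python
def chained_list_schedule(ops, deps, delay, period, cells_per_step):
    """Toy list scheduler with operation chaining (illustrative only).

    ops            -- operation names in priority order (hypothetical input)
    deps           -- dict: op -> list of predecessor ops
    delay          -- dict: op -> combinational delay of the op
    period         -- clock period; chained delays in one step must fit in it
    cells_per_step -- how many operations one step can hold
    Returns dict: op -> (step, finish time within that step).
    """
    if cells_per_step < 1:
        raise ValueError("cells_per_step must be at least 1")
    if any(delay[op] > period for op in ops):
        raise ValueError("an operation is slower than the clock period")

    placed = {}
    remaining = list(ops)
    step = 0
    while remaining:
        used = 0
        progress = True
        while progress and used < cells_per_step:
            progress = False
            for op in remaining:
                preds = deps.get(op, [])
                if any(p not in placed for p in preds):
                    continue  # not ready: a predecessor is still unscheduled
                # Results from earlier steps are assumed available at time 0;
                # predecessors in the current step are chained, so the op
                # starts once the slowest of them has finished.
                start = max(
                    (placed[p][1] for p in preds if placed[p][0] == step),
                    default=0,
                )
                if start + delay[op] <= period:
                    placed[op] = (step, start + delay[op])
                    remaining.remove(op)
                    used += 1
                    progress = True
                    break  # restart the scan in priority order
        if used == 0:
            raise ValueError("dependency cycle or unknown predecessor")
        step += 1
    return placed


# Hypothetical example: a -> b -> c is a dependency chain, d is independent.
schedule = chained_list_schedule(
    ops=["a", "b", "c", "d"],
    deps={"b": ["a"], "c": ["b"]},
    delay={"a": 2, "b": 2, "c": 3, "d": 1},
    period=5,
    cells_per_step=4,
)
```

In this example `a` and `b` are chained into step 0 because their combined delay (4) fits in the period (5), while `c` would overrun it and moves to step 1, so the chain takes two steps instead of three.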
adaptive hardware and systems | 2006
Ioannis Nousias; Tughrul Arslan
This paper presents a new approach to realizing virtual channels tailored for network-on-chip implementations. The technique makes use of a flow control mechanism based on adaptive input rate control where the required buffer size is independent of the number of channels and the packet size. The resulting implementation requires only 3% of the memory space used in a conventional implementation of virtual channels. The efficient use of memory storage also delivers performance improvements of up to 15% for a normal network configuration.
field-programmable custom computing machines | 2007
Wei Han; Mark Muir; Ioannis Nousias; Tughrul Arslan; Ahmet T. Erdogan
This paper presents the porting of the Micro C/OS-II RTOS to a novel reconfigurable instruction cell based architecture which fills the gap between DSP, FPGA and ASIC with high performance, high flexibility and ANSI-C support. A WiMAX physical layer program has been implemented on the target architecture with the RTOS support. A semaphore based synchronization scheme is used to improve task independence. The research lays a foundation for further exploration of multithreading on multiple target architectures.
symposium on cloud computing | 2006
Adam Major; Ying Yi; Ioannis Nousias; Mark Milward; Sami Khawam; Tughrul Arslan
This paper presents a new baseline profile compliant H.264 decoder implementation specifically tailored for an ANSI-C programmable, dynamically reconfigurable, instruction cell based architecture. We use the ffmpeg libavcodec library as the basis for our decoder and identify the most processor intensive functions. These functions are tailored in a novel framework incorporating established software techniques alongside several architecture specific transforms. Initial results demonstrate that our reconfigurable architecture based decoder provides a significant performance boost with power figures below those of a microcontroller such as an ARM.
symposium on application specific processors | 2008
Wei Han; Ying Yi; Mark Muir; Ioannis Nousias; Tughrul Arslan; Ahmet T. Erdogan
Wireless internet access technologies have significant market potential, especially the WiMAX protocol, which can offer data rates of tens of Mbps. A significant demand for embedded high performance WiMAX solutions is forcing designers to seek single-chip multiprocessor or multi-core systems that offer competitive advantages in terms of all performance metrics, such as speed, power and area. Through the provision of a degree of flexibility similar to that of a DSP and performance and power consumption advantages approaching that of an ASIC, emerging dynamically reconfigurable processors are proving to be strong candidates for future high performance multi-core processor systems. This paper presents several new single-chip multi-core architectures, based on newly emerging dynamically reconfigurable processor cores, for the WiMAX physical layer. A simulation platform is proposed in order to explore and implement various multi-core solutions combining different memory architectures and task partitioning schemes. The paper describes the architectures and the simulation environment, and demonstrates that up to 4.2x speedup can be achieved by employing four dynamically reconfigurable processor cores with individual local memory units.
international parallel and distributed processing symposium | 2007
Nazish Aslam; Mark Milward; Ioannis Nousias; Tughrul Arslan; Ahmet T. Erdogan
Code compression has been applied to embedded systems to minimize the silicon area utilized for program memories and to lower the power consumption. More recently, it has become a necessity for multiple-issue architectures, such as VLIW and TTA, to permit a viable realization of these designs. In this paper, a code compression and decompression scheme is presented for newly emerging reconfigurable technologies, which pose further challenges because their much wider instruction words give them an order of magnitude higher memory requirement than typical VLIW/TTA architectures. Two dictionary-based lossless compression schemes are implemented and compared for an example reconfigurable system. This paper looks at several conflicting design parameters, such as the compression ratio, silicon area and speed. Test programs for a 2D DCT, minimum error, WiMAX and H.264 have been evaluated, with compression ratios in the range of 41% to 62% recorded with the best scheme.
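To make the notion of dictionary-based lossless code compression concrete, here is a minimal generic sketch. It is not either of the two schemes evaluated in the paper: the function names, the one-bit hit/miss flag, the dictionary selection by frequency, and the definition of compression ratio as compressed size divided by original size (dictionary storage excluded) are all assumptions made for illustration.

```python
from collections import Counter


def compress(words, word_bits, dict_size):
    """Replace the most frequent instruction words with dictionary indices.

    Each output entry is (1, index) for a dictionary hit or (0, raw_word)
    for a miss. Returns (dictionary, stream, compressed_bits), where
    compressed_bits counts the 1-bit flag plus the index or raw word.
    Dictionary storage itself is NOT counted (a simplifying assumption).
    """
    dictionary = [w for w, _ in Counter(words).most_common(dict_size)]
    index = {w: i for i, w in enumerate(dictionary)}
    # Bits needed to address any dictionary entry (at least one bit).
    idx_bits = max(1, (len(dictionary) - 1).bit_length())
    stream = [(1, index[w]) if w in index else (0, w) for w in words]
    bits = sum(1 + (idx_bits if hit else word_bits) for hit, _ in stream)
    return dictionary, stream, bits


def decompress(dictionary, stream):
    """Invert compress(): look up hits, pass raw words through unchanged."""
    return [dictionary[value] if hit else value for hit, value in stream]


# Hypothetical program of ten 16-bit instruction words.
program = [0xAAAA] * 6 + [0xBBBB] * 3 + [0xCCCC]
dictionary, stream, bits = compress(program, word_bits=16, dict_size=2)
ratio = bits / (len(program) * 16)  # 35 / 160 under the assumptions above
assert decompress(dictionary, stream) == program
```

The design trade-off the abstract mentions shows up even in this toy: a larger dictionary raises the hit rate but costs more index bits per hit and more storage for the dictionary itself.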
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2009
Wei Han; Ying Yi; Mark Muir; Ioannis Nousias; Tughrul Arslan; Ahmet T. Erdogan
Wireless Internet-access technologies have significant market potential, particularly the Worldwide Interoperability for Microwave Access (WiMAX) protocol which can offer data rates of tens of megabits per second. A significant demand for embedded high-performance WiMAX solutions is forcing designers to seek single-chip multicore systems that offer competitive advantages in terms of all performance metrics, such as speed, power, and area. Through the provision of a degree of flexibility similar to that of a DSP and performance and power consumption advantages approaching that of an application-specific integrated circuit, emerging dynamically reconfigurable (DR) processors are proving to be strong candidates for processing cores in future high-performance multicore-processor systems. This paper presents several new single-chip multicore architectures for the WiMAX application based on recently emerging coarse-grained DR processor cores. A simulation platform is proposed in order to explore and implement various multicore solutions combining different memory architectures and task-partitioning schemes. This paper describes the different architectures, the simulation environment, and several task-partitioning methods and demonstrates that up to 7.3 and 12 times speedup can be achieved by employing eight and ten DR processor cores for both the WiMAX transmitter and receiver sections, respectively. A comparison with other WiMAX multicore solutions is given in order to demonstrate that our best solution delivers a high throughput at relatively low area cost.
asia and south pacific design automation conference | 2005
Adeoye Olugbon; Sami Khawam; Tughrul Arslan; Ioannis Nousias; Iain Lindsay
We propose a system-on-chip (SoC) architecture for reconfigurable applications based on the AMBA high-speed bus (AHB). The architecture features multiple low-area flyby DMA blocks for transferring configuration data. Furthermore, the architecture eliminates the use of energy-consuming instructions used in comparable commercial reconfigurable SoCs. The flyby DMA blocks achieve a reduction of up to 98% in the number of gates found in general-purpose DMA controllers. The DMA blocks also achieve flyby throughput, which halves the number of clock cycles used in conventional DMA for data transfer. We also demonstrate the presence of parallel processing, which contributes to improved system performance of the proposed architecture over commercial counterparts.
field-programmable custom computing machines | 2007
Nazish Aslam; Mark Milward; Ioannis Nousias; Tughrul Arslan; Ahmet T. Erdogan
This paper presents a code compression and on-the-fly decompression scheme suitable for coarse-grain reconfigurable technologies. A novel unit-grouping dictionary based compression technique utilizing special control bits to increase the effective storage capacity of the dictionaries is implemented and compared against an existing suitable technique for an example reconfigurable system. Compression ratios in the range of 40%-59% are recorded with the new scheme.