Chrysostomos Nicopoulos
University of Cyprus
Publications
Featured research published by Chrysostomos Nicopoulos.
International Symposium on Computer Architecture | 2006
Feihui Li; Chrysostomos Nicopoulos; Thomas Richardson; Yuan Xie; Vijaykrishnan Narayanan; Mahmut T. Kandemir
Long interconnects are becoming an increasingly important problem from both power and performance perspectives. This motivates designers to adopt on-chip network-based communication infrastructures and three-dimensional (3D) designs where multiple device layers are stacked together. Considering the current trends towards increasing use of chip multiprocessing, it is timely to consider 3D chip multiprocessor design and memory networking issues, especially in the context of data management in large L2 caches. The overall goal of this paper is to study the challenges for L2 design and management in 3D chip multiprocessors. Our first contribution is to propose a router architecture and a topology design that make use of a network architecture embedded into the L2 cache memory. Our second contribution is to demonstrate, through extensive experiments, that a 3D L2 memory architecture generates much better results than conventional two-dimensional (2D) designs under different numbers of layers and vertical (inter-wafer) connections. In particular, our experiments show that a 3D architecture with no dynamic data migration generates better performance than a 2D architecture that employs data migration. Avoiding migration also reduces L2 power consumption, since fewer data movements are required.
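One intuition behind the 3D result is that stacking shortens the average distance between a requester and an L2 bank. The sketch below is a hypothetical illustration, not the paper's evaluation: it computes the average Manhattan hop count over uniformly chosen node pairs in a mesh of any dimensionality, conservatively charging a vertical hop the same as a horizontal one.

```python
from itertools import product


def avg_hops(dims: tuple[int, ...]) -> float:
    """Average Manhattan hop count over all ordered node pairs (self-pairs
    included) in a mesh with the given number of nodes along each dimension.

    Manhattan distance is a sum of per-dimension distances, and coordinates of
    a uniformly random pair are independent, so per-dimension averages add up.
    """
    total = 0.0
    for k in dims:
        # Average |i - j| for i, j drawn uniformly from 0..k-1.
        total += sum(abs(i - j) for i, j in product(range(k), repeat=2)) / (k * k)
    return total


# 64 nodes as a single 8x8 layer versus four stacked 4x4 layers.
print(avg_hops((8, 8)))     # 5.25
print(avg_hops((4, 4, 4)))  # 3.75
```

Because inter-layer links are physically much shorter than intra-layer ones, treating every hop as equal cost understates the advantage of the stacked configuration.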
International Symposium on Computer Architecture | 2007
Jongman Kim; Chrysostomos Nicopoulos; Dongkook Park; Reetuparna Das; Yuan Xie; Vijaykrishnan Narayanan; Mazin S. Yousif; Chita R. Das
Much like multi-storey buildings in densely packed metropolises, three-dimensional (3D) chip structures are envisioned as a viable solution to skyrocketing transistor densities and burgeoning die sizes in multi-core architectures. Partitioning a larger die into smaller segments and then stacking them in a 3D fashion can significantly reduce latency and energy consumption. Such benefits emanate from the notion that inter-wafer distances are negligible compared to intra-wafer distances. This attribute substantially reduces global wiring length in 3D chips. The work in this paper integrates the increasingly popular idea of packet-based Networks-on-Chip (NoC) into a 3D setting. While NoCs have been studied extensively in the 2D realm, the microarchitectural ramifications of moving into the third dimension have yet to be fully explored. This paper presents a detailed exploration of inter-strata communication architectures in 3D NoCs. Three design options are investigated: a simple bus-based inter-wafer connection, a hop-by-hop standard 3D design, and a full 3D crossbar implementation. In this context, we propose a novel partially-connected 3D crossbar structure, called the 3D Dimensionally-Decomposed (DimDe) Router, which provides a good tradeoff between circuit complexity and performance benefits. Simulation results using (a) a stand-alone cycle-accurate 3D NoC simulator running synthetic workloads, and (b) a hybrid 3D NoC/cache simulation environment running real commercial and scientific benchmarks, indicate that the proposed DimDe design provides latency and throughput improvements of over 20% on average over the other 3D architectures, while remaining within 5% of the full 3D crossbar performance. Furthermore, based on synthesized hardware implementations in 90 nm technology, the DimDe architecture outperforms all other designs, including the full 3D crossbar, by an average of 26% in terms of the Energy-Delay Product (EDP).
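The Energy-Delay Product is energy multiplied by delay, which is why a design can trail slightly in raw latency yet still win on EDP. A minimal sketch; the (energy, delay) values below are made-up placeholders, not figures from the paper.

```python
def edp(energy: float, delay: float) -> float:
    """Energy-Delay Product; lower is better."""
    return energy * delay


def edp_reduction(candidate: tuple[float, float], baseline: tuple[float, float]) -> float:
    """Fractional EDP reduction of `candidate` relative to `baseline`.

    Each argument is an (energy, delay) pair expressed in consistent units.
    """
    return 1.0 - edp(*candidate) / edp(*baseline)


# Hypothetical values: the candidate is 4% slower but uses 25% less energy.
baseline = (1.00, 1.00)
candidate = (0.75, 1.04)
print(round(edp_reduction(candidate, baseline), 2))  # 0.22
```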
Dependable Systems and Networks | 2006
Dongkook Park; Chrysostomos Nicopoulos; Jongman Kim; Narayanan Vijaykrishnan; Chita R. Das
The advent of deep sub-micron technology has exacerbated reliability issues in on-chip interconnects. In particular, single event upsets, such as soft errors, and hard faults are rapidly becoming a force to be reckoned with. This spiraling trend highlights the importance of detailed analysis of these reliability hazards and the incorporation of comprehensive protection measures into all network-on-chip (NoC) designs. In this paper, we examine the impact of transient failures on the reliability of on-chip interconnects and develop comprehensive counter-measures to either prevent or recover from them. In this regard, we propose several novel schemes to remedy various kinds of soft error symptoms, while keeping area and power overhead at a minimum. Our proposed solutions are architected to fully exploit the available infrastructures in an NoC and enable versatile reuse of valuable resources. The effectiveness of the proposed techniques has been validated using a cycle-accurate simulator.
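As a generic illustration of recovering from a single transient bit flip (a textbook Hamming(7,4) code, not one of the specific schemes proposed in the paper), the sketch below encodes 4 data bits into a 7-bit codeword and corrects any single-bit error on decode. Function names are illustrative.

```python
def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits (0..15) as a 7-bit Hamming codeword.

    List index i holds codeword position i + 1. Positions 1, 2 and 4 are
    parity bits; positions 3, 5, 6 and 7 carry data bits d0..d3.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # even parity over positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # even parity over positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # even parity over positions 4, 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def hamming74_decode(code: list[int]) -> int:
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(code)
    # XOR of the positions of all set bits is 0 for a valid codeword;
    # after a single flip it equals the position of the flipped bit.
    syndrome = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            syndrome ^= pos
    if syndrome:
        c[syndrome - 1] ^= 1
    return c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)


word = hamming74_encode(0b1011)
word[4] ^= 1  # simulate a soft error at codeword position 5
print(hamming74_decode(word) == 0b1011)  # True
```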
International Conference on VLSI Design | 2006
Thomas D. Richardson; Chrysostomos Nicopoulos; Dongkook Park; Vijaykrishnan Narayanan; Yuan Xie; Chita R. Das; Vijay Degalahal
The two dominant architectural choices for implementing efficient communication fabrics for SoCs have been transaction-based buses and packet-based networks-on-chip (NoC). Both implementations have some inherent disadvantages: the former suffer from poor scalability and the transactional character of their operation, and the latter from inconsistent access times and performance deterioration at high injection rates. In this paper, we propose a transaction-less, time-division-based bus architecture that dynamically allocates timeslots on the fly, called the dTDMA bus. This architecture addresses the contention issues of current bus architectures, while avoiding the multi-hop overhead of NoCs. It is compared to traditional bus architectures and NoCs and shown to outperform both for configurations with fewer than 10 PEs. To exploit both the advantages of the dTDMA bus for smaller configurations and the scalability of NoCs, we propose a new hybrid SoC interconnect combining the two, showing significant improvement in both latency and power consumption.
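The central idea of dynamic timeslot allocation can be shown with a toy arbiter that builds each frame only from PEs that currently have data to send, so idle PEs consume no slots. This is a simplified, hypothetical software model (class and method names are illustrative), not the dTDMA hardware described in the paper.

```python
from typing import Optional


class DynamicTdmaArbiter:
    """Toy dynamic-TDMA arbiter: one timeslot per active requester per frame."""

    def __init__(self) -> None:
        self.active: set[int] = set()
        self.frame: list[int] = []
        self.slot = 0

    def request(self, pe: int) -> None:
        self.active.add(pe)

    def release(self, pe: int) -> None:
        self.active.discard(pe)

    def next_grant(self) -> Optional[int]:
        """Return the PE that owns the next timeslot, or None if the bus is idle."""
        while True:
            # At a frame boundary, allocate slots only to current requesters.
            if self.slot >= len(self.frame):
                self.frame = sorted(self.active)
                self.slot = 0
                if not self.frame:
                    return None
            owner = self.frame[self.slot]
            self.slot += 1
            # Skip slots whose owner released the bus mid-frame.
            if owner in self.active:
                return owner


arb = DynamicTdmaArbiter()
for pe in (5, 0, 2):
    arb.request(pe)
print([arb.next_grant() for _ in range(4)])  # [0, 2, 5, 0]
```

Frame length tracks the number of active requesters, so a lightly loaded bus does not waste slots on idle PEs, unlike a static TDMA schedule.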
High-Performance Computer Architecture | 2008
Reetuparna Das; Asit K. Mishra; Chrysostomos Nicopoulos; Dongkook Park; Vijaykrishnan Narayanan; Ravishankar R. Iyer; Mazin S. Yousif; Chita R. Das
The trend towards integrating multiple cores on the same die has accentuated the need for larger on-chip caches. Such large caches are constructed as a multitude of smaller cache banks interconnected through a packet-based network-on-chip (NoC) communication fabric. Thus, the NoC plays a critical role in optimizing the performance and power consumption of such non-uniform cache-based multicore architectures. While almost all prior NoC studies have focused on the design of router microarchitectures for achieving this goal, in this paper, we explore the role of data compression on NoC performance and energy behavior. In this context, we examine two different configurations that explore combinations of storage and communication compression: (1) Cache compression (CC) and (2) Compression in the NIC (NC). We also address techniques to hide the decompression latency by overlapping it with NoC communication latency. Our simulation results with a diverse set of scientific and commercial benchmark traces reveal that CC can provide up to 33% reduction in network latency and up to 23% power savings. Even in the case of NC, where the data is compressed only when passing through the NoC fabric of the NUCA architecture and is stored uncompressed, performance and power savings of up to 32% and 21%, respectively, can be obtained. These performance benefits in the interconnect translate into up to a 17% reduction in CPI. These benefits are orthogonal to any router architecture and make a strong case for utilizing compression for optimizing the performance and power envelope of NoC architectures. In addition, the study demonstrates the criticality of designing faster routers in shaping the performance behavior.
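Compression helps the network because fewer bits mean fewer flits per packet. The sketch below counts flits under a deliberately simple, hypothetical encoding (a 1-bit-per-word zero mask followed by the non-zero 32-bit words), assuming a 64-byte cache line and 128-bit flits; it is not the compression algorithm evaluated in the paper.

```python
import math

LINE_WORDS = 16   # 64-byte cache line as sixteen 32-bit words (assumption)
FLIT_BITS = 128   # flit width in bits (assumption)


def compressed_bits(words: list[int]) -> int:
    """Size of the toy encoding: a zero mask plus every non-zero 32-bit word."""
    nonzero = sum(1 for w in words if w != 0)
    return LINE_WORDS + 32 * nonzero


def flits(payload_bits: int) -> int:
    """Number of data flits needed to carry the payload (header not counted)."""
    return math.ceil(payload_bits / FLIT_BITS)


line = [0] * 12 + [7, 0, 42, 0]  # a mostly-zero cache line
print(flits(32 * LINE_WORDS), flits(compressed_bits(line)))  # 4 1
```

An incompressible line costs 16 + 512 = 528 bits, or 5 flits instead of 4, so a practical scheme would fall back to sending such lines uncompressed.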
International Conference on Computer-Aided Design | 2007
Feng Wang; Chrysostomos Nicopoulos; Xiaoxia Wu; Yuan Xie; Narayanan Vijaykrishnan
As technology scales, the delay uncertainty caused by process variations has become increasingly pronounced in deep sub-micron designs. As a result, a paradigm shift from deterministic to statistical design methodology at all levels of the design hierarchy is inevitable [1]. In this paper, we propose a variation-aware task allocation and scheduling algorithm for Multiprocessor System-on-Chip (MPSoC) architectures to mitigate the impact of parameter variations. A new design metric, called performance yield and defined as the probability of the assigned schedule meeting the predefined performance constraints, is used to guide the task allocation and scheduling procedure. An efficient yield computation method for task scheduling complements and significantly improves the effectiveness of the proposed variation-aware scheduling algorithm. Experimental results show that our variation-aware scheduler achieves significant yield improvements. On average, 45% and 34% yield improvements over worst-case and nominal-case deterministic schedulers, respectively, can be obtained across the benchmarks by using the proposed variation-aware scheduler.
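Performance yield, as defined above, can be estimated by Monte Carlo sampling: draw task delays from a variation model, compute the schedule's completion time, and count how often it meets the deadline. The sketch assumes independent Gaussian task delays and a fixed task-to-processor assignment with no inter-task dependencies; these simplifications and all names are hypothetical, and the paper itself uses a dedicated yield computation method rather than sampling.

```python
import random


def performance_yield(assignment: dict[int, list[tuple[float, float]]],
                      deadline: float,
                      samples: int = 10_000,
                      seed: int = 0) -> float:
    """Estimate P(makespan <= deadline) by Monte Carlo sampling.

    `assignment` maps a processor id to the (mean, std) delays of the tasks
    it runs back to back; different processors run in parallel.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        makespan = max(
            # Clamp sampled delays at zero so a Gaussian tail cannot go negative.
            sum(max(0.0, rng.gauss(mu, sigma)) for mu, sigma in tasks)
            for tasks in assignment.values()
        )
        if makespan <= deadline:
            hits += 1
    return hits / samples


schedule = {0: [(10.0, 1.0), (12.0, 1.5)], 1: [(20.0, 2.0)]}
print(performance_yield(schedule, deadline=24.0))
```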
Architectures for Networking and Communications Systems | 2005
Jongman Kim; Dongkook Park; Chrysostomos Nicopoulos; Narayanan Vijaykrishnan; Chita R. Das
Network-on-chip (NoC) architectures employing packet-based communication are being increasingly adopted in system-on-chip (SoC) designs. In addition to providing high performance, the fault-tolerance and reliability of these networks are becoming critical issues due to several artifacts of deep sub-micron technologies. Consequently, it is important for a designer to have access to fast methods for evaluating the performance, reliability, and energy-efficiency of an on-chip network. Towards this end, first, we propose a novel path-sensitive router architecture for low-latency applications. Next, we present a queuing-theory-based model for evaluating the performance and energy behavior of on-chip networks. Then the model is used to demonstrate the effectiveness of our proposed router. The performance (average latency) and energy consumption results from the analytical model are validated against those obtained from a cycle-accurate simulator. Finally, we explore error detection and correction mechanisms that provide different energy-reliability-performance tradeoffs and extend our model to evaluate the on-chip network in the presence of these error protection schemes. Our reliability exploration culminates with the introduction of an array of transient fault protection techniques, both architectural and algorithmic, to tackle reliability issues within the router's individual hardware components. We propose a complete solution safeguarding against both the traditional link faults and internal router upsets, without incurring any significant latency, area, or power overhead.
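A common starting point for analytical NoC latency models is to treat each router as a queue. The sketch below applies the textbook M/M/1 result (mean time in system 1/(μ - λ)) at every hop and adds a fixed link delay; the paper's model is more detailed, so this only illustrates the general approach, and the function and parameter names are hypothetical.

```python
def mm1_time_in_system(arrival_rate: float, service_rate: float) -> float:
    """Mean time a flit spends at an M/M/1 queue (waiting plus service)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival_rate must be < service_rate")
    return 1.0 / (service_rate - arrival_rate)


def path_latency(hops: int, arrival_rate: float, service_rate: float,
                 link_delay: float = 1.0) -> float:
    """Average end-to-end latency over `hops` identical, independent routers."""
    return hops * (mm1_time_in_system(arrival_rate, service_rate) + link_delay)


# 3 hops, each router 50% utilized (lambda = 0.5, mu = 1.0 flits/cycle).
print(path_latency(3, 0.5, 1.0))  # 9.0
```

The 1/(μ - λ) term makes latency grow sharply as injection approaches router capacity, which is the saturation behavior such models are meant to capture.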
High Performance Interconnects | 2007
Dongkook Park; Reetuparna Das; Chrysostomos Nicopoulos; Jongman Kim; Narayanan Vijaykrishnan; Ravishankar R. Iyer; Chita R. Das
In modern multi-core system-on-chip (SoC) architectures, the design of innovative interconnection fabrics is indispensable. The concept of the network-on-chip (NoC) architecture has been proposed recently to better suit this requirement. In particular, the router architecture has a significant effect on the overall performance and energy consumption of the chip. We propose a dynamic path management scheme that exploits network traffic information during switch arbitration. Consequently, flits transferred across frequently used paths are expedited by traversing a reduced router pipeline. This technique, based on pipeline bypassing, is simulated and evaluated in terms of network latency and average power consumption. Simulation results with real-world application traces show that the architecture improves performance by up to 30% while incurring only minimal area/power overhead.
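A toy model of frequency-driven pipeline bypassing: the router counts how often each input-to-output port pair is used and lets flits on the currently hottest pairs traverse a shorter pipeline. The stage counts, the top-k policy, and all names are hypothetical choices for illustration, not details taken from the paper.

```python
from collections import Counter


class BypassRouter:
    """Toy router that expedites flits on its most frequently used paths."""

    def __init__(self, full_stages: int = 4, bypass_stages: int = 2,
                 hot_paths: int = 2) -> None:
        self.full_stages = full_stages
        self.bypass_stages = bypass_stages
        self.hot_paths = hot_paths
        self.usage: Counter = Counter()  # (in_port, out_port) -> flit count

    def traverse(self, in_port: str, out_port: str) -> int:
        """Record one flit on (in_port, out_port) and return its pipeline depth."""
        path = (in_port, out_port)
        # Classify using the traffic history observed before this flit.
        hot = {p for p, _ in self.usage.most_common(self.hot_paths)}
        self.usage[path] += 1
        return self.bypass_stages if path in hot else self.full_stages


r = BypassRouter()
print(r.traverse("W", "E"))  # 4: no history yet
print(r.traverse("W", "E"))  # 2: W->E is now among the hottest paths
```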
IEEE Transactions on Dependable and Secure Computing | 2010
Chrysostomos Nicopoulos; Suresh Srinivasan; Aditya Yanamandra; Dongkook Park; Vijaykrishnan Narayanan; Chita R. Das; Mary Jane Irwin
The advent of diminutive technology feature sizes has led to escalating transistor densities. Burgeoning transistor counts are casting a dark shadow on modern chip design: global interconnect delays are dominating gate delays and affecting overall system performance. Networks-on-Chip (NoC) are viewed as a viable solution to this problem because of their scalability and optimized electrical properties. However, on-chip routers are susceptible to another artifact of deep submicron technology, Process Variation (PV). PV is a consequence of manufacturing imperfections, which may lead to degraded performance and even erroneous behavior. In this work, we present the first comprehensive evaluation of NoC susceptibility to PV effects, and we propose an array of architectural improvements in the form of a new router design, called SturdiSwitch, to increase resiliency to these effects. Through extensive reengineering of critical components, SturdiSwitch provides increased immunity to PV while improving performance and increasing area and power efficiency.
IEEE Computer Society Annual Symposium on VLSI | 2012
Hyung Gyu Lee; Seungcheol Baek; Jongman Kim; Chrysostomos Nicopoulos
The storage density of Phase-Change Memory (PCM) has been demonstrated to double through the employment of Multi-Level Cell (MLC) PCM arrays. However, this increase in capacity comes at the expense of increased latency (both read and write) and decreased long-term endurance, as compared to the more conventional Single-Level Cell (SLC) PCM. These negative traits of MLCs detract from the potentially invaluable storage benefits. This paper introduces a compression-based hybrid MLC/SLC PCM management technique that aims to combine the performance edge of SLCs with the higher capacity of MLCs in a hybrid environment. Our trace-driven simulations with real application workloads demonstrate that the proposed technique achieves a 3.6x performance enhancement and a 72% energy reduction, on average, as compared with MLC-only configurations, while always providing the same effective capacity as the MLC-only mode.
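One way to read the hybrid idea: a 2-bit MLC stores a block in half as many cells as SLC, so a block that compresses to at most half its size fits into the same cells operated in the faster, more durable SLC mode, with no loss of effective capacity. The sketch below encodes that placement rule using zlib as a stand-in compressor; the 50% threshold, block size, and function name are assumptions for illustration, not details from the paper.

```python
import random
import zlib

BLOCK_BYTES = 4096  # hypothetical block size


def choose_mode(block: bytes) -> str:
    """Return "SLC" if the compressed block fits in the cells its MLC copy uses."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("unexpected block size")
    compressed = zlib.compress(block)
    # Cells that hold BLOCK_BYTES in 2-bit MLC mode hold half that in SLC mode.
    return "SLC" if len(compressed) <= BLOCK_BYTES // 2 else "MLC"


print(choose_mode(bytes(BLOCK_BYTES)))                       # SLC
print(choose_mode(random.Random(0).randbytes(BLOCK_BYTES)))  # MLC
```

Highly compressible blocks (here, all zeros) gain SLC latency and endurance, while incompressible ones (here, random bytes) stay in dense MLC mode.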