Efstathios Sotiriou-Xanthopoulos

Publication

Featured research published by Efstathios Sotiriou-Xanthopoulos.

High Performance Embedded Architectures and Compilers | 2014

Effective Platform-Level Exploration for Heterogeneous Multicores Exploiting Simulation-Induced Slacks

Efstathios Sotiriou-Xanthopoulos; Sotirios Xydis; Kostas Siozios; George Economakos; Dimitrios Soudris

Heterogeneous Multi-Processor Systems-on-Chip (MPSoC) exhibit increased design complexity due to numerous architectural parameters and hardware/software partitioning schemes. Automated Design Space Exploration (DSE) becomes an essential design procedure to discover optimized solutions in a reasonable time. High-quality DSE requires accurate evaluation of each solution. To this end, High-Level Synthesis (HLS) can be used for the characterization of the design solutions. In this paper, we propose (a) a platform design methodology that exploits simulation-induced slacks generated by avoiding simulation re-initializations and uses the gained time for HLS, and (b) a DSE tool-flow which takes into account multiple HW/SW partitioning schemes and intelligently schedules system evaluations. Experimental results show that the proposed methodology achieves 17% simulation improvements together with 77% higher accuracy, in comparison to a typical exploration approach.

International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation | 2014

Co-design of many-accelerator heterogeneous systems exploiting virtual platforms

Efstathios Sotiriou-Xanthopoulos; Sotirios Xydis; Kostas Siozios; George Economakos

Modern multiprocessor heterogeneous systems incorporating multiple hardware accelerators on chip have resulted in an excessive increase in the complexity of hardware/software co-design. Designers now have to explore a design space including both per-accelerator architectural parameters and inter-accelerator combinations, i.e. different design configurations among the allocated accelerators, as each accelerator instance has different computational requirements, according to different input data, while throughput and area constraints should be met as well. Under such a system scenario, virtual platform prototyping suffers from longer design time, since it requires an exponentially larger number of evaluations to achieve adequate coverage of the design space. In this paper, we propose a co-design framework on top of a virtual prototyping solution, customized for many-accelerator heterogeneous systems. The proposed framework defines separate configurations for each accelerator component of the virtual platform, instead of using only one common configuration, thus managing to meet both the area and the throughput constraints. In addition, as the design space size increases exponentially, the proposed framework utilizes process-based reconfigurable SystemC modules to intelligently bypass the non-productive simulation stages, thus delivering faster hardware/software co-design cycles. A case study emulating a heterogeneous server system for simultaneous video decoding of multiple streams shows the efficiency of the proposed approach, delivering design solutions with up to 1.58× improved area or 1.59× improved throughput, while achieving simulation time gains of 40%.
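The growth of the design space described in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the parameter (`UNROLL_OPTIONS`) and both helper functions are hypothetical, and the only point is to contrast one common configuration with separate per-accelerator configurations.

```python
from itertools import product

# Hypothetical per-accelerator parameter (e.g. a loop unroll factor).
# The values are illustrative and do not come from the paper.
UNROLL_OPTIONS = [1, 2, 4, 8]


def shared_configs(options, n_accelerators):
    """Design points when every accelerator uses the same configuration."""
    return [(opt,) * n_accelerators for opt in options]


def per_accelerator_configs(options, n_accelerators):
    """Design points when each accelerator is configured independently."""
    return list(product(options, repeat=n_accelerators))


if __name__ == "__main__":
    for n in (1, 2, 4):
        shared = len(shared_configs(UNROLL_OPTIONS, n))
        separate = len(per_accelerator_configs(UNROLL_OPTIONS, n))
        print(f"{n} accelerator(s): {shared} shared vs {separate} separate points")
```

With k options per accelerator and n accelerators, the shared scheme has k design points while the separate scheme has k^n, which is why the number of required evaluations grows exponentially.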

International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation | 2013

A Process-based Reconfigurable SystemC Module for simulation speedup

Efstathios Sotiriou-Xanthopoulos; Kostas Siozios; George Economakos; Dimitrios Soudris

As Multi-Processor Systems-on-Chip (MPSoC) architectures become more and more complex, Design Space Exploration (DSE) becomes the only viable solution for finding the Pareto-optimal designs. To evaluate each solution with a real dataset, DSE has to simulate the design under test, which is modeled as a Virtual Platform usually written in SystemC. However, the simulation is a very slow task which includes non-productive time periods like system initialization, while the platform re-compilation also imposes a significant overhead. In this paper, a Process-based Reconfigurable Module is used in order to bypass the non-productive simulation parts, thus accelerating the simulation. The effectiveness of the proposed methodology is demonstrated with a series of computationally intensive multimedia applications, where the simulation time improvements reach 34% on average.

International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation | 2015

A power estimation technique for cycle-accurate higher-abstraction SystemC-based CPU models

Efstathios Sotiriou-Xanthopoulos; G. Shalina Percy Delicia; Peter Figuli; Kostas Siozios; George Economakos; Jürgen Becker

Due to the ever-increasing complexity of embedded system design and the need for rapid system evaluations in early design stages, the use of simulation models known as Virtual Platforms (VPs) has been of utmost importance as they enable system modeling at higher abstraction levels. Since a typical VP features multiple interdependent components, VP libraries have been utilized in order to provide off-the-shelf models of commonly-used hardware components, such as CPUs. However, CPU power estimation is not adequately supported by existing VP libraries. In addition, existing power characterization techniques require architectural details which are not always available in early design stages. To address this issue, this paper proposes a technique for power annotation of CPU models targeting SystemC/TLM libraries in order to enable accurate power estimation at higher abstraction levels. By using a set of benchmarks on a power-annotated SystemC/TLM model of the Xilinx MicroBlaze soft processor, it is shown that the proposed approach can achieve accurate power estimation in comparison to real-system power measurements, as the estimation error ranges from 0.47% up to 6.11% with an average of 2%.
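The abstract does not state how the estimation error is computed. A minimal sketch, assuming the usual relative error between estimated and measured power, is shown below; the function name and the benchmark figures are made up for illustration and are not results from the paper.

```python
def relative_error_percent(estimated_mw, measured_mw):
    """Relative estimation error in percent (assumed definition)."""
    if measured_mw <= 0:
        raise ValueError("measured power must be positive")
    return abs(estimated_mw - measured_mw) / measured_mw * 100.0


# Made-up (estimated, measured) power values in mW, for illustration only.
samples = {
    "bench_a": (101.0, 100.0),
    "bench_b": (188.0, 200.0),
}

errors = {
    name: relative_error_percent(est, meas)
    for name, (est, meas) in samples.items()
}
mean_error = sum(errors.values()) / len(errors)

if __name__ == "__main__":
    for name, err in errors.items():
        print(f"{name}: {err:.2f}% error")
    print(f"average: {mean_error:.2f}%")
```

Reporting the minimum, maximum and average over a benchmark set, as the abstract does, follows directly from the per-benchmark values in `errors`.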

Microprocessors and Microsystems | 2014

A framework for rapid evaluation of heterogeneous 3-D NoC architectures

Efstathios Sotiriou-Xanthopoulos; Dionysios Diamantopoulos; Kostas Siozios; George Economakos; Dimitrios Soudris

The scalability of communication infrastructure in modern Integrated Circuits (ICs) is becoming a challenging issue, which might be a significant bottleneck if not carefully addressed. To this end, the usage of Networks-on-Chip (NoC) is a preferred solution. In this work, we propose a software-supported framework for quantifying the efficiency of heterogeneous 3-D NoC architectures. In contrast to existing approaches for NoC design, the introduced heterogeneous architecture consists of a mixture of 2-D and 3-D routers, which reduces the delay and power consumption with a slight impact on packet hops. More specifically, the experimental results with a number of DSP applications show the effectiveness of the introduced methodology, as we achieve on average 25% higher maximum operation frequency and 39% lower power consumption compared to uniform 3-D NoCs.

ACM Transactions on Embedded Computing Systems | 2014

Plug&Chip: A Framework for Supporting Rapid Prototyping of 3D Hybrid Virtual SoCs

Dionysios Diamantopoulos; Efstathios Sotiriou-Xanthopoulos; Kostas Siozios; George Economakos; Dimitrios Soudris

In the embedded system domain there is a continuous demand for higher flexibility in application development. This trend calls for virtual prototyping solutions capable of performing fast system simulation. Among other benefits, such a solution supports concurrent hardware/software system design by allowing developers to start developing, testing, and validating the embedded software substantially earlier than has been possible in the past. To this end, in this article we introduce a new framework, named Plug&Chip, which supports rapid prototyping of 2D and 3D digital systems. In contrast to other relevant approaches, our solution provides higher flexibility by enabling incremental system design, while also handling platforms developed with 3D integration technology.

International Conference on Wireless Mobile Communication and Healthcare | 2014

Hardware accelerated Rician denoise algorithm for high performance magnetic resonance imaging

Efstathios Sotiriou-Xanthopoulos; Sotirios Xydis; Kostas Siozios; George Economakos; Dimitrios Soudris

Rician denoising is a mandatory task of Magnetic Resonance Imaging (MRI), as it enables higher-quality image processing, which is crucial for correct diagnosis. However, denoising is a slow task, especially because of the increased image resolution and the need for high image clarity. A solution to this problem is the implementation of the Rician denoise algorithm in hardware. In this paper, we propose a hardware implementation of Rician denoise, which processes the MR image in segments in a pipelined manner, while avoiding further processing on already denoised pixels of the image. Using a synthetic MRI scan separated into 16 segments, the proposed implementation achieves a speedup of 6.8× with comparable image quality, as compared to a software-only approach running on an Intel Core 2 Duo.

Reconfigurable Computing and FPGAs | 2011

Low-Power Reconfigurable Component Utilization in a High-Level Synthesis Flow

Dimitris Bekiaris; George Economakos; Efstathios Sotiriou-Xanthopoulos; Dimitrios Soudris

Reconfigurable computing is a cost-effective alternative to technology shrinking in order to achieve higher performance in digital design, especially considering run-time reconfiguration. Research in the field consists of new reconfigurable architectures, either coarse-grain or fine-grain, and new methodologies to map applications onto them. A special case of coarse-grain reconfigurable components is morphable multipliers, which use multiplexers to feed different inputs and form different connection schemes within the data path of conventional multipliers. These connection schemes form different components that can be utilized when the initial multiplier is idle. Morphable components offer performance improvements but the use of extra multiplexers imposes power overheads. This paper applies two low-power design techniques, power gating and multi-Vth components, for the design of low-power morphable multipliers. Experimentation with these multipliers in a high-level synthesis flow shows that they can offer performance, area and power improvements compared to alternative architectures, making them valuable building blocks for hardware synthesis.

ACM Transactions on Embedded Computing Systems | 2016

An Integrated Exploration and Virtual Platform Framework for Many-Accelerator Heterogeneous Systems

Efstathios Sotiriou-Xanthopoulos; Sotirios Xydis; Kostas Siozios; George Economakos; Dimitrios Soudris

The recent advent of many-accelerator systems-on-chip (SoC), driven by the need for maximizing throughput and power efficiency, has led to an exponential increase in the hardware/software co-design complexity. The reason for this increase is that the designer has to explore a vast number of architectural parameter combinations for each single accelerator, as well as inter-accelerator configuration combinations under specific area, throughput, and power constraints, given that each accelerator has different computational requirements. In such a case, the design space size explodes. Thus, existing design space exploration (DSE) techniques give poor-quality solutions, as the design space cannot be adequately covered in a reasonable time. This problem is aggravated by the very long simulation time of the many-accelerator virtual platforms (VPs). This article addresses these design issues by (a) presenting a virtual prototyping solution that decreases the exploration time by enabling the evaluation of multiple configurations per VP simulation and (b) proposing a DSE methodology that efficiently explores the design space of many-accelerator systems. With the use of two fully developed use cases, namely an H.264 decoding server for multiple video streams and a parallelized denoising system for MRI scans, we show that the proposed DSE methodology either leads to Pareto points that dominate those of a typical DSE scenario or finds new solutions that might not be found by the typical DSE. In addition, the proposed virtual prototyping solution leads to DSE runtime reduction reaching 10× for H.264 and 5× for Rician denoise.
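The notion of Pareto points that dominate others, used in this abstract, can be made concrete with a minimal sketch. It is a generic Pareto filter, not the article's DSE methodology; it assumes every objective is minimized (for example area and latency, with latency standing in for inverse throughput), and the design points are made up.

```python
def dominates(a, b):
    """True if point a is no worse than b in every objective and strictly
    better in at least one. All objectives are assumed to be minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(
        x < y for x, y in zip(a, b)
    )


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q != p)
    ]


# Made-up (area, latency) pairs for illustration only.
designs = [(10, 5.0), (8, 7.0), (12, 4.0), (11, 6.0)]

if __name__ == "__main__":
    print(pareto_front(designs))
```

Here `(11, 6.0)` is dropped because `(10, 5.0)` is better in both objectives, while the other three points each win on at least one objective and therefore remain on the front.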

International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation | 2015

A virtual platform for exploring hierarchical interconnection for many-accelerator systems

Efstathios Sotiriou-Xanthopoulos; Sotirios Xydis; Kostas Siozios; George Economakos

The advent of many-accelerator Systems-on-Chip (SoC), as a result of the ever-increasing demands for high performance and energy efficiency, has led to the need for new interconnection schemes among the system components, which minimize the communication overhead. To meet this need, Hierarchical Networks-on-Chip (HNoCs) can provide an efficient communication paradigm for such systems: each node is an autonomous sub-network including the hardware accelerators needed by the respective application thread, thus retaining data locality and minimizing congestion. However, HNoC design may lead to an exponential increase in the design space size, due to the numerous parameter combinations of the sub-networks and the overall HNoC. In addition, a prototyping framework supporting HNoC simulation with real stimuli is crucial for accurate system evaluation. Therefore, the goal of this paper is to present (a) a SystemC framework for cycle-accurate simulation of Hierarchical NoCs, accompanied by a NoC API for node mapping on the HNoC; and (b) an exploration flow that aims to reduce the increased design space size. By using the Rician Denoising algorithm for MRI scans as a case study, the proposed DSE flow could achieve up to 2× and 1.48× time and power improvements respectively, as compared to a typical DSE flow.
