Alexandros Bartzas
National Technical University of Athens
Publication
Featured research published by Alexandros Bartzas.
International Journal of Reconfigurable Computing | 2008
Kostas Siozios; Alexandros Bartzas; Dimitrios Soudris
In current reconfigurable architectures, the interconnection structures contribute increasingly to delay and power consumption. The demand for increased clock frequencies and logic density (smaller area footprint) makes the problem even more important. Three-dimensional (3D) architectures are able to alleviate this problem by accommodating a number of functional layers, each of which might be fabricated in a different technology. However, the benefits of such integration technology have not been sufficiently explored yet. In this paper, we propose a software-supported methodology for exploring and evaluating alternative interconnection schemes for 3D FPGAs. In order to support the proposed methodology, three new CAD tools were developed (part of the 3D MEANDER Design Framework). During our exploration, we study the impact of vertical interconnection between functional layers on a number of design parameters. More specifically, the average gains in operation frequency, power consumption, and wirelength are 35%, 32%, and 13%, respectively, compared to existing 2D FPGAs with identical logic resources. Also, we achieve an 8% higher utilization ratio for the vertical interconnections compared to existing approaches for designing 3D FPGAs, leading to cheaper and more reliable devices.
Design Automation Conference | 2013
Iraklis Anagnostopoulos; Vasileios Tsoutsouras; Alexandros Bartzas; Dimitrios Soudris
Today's prevalent solutions for modern embedded systems and general computing employ many processing units connected by an on-chip network, leaving behind complex superscalar architectures. In this paper, we couple the concept of distributed computing with parallel applications and present a workload-aware distributed run-time framework for malleable applications on many-core platforms. The presented framework is responsible for serving, in a distributed way and at run-time, the needs of malleable applications, maximizing resource utilization, avoiding dominating effects and taking into account the type of processors, thereby supporting platform heterogeneity, while having a small overhead in overall inter-core communication. Our framework has been implemented as part of a C simulator and additionally as a runtime service on the Single-Chip Cloud Computer (SCC), an experimental processor created by Intel Labs, and we compared it against a state-of-the-art run-time resource manager. Experimental results showed that our framework has on average 70% fewer messages, 64% smaller message size and a 20% application speed-up gain.
Design, Automation, and Test in Europe | 2012
Iraklis Anagnostopoulos; Alexandros Bartzas; Georgios Kathareios; Dimitrios Soudris
Real-time applications are raising the challenge of unpredictability. This is an extremely difficult problem in the context of modern, dynamic, multiprocessor platforms which, while providing potentially high performance, make the task of timing prediction extremely difficult. In this paper, we present a flexible distributed run-time application mapping framework for both homogeneous and heterogeneous multi-core platforms that adapts to applications' needs and applications' execution restrictions. The novel idea of this article is the application of autonomic management paradigms in a decentralized manner inspired by the Divide-and-Conquer (D&C) method. We have tested our approach on a Leon-based Network-on-Chip platform using both synthetic and real application workloads. Experimental results showed that our mapping framework produces on average 21% and 10% better on-chip communication cost for homogeneous and heterogeneous platforms, respectively.
Journal of Systems and Software | 2010
Alexandros Bartzas; Miguel Peón-Quirós; Christophe Poucet; Christos Baloukas; Francky Catthoor; Dimitrios Soudris; José M. Mendías
Development of new embedded systems requires tuning of the software applications to specific hardware blocks and platforms as well as to the relevant input data instances. The behaviour of these applications heavily relies on the nature of the input data samples, thus making them strongly data-dependent. For this reason, it is necessary to extensively profile them with representative samples of the actual input data. An important aspect of this profiling is done at the dynamic data type level, which actually steers the designer's choice of implementation of these data types. The behaviour of the applications is then characterized, through an analysis phase, as a collection of software metadata that can be used to optimize the system as a whole. In this paper we propose to represent the behaviour of data-dependent applications to enable optimizations, rather than to analyze their structure or to define the engineering process behind them. Moreover, we specifically limit ourselves to the scope of applications dominated by dynamically allocated data types running on embedded systems. We characterize the software metadata that these optimizations require, and we present a methodology, as well as appropriate techniques, to obtain this information from the original application. The optimizations performed on a complete case study, utilizing the extracted software metadata, achieve overall improvements of up to 42% in the number of cycles spent accessing memory when compared to code optimized only with the static techniques applied by GNU G++.
Design, Automation, and Test in Europe | 2006
Alexandros Bartzas; Georgios Pouiklis; David Atienza; Francky Catthoor; Dimitrios Soudris; A. Thanailakis
Network applications are becoming increasingly popular in the embedded systems domain, requiring high performance, which leads to high energy consumption. In networks, it is observed that, due to their inherent dynamic nature, the dynamic memory subsystem is a main contributor to the overall energy consumption and performance. This paper presents a new systematic methodology, generating performance-energy trade-offs by implementing dynamic data types (DDTs), targeting network applications. The proposed methodology consists of: (i) the application-level DDT exploration; (ii) the network-level DDT exploration; and (iii) the Pareto-level DDT exploration. The methodology, supported by an automated tool, offers the designer a set of optimal dynamic data type design solutions. The effectiveness of the proposed methodology is tested on four representative real-life case studies. By applying the second step, it is shown that energy savings of up to 80% and performance improvements of up to 22% (compared to the original implementations of the benchmarks) can be achieved. By applying the third step, additional energy and performance gains can be achieved and a wide range of possible trade-offs among our Pareto-optimal design choices is obtained. We achieved up to 93% reduction in energy consumption and up to 48% increase in performance.
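As an illustration of the Pareto-level step only, the following minimal sketch filters a set of candidate DDT implementations down to those that no other candidate beats in both energy and execution time. The candidate names and cost values are hypothetical placeholders, not figures from the paper, and the functions are not the authors' automated tool.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """One hypothetical DDT implementation choice with its measured costs."""
    name: str
    energy: float  # lower is better (arbitrary units)
    time: float    # lower is better (arbitrary units)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in both metrics and strictly better in one."""
    return (a.energy <= b.energy and a.time <= b.time
            and (a.energy < b.energy or a.time < b.time))


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical measurements, for illustration only.
candidates = [
    Candidate("array", energy=5.0, time=9.0),
    Candidate("singly_linked_list", energy=8.0, time=4.0),
    Candidate("doubly_linked_list", energy=9.0, time=5.0),
    Candidate("chunked_list", energy=6.0, time=6.0),
]
front = pareto_front(candidates)
front_names = sorted(c.name for c in front)
```

With these placeholder values, the doubly linked list is dropped because the singly linked list is better in both metrics; the remaining three candidates represent different energy/time trade-offs.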
International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation | 2010
Sotirios Xydis; Alexandros Bartzas; Iraklis Anagnostopoulos; Dimitrios Soudris; Kiamal Z. Pekmestzi
We address the problem of custom Dynamic Memory Management (DMM) in Multi-Processor System-on-Chip (MPSoC) architectures. Customization is enabled through the definition of a design space that captures in a global, modular and parameterized manner the primitive building blocks of multi-threaded DMM. A systematic exploration methodology is proposed to efficiently traverse the design space. Customized Pareto DMM configurations are automatically generated through the development of software tools implementing the proposed methodology. Experimental evaluation based on a real-life multithreaded dynamic network application shows that the proposed methodology delivers higher quality (application-specific) solutions in comparison with state-of-the-art dynamic memory managers, together with a 62% reduction in exploration runtime.
IEEE Computer Society Annual Symposium on VLSI | 2010
Cristina Silvano; William Fornaciari; S. Crespi Reghizzi; Giovanni Agosta; Gianluca Palermo; Vittorio Zaccaria; Patrick Bellasi; Fabrizio Castro; Simone Corbetta; A. Di Biagio; E. Speziale; Michele Tartara; David Siorpaes; Heiko Hübert; Benno Stabernack; Jens Brandenburg; Martin Palkovic; Praveen Raghavan; Chantal Ykman-Couvreur; Alexandros Bartzas; Sotirios Xydis; Dimitrios Soudris; Torsten Kempf; Gerd Ascheid; Rainer Leupers; Heinrich Meyr; J. Ansari; P. Mähönen; Bart Vanthournout
The main goals of the 2PARMA project are: the definition of a parallel programming model combining component-based and single-instruction multiple-thread approaches, instruction set virtualisation based on portable byte-code, run-time resource management policies and mechanisms as well as design space exploration methodologies for many-core computing architectures.
IEEE Embedded Systems Letters | 2011
Iraklis Anagnostopoulos; Sotirios Xydis; Alexandros Bartzas; Zhonghai Lu; Dimitrios Soudris; Axel Jantsch
Multiprocessor systems-on-chip (MPSoCs) have attracted significant attention since they are recognized as a scalable paradigm to interconnect and organize a high number of cores. Current multicore embedded systems exhibit increased levels of dynamic behavior, leading to unexpected memory footprint variations unknown at design time. Dynamic memory management (DMM) is a promising solution for such types of dynamic systems. Although some efficient dynamic memory managers have been proposed for conventional bus-based MPSoC platforms, there are no DMM solutions regarding the constraints and the opportunities delivered by the physical distribution of multiple memory nodes of the platform. In this work, we address the problem of providing customized microcoded DMM on MPSoC platforms with distributed memory organization. Customization is enabled at application- and platform-level. Results show that customized microcoded DMM can serve approximately 7× more allocation requests compared to pure distributed memory platforms and perform 25% faster than the corresponding high-level implementation in C language.
International Conference on Digital Signal Processing | 2009
Iraklis Anagnostopoulos; Alexandros Bartzas; Ioannis Vourkas; Dimitrios Soudris
Emerging DSP applications have different latency, energy consumption and Quality of Service (QoS) requirements. An implementation of such applications requires a large number of intellectual property (IP) cores, communicating with each other, meeting the energy and latency constraints. Network-on-Chip (NoC) architectures are able to accommodate a large number of IP cores in the same chip implementing a set of complex applications. This leads to different usage of the available buffer space in the routers of the NoC system. In this work we propose power and the systematic design of novel NoC-based architectures, which realize DSP applications. Additionally, we present an integrated node resource management technique that combines priority assignment and buffer sizing so that the NoC system best serves the requirements of the considered DSP applications. The proposed approach has been evaluated on both 2D and 3D mesh topologies by employing an NoC simulator and four real DSP/multimedia applications, gaining an average of 34% on energy×delay product for each application. Finally, to the best of our knowledge, this is the first time DSP applications have been implemented in 3D NoC architectures.
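For readers unfamiliar with the metric, the energy×delay product (EDP) multiplies the energy consumed by the time taken, so a design improves only if it does not trade one for too much of the other. The sketch below computes the relative EDP gain between two configurations; the function names and all numbers are hypothetical and are not taken from the paper's experiments.

```python
def energy_delay_product(energy: float, delay: float) -> float:
    """EDP: energy multiplied by delay (lower is better)."""
    return energy * delay


def relative_gain(baseline_edp: float, improved_edp: float) -> float:
    """Fraction by which the improved EDP is lower than the baseline EDP."""
    return (baseline_edp - improved_edp) / baseline_edp


# Hypothetical values in arbitrary units, for illustration only.
baseline = energy_delay_product(energy=10.0, delay=2.0)
tuned = energy_delay_product(energy=9.0, delay=1.5)
gain = relative_gain(baseline, tuned)
print(f"EDP gain: {gain:.1%}")
```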
Design Automation Conference | 2010
Yiannis Iosifidis; Arindam Mallik; Eddy De Greef; Alexandros Bartzas; Dimitrios Soudris; Francky Catthoor
The key characteristic of next generation embedded applications will be the intensive data transfer and storage and the need for efficient memory management. The embedded system designer community needs optimization methodologies and techniques, which do not change the input-output functionality of the software applications or the design of the underlying hardware platform. In this paper, the key focus is the efficient data access and memory storage of both dynamically and statically allocated data and their assignment on the data memory hierarchy of an MPSoC platform. We propose a design tool framework to efficiently automate the time-consuming optimizations for parallelization and memory mapping of static and dynamic data for MPSoCs.