Giorgis Georgakoudis | Researchain

Publication

Featured research published by Giorgis Georgakoudis.

Sensor Systems and Software. Third International ICST Conference, S-Cube 2012, Lisbon, Portugal, June 4-5, 2012, Revised Selected Papers | 2012

Middleware Mechanisms for Agent Mobility in Wireless Sensor and Actuator Networks

Nikos Tziritas; Giorgis Georgakoudis; Spyros Lalis; Tomasz Paczesny; Jaroslaw Domaszewicz; Petros Lampsas; Thanasis Loukopoulos

This paper describes middleware-level support for agent mobility, targeted at hierarchically structured wireless sensor and actuator network applications. Agent mobility enables a dynamic deployment and adaptation of the application on top of the wireless network at runtime, while allowing the middleware to optimize the placement of agents, e.g., to reduce wireless network traffic, transparently to the application programmer. The paper presents the design of the mechanisms and protocols employed to instantiate agents on nodes and to move agents between nodes. It also gives an evaluation of a middleware prototype running on Imote2 nodes that communicate over ZigBee. The results show that our implementation is reasonably efficient and fast enough to support the envisioned functionality on top of a commodity multi-hop wireless technology. Our work is to a large extent platform-neutral, thus it can inform the design of other systems that adopt a hierarchical structuring of mobile components.

Concurrency and Computation: Practice and Experience | 2016

Methods and metrics for fair server assessment under real-time financial workloads

Giorgis Georgakoudis; Charles J. Gillan; Ahmed Sayed; Ivor T. A. Spence; Richard Faloon; Dimitrios S. Nikolopoulos

We present a rigorous methodology and new metrics for fair comparison of server and microserver platforms. Deploying our methodology and metrics, we compare a microserver with ARM cores against two servers with x86 cores running the same real-time financial analytics workload. We define workload-specific but platform-independent performance metrics for platform comparison, targeting both datacenter operators and end users. Our methodology establishes that a server based on the Xeon Phi co-processor delivers the highest performance and energy efficiency. However, by scaling out energy-efficient microservers, we achieve competitive or better energy efficiency than a power-equivalent server with two Sandy Bridge sockets, despite the microservers' slower cores. Using a new iso-QoS metric, we find that the ARM microserver scales enough to meet market throughput demand, that is, a 100% QoS in terms of timely option pricing, with as little as 55% of the energy consumed by the Sandy Bridge server.

Proceedings of the First International Workshop on Code OptimiSation for MultI and many Cores | 2013

Fast dynamic binary rewriting to support thread migration in shared-ISA asymmetric multicores

Giorgis Georgakoudis; Dimitrios S. Nikolopoulos; Spyros Lalis

Asymmetric multicore processors have demonstrated a strong potential for improving performance and energy efficiency. Shared-ISA asymmetric multicore processors overcome programmability problems in disjoint-ISA systems and enhance single-ISA architectures with instruction-based asymmetry. In such a design, processors share a common, baseline ISA, and performance-enhanced (PE) cores extend the baseline ISA with instructions that accelerate performance-critical operations. To exploit asymmetry, the scheduler should be able to migrate threads based on their acceleration potential. The contribution of this paper is a low-overhead binary code rewriting method for shared-ISA multicore processors that transforms a binary executable at runtime, according to the scheduled processor's PE capabilities. The mutable binary code can be re-targeted among heterogeneous cores at any point in execution while preserving functional equivalence and transparently using PE instructions when available, thus enabling migrations among heterogeneous cores. We emulate a realistic shared-ISA asymmetric multicore system using actual hardware, an FPGA experimental prototype. Experimental analysis shows that dynamic binary rewriting is feasible with little overhead. Rewritten code successfully speeds up baseline code while performing close, with 70% average efficiency, to non-portable, compiler-generated code that is statically optimized to use PE instructions.

IEEE International Conference on High Performance Computing Data and Analytics | 2017

REFINE: realistic fault injection via compiler-based instrumentation for accuracy, portability and speed

Giorgis Georgakoudis; Ignacio Laguna; Dimitrios S. Nikolopoulos; Martin Schulz

Compiler-based fault injection (FI) has become a popular technique for resilience studies to understand the impact of soft errors in supercomputing systems. Compiler-based FI frameworks inject faults at a high intermediate-representation level. However, they are less accurate than machine-code, binary-level FI because they lack access to all dynamic instructions, and thus fail to mimic certain fault manifestations. In this paper, we study the limitations of current practices in compiler-based FI and how they impact the interpretation of results in resilience studies. We propose REFINE, a novel framework that addresses these limitations by performing FI in a compiler backend. Our approach provides the portability and efficiency of compiler-based FI, while keeping accuracy comparable to binary-level FI methods. We demonstrate our approach on 14 HPC programs and show that, due to our unique design, its runtime overhead is significantly smaller than that of state-of-the-art compiler-based FI frameworks, reducing the time for large FI experiments.

International Workshop on OpenMP | 2015

Application-Level Energy Awareness for OpenMP

Ferdinando Alessi; Peter Thoman; Giorgis Georgakoudis; Thomas Fahringer; Dimitrios S. Nikolopoulos

Power, and consequently energy, has recently attained first-class system resource status, on par with conventional metrics such as CPU time. To reduce energy consumption, many hardware- and OS-level solutions have been investigated. However, application-level information, which can provide the system with valuable insights unattainable otherwise, has been considered in only a handful of cases. We introduce OpenMPE, an extension to OpenMP designed for power management. OpenMP is the de facto standard for programming parallel shared-memory systems, but does not yet provide any support for power control. Our extension exposes (i) per-region multi-objective optimization hints and (ii) application-level adaptation parameters, in order to create energy-saving opportunities for the whole system stack. We have implemented OpenMPE support in a compiler and runtime system, and empirically evaluated its performance on two architectures, mobile and desktop. Our results demonstrate the effectiveness of OpenMPE, with geometric mean energy savings of 15% across 9 use cases while maintaining full quality of service.

Parallel Processing Letters | 2015

Iso-Quality of Service: Fairly Ranking Servers for Real-Time Data Analytics

Giorgis Georgakoudis; Charles J. Gillan; Ahmed Sayed; Ivor T. A. Spence; Richard Faloon; Dimitrios S. Nikolopoulos

We present a mathematically rigorous iso-Quality-of-Service (iso-QoS) metric that relates the achievable quality of service (QoS) of a real-time analytics service, with workload-specific and use-case-specific performance and output quality requirements, to the energy cost of offering the service on different server architectures. Using a new iso-QoS evaluation methodology, we scale server resources to meet QoS targets and directly rank the servers in terms of their energy efficiency and, by extension, cost of ownership. Our metric and method are platform-independent and enable fair comparison of datacenter compute servers with significant architectural diversity, including micro-servers. We deploy our metric and methodology to compare three servers running financial option pricing workloads on real-life market data. We find that server ranking is sensitive to data inputs and desired QoS level and that although scale-out micro-servers can be up to two times more energy-efficient than conventional heavyweight ser...

International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation | 2014

Fast Dynamic Binary Rewriting for flexible thread migration on shared-ISA heterogeneous MPSoCs

Giorgis Georgakoudis; Dimitrios S. Nikolopoulos; Hans Vandierendonck; Spyros Lalis

Heterogeneous MPSoCs, where different types of cores share a baseline ISA but implement different operational accelerators, combine programmability with flexible customization. They hold promise for high performance under power and area limitations. However, transparent binary execution and dynamic scheduling are hard on those platforms. The state-of-the-art approach for transparent accelerated execution is fault-and-migrate (FAM): when a thread executes an accelerating instruction unavailable on the host core, it is forcibly migrated to an accelerating core which implements the instruction natively. Unfortunately, this approach prohibits dynamic scheduling through flexible thread migration, which is essential to any asymmetric platform for efficient utilization of heterogeneous resources. We present two distinct binary-level techniques, Dynamic Binary Rewriting (DBR) and Dynamic Binary Translation (DBT), which enable selective acceleration while preserving transparent thread execution and migration to any core in the system, at any point in time. DBR rewrites binary code to exploit any accelerating instructions available in the host core. DBT implements a fault-and-rewrite scheme, which sets up trampolines to emulation routines for those accelerating instructions that are not available on the host core. Both methods customize binary code on demand, enabling flexible migration. We evaluate the overhead of DBR and DBT against FAM on a real hardware shared-ISA MPSoC prototype. Experiments with single-thread programs show flexible migration is possible with manageable overhead. We measure the performance of our binary-level techniques by artificially triggering periodic thread migration between a Base and an accelerating (ACC) core. Periodic migration, without aiming for optimized scheduling, results in an average slowdown of about 40% under DBR or about 10% under DBT, compared to FAM-driven scheduling. We also show results for a speedup-proportional dynamic scheduler, enabled by our techniques, using multi-program workloads. In this case, up to 50% faster execution times can be achieved by leveraging flexible thread migration.

High Performance Computational Finance | 2014

On the viability of microservers for financial analytics

Charles J. Gillan; Dimitrios S. Nikolopoulos; Giorgis Georgakoudis; Richard Faloon; George Tzenakis; Ivor T. A. Spence

Energy consumption and total cost of ownership are daunting challenges for datacenters, because they scale disproportionately with performance. Datacenters running financial analytics may incur extremely high operational costs in order to meet performance and latency requirements of their hosted applications. Recently, ARM-based microservers have emerged as a viable alternative to high-end servers, promising scalable performance via scale-out approaches and low energy consumption. In this paper, we investigate the viability of ARM-based microservers for option pricing, using the Monte Carlo and Binomial Tree kernels. We compare an ARM-based microserver against a state-of-the-art x86 server. We define application-related but platform-independent energy and performance metrics to compare those platforms fairly in the context of datacenters for financial analytics and give insight on the particular requirements of option pricing. Our experiments show that through scaling out energy-efficient compute nodes within a 2U rack-mounted unit, an ARM-based microserver consumes as little as about 60% of the energy per option pricing compared to an x86 server, despite having significantly slower cores. We also find that the ARM microserver scales enough to meet a high fraction of market throughput demand, while consuming up to 30% less energy than an Intel server.

International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation | 2016

NanoStreams: Codesigned microservers for edge analytics in real time

Giorgis Georgakoudis; Charles J. Gillan; Ahmad Hassan; Umar Ibrahim Minhas; Ivor T. A. Spence; George Tzenakis; Hans Vandierendonck; Roger F. Woods; Dimitrios S. Nikolopoulos; Murali Shyamsundar; Paul Barber; Matthew Russell; Angelos Bilas; Stelios Kaloutsakis; Heiner Giefers; Peter W. J. Staar; Costas Bekas; Neil Horlock; Richard Faloon; Colin Pattison

NanoStreams explores the design, implementation, and system software stack of micro-servers aimed at processing data in-situ and in real time. These micro-servers can serve the emerging Edge computing ecosystem, namely the provisioning of advanced computational, storage, and networking capability near data sources to achieve both low-latency event processing and high-throughput analytical processing, before considering off-loading some of this processing to high-capacity data centres. NanoStreams explores a scale-out micro-server architecture that can achieve equivalent QoS to that of conventional rack-mounted servers for high-capacity data centres, but with dramatically reduced form factors and power consumption. To this end, NanoStreams introduces novel solutions in programmable and configurable hardware accelerators, as well as the system software stack used to access, share, and program those accelerators. Our NanoStreams micro-server prototype has demonstrated 5.5x higher energy efficiency than a standard Xeon server. Simulations of the micro-server's memory system, extended to leverage hybrid DDR/NVM main memory, indicated 5x higher energy efficiency than a conventional DDR-based system.

Automation, Robotics and Control Systems | 2016

Low-Cost Hardware Infrastructure for Runtime Thread Level Energy Accounting

Marius Marcu; Oana Boncalo; Madalin Ghenea; Alexandru Amaricai; Jan Henrik Weinstock; Rainer Leupers; Zheng Wang; Giorgis Georgakoudis; Dimitrios S. Nikolopoulos; Cosmin Cernazanu-Glavan; Lucian Bara; Marian Ionascu

The ever-growing need for energy-efficient computation requires adequate support for energy-aware thread scheduling that offers insight into a system's behavior for improved application energy/performance optimizations. Accurate runtime monitoring of the energy consumed by every component of a multi-core embedded system is an important feature to be considered for future designs. Although important steps have been made in this direction, the problem of distributing the energy consumption of shared components among threads executed on different cores remains an ongoing struggle. We aim at designing a generic, low-cost, and energy-efficient hardware infrastructure which supports thread-level energy accounting of hardware components in a multi-core system. The proposed infrastructure provides upper software layers with a per-thread and per-component energy accounting API, similar to performance profiling functions. Implementation results indicate that the proposed solution adds around 10% resource overhead to the monitored system. Regarding the power estimates, those derived by our solution achieve a correlation degree of more than 95% with the ones obtained from physical power measurements.
