Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Vasileios Spiliopoulos is active.

Publication


Featured research published by Vasileios Spiliopoulos.


International Green Computing Conference and Workshops | 2011

Green governors: A framework for Continuously Adaptive DVFS

Vasileios Spiliopoulos; Stefanos Kaxiras; Georgios Keramidas

We present Continuously Adaptive Dynamic Voltage/Frequency scaling in Linux systems running on Intel i7 and AMD Phenom II processors. By exploiting the slack inherent in memory-bound programs, our approach aims to improve power efficiency even when the processor does not sit idle. Our underlying methodology is based on a simple first-order processor performance model in which frequency scaling is expressed as a change (in cycles) of the main memory latency. Utilizing available monitoring hardware, we show that our model is powerful enough to i) predict with reasonable accuracy the effect of frequency scaling (in terms of performance loss) and ii) predict the core energy under different V/f combinations. To validate our approach we perform highly accurate, fine-grained power measurements directly on the off-chip voltage regulators. We use our model to implement various DVFS policies as Linux “green” governors to continuously optimize for various power-efficiency metrics such as EDP or ED2P, or achieve energy savings with a user-specified limit on performance loss. Our evaluation shows that, for SPEC2006 workloads, our governors dynamically achieve the same optimal EDP or ED2P (within 2% on average) as an exhaustive search of all possible frequencies. Energy savings can reach up to 56% in memory-bound workloads, with corresponding improvements of about 55% for EDP or ED2P.
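
The core of the approach is the first-order model that treats frequency scaling as a change, in cycles, of the main memory latency. As a rough illustration only (not the paper's implementation; all names, constants, and V/f points below are invented), the following sketch shows how such a model can predict cycles and energy at each candidate V/f point and let a governor pick the EDP-optimal one.

```python
# Illustrative first-order DVFS model; names and constants are invented
# for this sketch and are not taken from the paper.

def predicted_cycles(busy_cycles, mem_stall_cycles, f_base, f_new):
    # Memory latency in time is unaffected by the core clock, so the stall
    # component measured in cycles scales proportionally with frequency.
    return busy_cycles + mem_stall_cycles * (f_new / f_base)

def predicted_energy(cycles, f_hz, v_volt, c_eff=1e-9, p_static=0.5):
    # Dynamic energy roughly C_eff * V^2 per cycle, plus static power
    # integrated over the (frequency-dependent) runtime.
    runtime_s = cycles / f_hz
    return c_eff * v_volt ** 2 * cycles + p_static * runtime_s

def edp_optimal_point(busy_cycles, stall_cycles, f_base, vf_points):
    # Choose the V/f pair that minimizes energy * delay (EDP).
    def edp(point):
        f, v = point
        cycles = predicted_cycles(busy_cycles, stall_cycles, f_base, f)
        delay_s = cycles / f
        return predicted_energy(cycles, f, v) * delay_s
    return min(vf_points, key=edp)

# Example: counters sampled at 2.0 GHz, three candidate operating points.
print(edp_optimal_point(8e9, 4e9, 2.0e9,
                        [(2.0e9, 1.10), (1.6e9, 1.00), (1.2e9, 0.90)]))
```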


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2012

Power-Sleuth: A Tool for Investigating Your Program's Power Behavior

Vasileios Spiliopoulos; Andreas Sembrant; Stefanos Kaxiras

Modern processors support aggressive power saving techniques to reduce energy consumption. However, traditional profiling techniques have mainly focused on performance, which does not accurately reflect the power behavior of applications. For example, the longest-running function is not always the most energy-hungry function. Thus software developers cannot always take full advantage of these power-saving features. We present Power-Sleuth, a power/performance estimation tool which is able to provide a full description of an application's behavior for any frequency from a single profiling run. The tool combines three techniques: a power estimation model, a performance estimation model, and a program phase detection technique, to deliver accurate, per-phase, per-frequency analysis. Our evaluation (against real power measurements) shows that we can accurately predict power and performance across different frequencies with average errors of 3.5% and 3.9%, respectively.
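
The overall flow of such a tool can be pictured as: detect program phases once, then evaluate a performance model and a power model for every phase at every candidate frequency. The snippet below is only a schematic of that flow; detect_phases, perf_model, and power_model are hypothetical stand-ins, not Power-Sleuth interfaces.

```python
# Schematic per-phase, per-frequency estimation flow; the helper callables
# are hypothetical placeholders, not Power-Sleuth APIs.

def estimate_across_frequencies(samples, frequencies,
                                detect_phases, perf_model, power_model):
    """samples: performance-counter samples from a single profiling run."""
    report = {}
    for phase_id, phase in enumerate(detect_phases(samples)):
        report[phase_id] = {}
        for f in frequencies:                 # no re-profiling needed per f
            cycles = perf_model(phase, f)     # predicted cycles at frequency f
            watts = power_model(phase, f)     # predicted power at frequency f
            report[phase_id][f] = {"time_s": cycles / f, "power_w": watts}
    return report
```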


Symposium on Code Generation and Optimization | 2014

Fix the code. Don't tweak the hardware: A new compiler approach to Voltage-Frequency scaling

Alexandra Jimborean; Konstantinos Koukos; Vasileios Spiliopoulos; David Black-Schaffer; Stefanos Kaxiras

Traditional compiler approaches to optimize power efficiency aim to adjust voltage and frequency at runtime to match the code characteristics to the hardware (e.g., running memory-bound phases at a lower frequency). However, such approaches are constrained by three factors: (i) voltage-frequency transitions are too slow to be applied at instruction granularity, (ii) larger code regions are seldom unequivocally memory- or compute-bound, and (iii) the available voltage scaling range for future technologies is rapidly shrinking. These factors necessitate new approaches to address power efficiency at the code-generation level. This paper proposes one such approach to automatically generate power-efficient code using a decoupled access/execute (DAE) model. In DAE, a program is split into tasks, where each task consists of two sufficiently coarse-grained phases to enable effective Dynamic Voltage Frequency Scaling (DVFS): (i) the access phase for data prefetch (heavily memory-bound), and (ii) the execute phase that performs the actual computation (heavily compute-bound). Our contribution is to provide a compiler methodology to automatically generate the access phases for a task-based programming system. Our approach is capable of handling both affine codes (through a polyhedral analysis) and non-affine codes (through optimized task skeletons). Our evaluation shows that the automatically generated versions improve EDP by 25% on average compared to a coupled execution, without any performance degradation, and surpass the EDP savings of the corresponding hand-crafted tasks by 5%.
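
The access/execute split is easiest to see on a simple loop. The sketch below is a conceptual rendering only: a real DAE compiler generates the access phase as prefetch instructions in native code, and the frequency-setting helpers here are hypothetical placeholders for whatever DVFS mechanism is available.

```python
# Conceptual decoupled access/execute (DAE) split of a reduction loop.
# set_low_frequency/set_high_frequency are hypothetical placeholders for a
# DVFS mechanism; a real access phase would issue prefetches in native code.

def access_phase(data, lo, hi):
    # Memory-bound: traverse the task's data to pull it into the cache.
    touched = 0
    for i in range(lo, hi):
        touched ^= data[i]           # loads only, essentially no compute
    return touched

def execute_phase(data, lo, hi):
    # Compute-bound: the actual work, now largely hitting in the cache.
    total = 0
    for i in range(lo, hi):
        total += data[i] * data[i]
    return total

def run_task(data, lo, hi, set_low_frequency, set_high_frequency):
    set_low_frequency()              # memory-bound phase: low V/f costs little time
    access_phase(data, lo, hi)
    set_high_frequency()             # compute-bound phase: high V/f pays off
    return execute_phase(data, lo, hi)
```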


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2013

Introducing DVFS-Management in a Full-System Simulator

Vasileios Spiliopoulos; Akash Bagdia; Andreas Hansson; Peter Aldworth; Stefanos Kaxiras

Dynamic Voltage and Frequency Scaling (DVFS) is an essential part of controlling the power consumption of any computer system, ranging from mobile phones to servers. DVFS efficiency relies on hardware-software co-optimization, so using existing hardware cannot reveal the full optimization potential beyond the characteristics of the current implementation. To explore the vast design space for DVFS efficiency, which straddles software and hardware, a simulation infrastructure must provide features that are not readily available today, for example: software-controllable clock and voltage domains, support for the OS and its frequency-scaling module, and an online power estimation methodology. As its main contribution, this work enables DVFS studies in a full-system simulator. We extend the gem5 simulator to support full-system DVFS modeling, enabling energy-efficiency experiments to be performed in gem5, and we showcase such studies. Finally, we show that both existing and novel frequency governors for Linux and Android can be effortlessly integrated in the framework, and we evaluate the efficiency of different DVFS schemes.
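
In gem5 this support is exposed through voltage and clock domains plus a DVFS handler in the Python configuration scripts. The fragment below is a rough sketch of such a configuration, to be merged into a full system script; attribute and parameter names may differ between gem5 versions, so treat it as an assumption-laden illustration rather than a verified recipe.

```python
# Sketch of a gem5 configuration fragment for a software-controllable
# CPU clock/voltage domain. Parameter names may differ between gem5
# versions; this is not a complete, runnable configuration.
from m5.objects import SrcClockDomain, VoltageDomain

def add_cpu_dvfs_domain(system):
    # Voltage levels paired with the frequency levels below (highest first).
    system.cpu_voltage_domain = VoltageDomain(voltage=['1.0V', '0.9V', '0.8V'])
    system.cpu_clk_domain = SrcClockDomain(
        clock=['2GHz', '1.5GHz', '1GHz'],
        voltage_domain=system.cpu_voltage_domain,
        domain_id=0)                 # ID the guest OS governor addresses
    # Register the domain with the DVFS handler so the simulated OS
    # (e.g. Linux cpufreq governors) can switch its performance level.
    system.dvfs_handler.domains = [system.cpu_clk_domain]
    system.dvfs_handler.enable = True
```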


International Conference on Supercomputing | 2013

Towards more efficient execution: a decoupled access-execute approach

Konstantinos Koukos; David Black-Schaffer; Vasileios Spiliopoulos; Stefanos Kaxiras

The end of Dennard scaling is expected to shrink the range of DVFS in future nodes, limiting the energy savings of this technique. This paper evaluates how much we can increase the effectiveness of DVFS by using a software decoupled access-execute approach. Decoupling the data access from execution allows us to apply optimal voltage-frequency selection for each phase and therefore improve energy efficiency over standard coupled execution. The underlying insight of our work is that by decoupling access and execute we can take advantage of the memory-bound nature of the access phase and the compute-bound nature of the execute phase to optimize power efficiency while maintaining good performance. To demonstrate this we built a task-based parallel execution infrastructure consisting of: (1) a runtime system to orchestrate the execution, (2) power models to predict optimal voltage-frequency selection at runtime, (3) a modeling infrastructure based on hardware measurements to simulate zero-latency, per-core DVFS, and (4) a hardware measurement infrastructure to verify our models' accuracy. Based on real hardware measurements we project that the combination of decoupled access-execute and DVFS has the potential to improve EDP by 25% without hurting performance. On memory-bound applications we significantly improve performance due to increased MLP in the access phase and ILP in the execute phase. Furthermore, we demonstrate that our method can achieve high performance both in the presence and in the absence of a hardware prefetcher.
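
Seen from the runtime system's side, the per-phase voltage-frequency selection reduces to a small decision loop over tasks. The sketch below shows only the general shape of such a loop; predict_edp and set_vf are hypothetical placeholders, and the actual runtime and power models in the paper are more involved.

```python
# Sketch of a per-phase V/f selection loop in a DAE runtime; predict_edp and
# set_vf are hypothetical placeholders, not the paper's runtime interface.

def run_tasks(tasks, vf_points, predict_edp, set_vf):
    for task in tasks:
        # Access phase is memory-bound: the lowest V/f point is usually best.
        set_vf(min(vf_points))                 # lowest-frequency V/f point
        task.access()
        # Execute phase is compute-bound: pick the model's EDP-optimal point.
        best = min(vf_points, key=lambda vf: predict_edp(task, "execute", vf))
        set_vf(best)
        task.execute()
```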


Compiler Construction | 2016

Multiversioned decoupled access-execute: the key to energy-efficient compilation of general-purpose programs

Konstantinos Koukos; Per Ekemark; Georgios Zacharopoulos; Vasileios Spiliopoulos; Stefanos Kaxiras; Alexandra Jimborean

Computer architecture design faces an era of great challenges in the attempt to simultaneously improve performance and energy efficiency. Previous hardware techniques for energy management are becoming severely limited, and thus compilers play an essential role in matching the software to the more restricted hardware capabilities. One promising approach is software decoupled access-execute (DAE), in which the compiler transforms the code into coarse-grain phases that are well matched to the Dynamic Voltage and Frequency Scaling (DVFS) capabilities of the hardware. While this method has proved efficient for statically analyzable codes, general-purpose applications pose significant challenges due to pointer aliasing, complex control flow, and unknown runtime events. We propose a universal compile-time method to decouple general-purpose applications, using simple but efficient heuristics. Our solutions overcome the challenges of complex code and show that automatic decoupled execution significantly reduces the energy expenditure of irregular or memory-bound applications and even yields slight performance boosts. Overall, our technique achieves average energy-delay-product (EDP) improvements of over 20% (energy over 15% and performance over 5%) across 14 benchmarks from the SPEC CPU 2006 and Parboil benchmark suites, with peak EDP improvements surpassing 70%.
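
One way to picture multiversioning is a dispatcher that keeps several generated variants of the same decoupled loop and settles on one at run time. The sketch below shows a generic dynamic selection by brief sampling; it is not the paper's compile-time heuristics, and the version set and measurement are invented for illustration.

```python
# Generic runtime selection among multiple generated access-phase versions.
# This is an invented illustration, not the paper's compile-time heuristics.
import time

def pick_access_version(access_versions, execute_phase, chunks, sample=3):
    """access_versions: candidate access-phase functions for the same loop.
    Time each candidate on a few chunks and keep the fastest one."""
    best_fn, best_t = None, float("inf")
    for access in access_versions:
        start = time.perf_counter()
        for chunk in chunks[:sample]:
            access(chunk)
            execute_phase(chunk)
        elapsed = time.perf_counter() - start
        if elapsed < best_t:
            best_fn, best_t = access, elapsed
    return best_fn
```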


International Conference on Industrial Informatics | 2013

Embedded reconfigurable computing: the ERA approach

Georgios Keramidas; Stephan Wong; Fakhar Anjam; Anthony Brandon; Roel Seedorf; Claudio Scordino; Luigi Carro; Debora Matos; Roberto Giorgi; Stamatis Kavvadias; Sally A. McKee; Bhavishya Goel; Vasileios Spiliopoulos

The growing complexity and diversity of embedded systems, combined with continuing demands for higher performance and lower power consumption, place increasing pressure on embedded platform designers. The target of the ERA project is to offer a holistic, multi-dimensional methodology to address these problems in a unified framework, exploiting the synergies between and within the reconfigurable hardware (core, memory, and network resources), the reconfigurable software (compiler and tools), and the run-time system. Starting from the hardware level, we design our platform via a structured approach that allows integration of reconfigurable computing elements, network fabrics, and memory hierarchy components. These hardware elements can adapt their composition, organization, and even instruction-set architectures to exploit trade-offs in performance and power. Appropriate hardware resources can be selected both statically at design time and dynamically at run time. Hardware details are exposed to our custom operating system, our custom runtime system, and our adaptive compiler, and are even visible all the way up to the application level. The design philosophy followed in the ERA project proved efficient enough not only to enable a better choice of power/performance trade-offs but also to support fast platform prototyping of high-efficiency embedded system designs. In this paper, we present a brief overview of the design approach, the major outcomes, and the lessons learned in the ERA project.


Embedded Systems for Real-Time Multimedia | 2016

An Online Overclocking Scheme for Bursty Real-time Tasks and an Evaluation of its Thermal Impact

Bjoern Forsberg; Kai Lampka; Vasileios Spiliopoulos

This paper proposes a scheme that drives a processor beyond its rated operating frequency, e.g., by exploiting Intel's boost technology, to digest the system's peak workload in time. In the setting of deadline-constrained workloads, this is far from trivial: the boost mode can only be used during short time spans, so it can only help to digest the peak workload rather than serve the normal case. A lowered processor frequency, used outside the peak-workload periods, yields a backlog of uncompleted jobs. This backlog may result in deadline violations or buffer overflows if the next burst of job arrivals appears too early. To overcome this problem, we propose a peak-workload-aware speed assignment strategy, which only allows the system to build up a computation backlog if the absence of high computation demands is assured. In contrast to the existing body of work, we take advantage of the bursty arrival patterns of compute jobs, thereby going beyond the standard (non-bursty sporadic) job release model. Together with our scheme, we also present a tool chain and simulations of synthetic workloads for investigating the thermal effects of different speed assignment strategies.
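
The core decision can be summarized as: stay at the nominal speed (and tolerate a backlog) only when even the worst-case burst that may still arrive can be finished within its deadline; otherwise engage the boost. The sketch below captures only this general idea with a simplified demand check; the variable names and the flat worst-case-burst abstraction are invented, not the paper's analysis.

```python
# Simplified sketch of a peak-workload-aware speed decision; the flat
# worst-case-burst abstraction is invented for illustration and is much
# coarser than the paper's analysis.

def choose_speed(backlog_cycles, worst_case_burst_cycles, deadline_s,
                 f_nominal_hz, f_boost_hz):
    # Work that must complete within the deadline in the worst case.
    worst_case_demand = backlog_cycles + worst_case_burst_cycles
    if worst_case_demand <= f_nominal_hz * deadline_s:
        return f_nominal_hz          # safe to stay slow and build up backlog
    if worst_case_demand <= f_boost_hz * deadline_s:
        return f_boost_hz            # a short boost digests the peak in time
    raise RuntimeError("workload not schedulable even at boost speed")
```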


Journal of Parallel and Distributed Computing | 2016

Keep it cool and in time

Kai Lampka; Björn Forsberg; Vasileios Spiliopoulos

The Dynamic Power and Thermal Management (DPTM) system of Dynamic Voltage Frequency Scaling (DVFS) enabled processors counteracts peak temperatures by slowing down, or even powering down, parts of the system. While this ensures the integrity of computations, it comes with the drawback of losing performance. In the context of hard real-time systems, such unpredictable losses in performance are unacceptable, as they may lead to deadline misses that in turn compromise the integrity of the system. To safely execute hard real-time workloads on such systems, this article presents an online scheme for assigning speeds in such a way that (a) the system executes at a low clock speed as often as possible, while (b) deadline violations are strictly ruled out. The proposed scheme is compared with an offline scheme that has complete knowledge of the arrival times and execution demands of the workload. The benchmarking shows that for a workload that always stays very close to the modelled maximum, our approach performs on par with the offline scheme. For a workload that diverges from the modelled maximum more often, the speed assignments produced by our scheme become more pessimistic, so as to ensure that all deadlines are met.

Highlights: dynamic power management for reducing heat emission in multicores; run-time monitoring for tracking real-time workloads with respect to a bound; the concept of a worst-case ready queue for computing speed assignments; a speed-assignment computation scheme that guarantees the timing correctness of workloads.
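
The worst-case ready queue mentioned in the highlights lends itself to a compact illustration: hypothesize the jobs that could be pending under maximal arrivals and pick the lowest speed at which all of their deadlines are still met. The sketch below is a simplified EDF-style demand check built on that idea, with invented names, not the article's exact algorithm.

```python
# Illustrative lowest-safe-speed computation over a worst-case ready queue;
# a simplified EDF-style check, not the article's exact algorithm.

def lowest_safe_speed(worst_case_ready_queue, now_s, speeds_hz):
    """worst_case_ready_queue: list of (remaining_cycles, abs_deadline_s)."""
    jobs = sorted(worst_case_ready_queue, key=lambda job: job[1])  # EDF order
    for f in sorted(speeds_hz):                  # try the slowest speeds first
        finished_cycles, feasible = 0.0, True
        for cycles, deadline_s in jobs:
            finished_cycles += cycles
            if finished_cycles / f > deadline_s - now_s:
                feasible = False                 # cumulative work misses a deadline
                break
        if feasible:
            return f
    return max(speeds_hz)                        # fall back to the fastest speed
```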


International Conference on Supercomputing | 2011

Poster: DVFS management in real-processors

Vasileios Spiliopoulos; Georgios Keramidas; Stefanos Kaxiras; Konstantinos Efstathiou

We describe a framework for run-time adaptive dynamic voltage-frequency scaling in Linux systems. Our underlying methodology is based on a simple first-order processor performance model in which frequency scaling is expressed as a change (in cycles) of the main memory latency. Utilizing available performance monitoring hardware, we show that our model is powerful enough to i) predict with reasonable accuracy the effect of frequency scaling, and ii) predict the energy consumed by the core under different V/f combinations. To validate our approach we perform highly accurate, fine-grained power measurements directly on the processor's off-chip voltage regulator.

Collaboration


Dive into Vasileios Spiliopoulos's collaborations.

Top Co-Authors

Bhavishya Goel

Chalmers University of Technology
