Georgios Keramidas | Researchain

Publication

Featured research published by Georgios Keramidas.

international conference on computer design | 2007

Cache replacement based on reuse-distance prediction

Georgios Keramidas; Pavlos Petoumenos; Stefanos Kaxiras

Several cache management techniques have been proposed that indirectly try to base their decisions on cacheline reuse-distance, like Cache Decay, which is a postdiction of reuse-distances: if a cacheline has not been accessed for some “decay interval”, we know that its reuse-distance is at least as large as this decay interval. In this work, we propose to directly predict reuse-distances via instruction-based (PC) prediction and use this information for cache-level optimizations. In this paper, we choose as our target for optimization the replacement policy of the L2 cache, because the gap between LRU and the theoretical optimal replacement algorithm is comparatively large for L2 caches. This indicates that, in many situations, there is ample room for improvement. We evaluate our reuse-distance-based replacement policy using a subset of the most memory-intensive SPEC2000 benchmarks, and our results show significant benefits across the board.
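
To make the LRU-versus-optimal gap concrete, the following minimal sketch counts misses for LRU and for Belady's optimal (MIN) policy on a toy trace of block addresses, assuming a single fully associative cache sized in blocks. The function names and the cyclic trace are illustrative assumptions, not the paper's simulator or SPEC2000 data.

```python
import math
from collections import OrderedDict
from typing import Dict, Hashable, List


def lru_misses(trace: List[Hashable], capacity: int) -> int:
    """Count misses for a fully associative LRU cache of `capacity` blocks."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = None
    return misses


def opt_misses(trace: List[Hashable], capacity: int) -> int:
    """Belady's MIN: on a miss, evict the block whose next use lies furthest ahead."""
    # next_use[i] is the index of the next access to trace[i], or inf if there is none.
    next_use = [math.inf] * len(trace)
    seen: Dict[Hashable, int] = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = seen.get(trace[i], math.inf)
        seen[trace[i]] = i
    cache: Dict[Hashable, float] = {}  # block -> index of its next use
    misses = 0
    for i, block in enumerate(trace):
        if block not in cache:
            misses += 1
            if len(cache) >= capacity:
                victim = max(cache, key=cache.__getitem__)
                del cache[victim]
        cache[block] = next_use[i]
    return misses


trace = [0, 1, 2] * 4  # a loop slightly larger than the cache
print(lru_misses(trace, 2), opt_misses(trace, 2))  # 12 7
```

On this cyclic trace LRU misses on every access, while MIN keeps part of the loop resident; a reuse-distance predictor tries to approximate MIN's choices without knowing the future.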

2011 International Green Computing Conference and Workshops | 2011

Green governors: A framework for Continuously Adaptive DVFS

Vasileios Spiliopoulos; Stefanos Kaxiras; Georgios Keramidas

We present Continuously Adaptive Dynamic Voltage/Frequency scaling in Linux systems running on Intel i7 and AMD Phenom II processors. By exploiting slack, inherent in memory-bound programs, our approach aims to improve power efficiency even when the processor does not sit idle. Our underlying methodology is based on a simple first-order processor performance model in which frequency scaling is expressed as a change (in cycles) of the main memory latency. Using the available monitoring hardware, we show that our model is powerful enough to i) predict with reasonable accuracy the effect of frequency scaling (in terms of performance loss) and ii) predict the core energy under different V/f combinations. To validate our approach, we perform highly accurate, fine-grained power measurements directly on the off-chip voltage regulators. We use our model to implement various DVFS policies as Linux “green” governors that continuously optimize for various power-efficiency metrics such as EDP or ED2P, or achieve energy savings with a user-specified limit on performance loss. Our evaluation shows that, for SPEC2006 workloads, our governors dynamically achieve the same optimal EDP or ED2P (within 2% on average) as an exhaustive search of all possible frequencies. Energy savings can reach up to 56% in memory-bound workloads, with corresponding improvements of about 55% for EDP or ED2P.
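
The first-order model can be sketched as follows, assuming execution time splits into core cycles that are frequency-invariant and memory-stall time that stays fixed in nanoseconds (so its cost in cycles grows with frequency). The power table, the function names, and the choice from a discrete frequency list are hypothetical; the paper obtains these quantities from hardware monitoring and its own power model.

```python
from typing import Dict


def predict_time_ns(total_cycles: float, mem_stall_cycles: float,
                    f0_ghz: float, f_ghz: float) -> float:
    """Predict run time (ns) at f_ghz from cycle counts measured at f0_ghz.

    Assumed model: non-stall cycles do not change with frequency, while
    memory-stall time in ns is fixed, so it costs more cycles at higher f.
    """
    core_cycles = total_cycles - mem_stall_cycles
    mem_ns = mem_stall_cycles / f0_ghz  # cycles / (cycles per ns)
    return core_cycles / f_ghz + mem_ns


def best_for_metric(total_cycles: float, mem_stall_cycles: float, f0_ghz: float,
                    power_w: Dict[float, float], metric: str = "EDP") -> float:
    """Frequency minimizing EDP (E*T) or ED2P (E*T^2); power_w maps GHz -> watts."""
    exponent = {"EDP": 1, "ED2P": 2}[metric]

    def score(f: float) -> float:
        t = predict_time_ns(total_cycles, mem_stall_cycles, f0_ghz, f) * 1e-9  # seconds
        energy = power_w[f] * t  # joules
        return energy * t ** exponent

    return min(power_w, key=score)


def best_within_loss(total_cycles: float, mem_stall_cycles: float, f0_ghz: float,
                     power_w: Dict[float, float], max_loss: float) -> float:
    """Lowest-energy frequency whose predicted slowdown vs. the top frequency <= max_loss."""
    def t(f: float) -> float:
        return predict_time_ns(total_cycles, mem_stall_cycles, f0_ghz, f)

    base = t(max(power_w))
    allowed = [f for f in power_w if t(f) / base - 1 <= max_loss]
    return min(allowed, key=lambda f: power_w[f] * t(f))
```

With a made-up table of 20 W at 1 GHz and 60 W at 2 GHz, a run measured at 2 GHz with 90% stall cycles favors 1 GHz under EDP, whereas a fully compute-bound run favors 2 GHz, which mirrors the slack argument above.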

international symposium on microarchitecture | 2010

SARC Coherence: Scaling Directory Cache Coherence in Performance and Power

Stefanos Kaxiras; Georgios Keramidas

The SARC project seeks to improve power scalability of shared-memory chip multiprocessors (CMPs) by making directory coherence more efficient in both power and performance. The authors describe how they eliminate two major sources of inefficiency for directory coherence protocols: invalidation traffic on writes and directory indirection for finding the writer.

international conference on computer communications | 2005

IPStash: a set-associative memory approach for efficient IP-lookup

Stefanos Kaxiras; Georgios Keramidas

IP-lookup is a challenging problem because of the increasing routing table sizes, increased traffic and higher speed links. These characteristics lead to the prevalence of hardware solutions such as TCAMs (ternary content addressable memories), despite their high power consumption, low update rate and increased board area requirements. We propose a memory architecture called IPStash to act as a TCAM replacement, offering, at the same time, a high update rate, higher performance and significant power savings. The premise of our work is that full associativity is not necessary for IP-lookup. Rather, we show that the required associativity is simply a function of the routing table size. Thus, we propose a memory architecture similar to set-associative caches but enhanced with mechanisms to facilitate IP-lookup and in particular longest prefix match (LPM). To reach a minimum level of required associativity, we introduce an iterative method to perform LPM in a small number of iterations. This allows us to insert route prefixes of different lengths in IPStash very efficiently, selecting the most appropriate index in each case. Orthogonal to this, we use skewed associativity to increase the effective capacity of our devices. We thoroughly examine different choices in partitioning routing tables for the iterative LPM and the design space for the IPStash devices. The proposed architecture is also easily expandable. Using the Cacti 3.2 access time and power consumption simulation tool, we explore the design space for IPStash devices and compare them with the best blocked commercial TCAMs.
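
A minimal software model of the iterative LPM idea might look like the following, assuming three hypothetical prefix-length classes, a set index taken from the leading bits that every prefix in a class is guaranteed to have, and one shared set-associative array whose entries carry their class. Skewed associativity, capacity management, and the paper's actual class boundaries are left out; all names are illustrative.

```python
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical prefix-length classes, searched longest first: (shortest, longest).
CLASSES: List[Tuple[int, int]] = [(25, 32), (17, 24), (8, 16)]


def top_bits(addr: int, n: int) -> int:
    """Leading n bits of a 32-bit IPv4 address."""
    return addr >> (32 - n)


class IterativeLPM:
    """Set-associative table searched once per length class (illustrative only)."""

    def __init__(self, num_sets: int = 64, ways: int = 8):
        self.num_sets, self.ways = num_sets, ways
        # Each entry: (class id, prefix bits, prefix length, next hop).
        self.sets: List[List[Tuple[int, int, int, str]]] = [[] for _ in range(num_sets)]

    def _index(self, addr: int, cls: int) -> int:
        # Index with bits that every prefix in this class is guaranteed to have.
        return top_bits(addr, CLASSES[cls][0]) % self.num_sets

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        length, addr = net.prefixlen, int(net.network_address)
        cls = next((i for i, (lo, hi) in enumerate(CLASSES) if lo <= length <= hi), None)
        if cls is None:
            raise ValueError(f"/{length} is outside the sketch's length classes")
        row = self.sets[self._index(addr, cls)]
        if len(row) >= self.ways:
            raise OverflowError("set full")  # the sketch has no skewing or overflow handling
        row.append((cls, top_bits(addr, length), length, next_hop))

    def lookup(self, address: str) -> Optional[str]:
        addr = int(ipaddress.IPv4Address(address))
        for cls in range(len(CLASSES)):  # one set access per iteration
            best: Optional[Tuple[int, str]] = None
            for c, bits, length, hop in self.sets[self._index(addr, cls)]:
                if c == cls and top_bits(addr, length) == bits and (best is None or length > best[0]):
                    best = (length, hop)
            if best is not None:
                return best[1]  # longer classes were searched first, so this is the LPM
        return None
```

Because classes are probed from longest to shortest, the first class that yields a match already holds the longest matching prefix, so the search can stop there.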

international symposium on microarchitecture | 2003

IPStash: a power-efficient memory architecture for IP-lookup

Stefanos Kaxiras; Georgios Keramidas

High-speed routers often use commodity, fully-associative TCAMs (ternary content addressable memories) to perform packet classification and routing (IP-lookup). We propose a memory architecture called IPStash to act as a TCAM replacement, offering, at the same time, better functionality, higher performance, and significant power savings. The premise of our work is that full associativity is not necessary for IP-lookup. Rather, we show that the required associativity is simply a function of the routing table size. We propose a memory architecture similar to set-associative caches but enhanced with mechanisms to facilitate IP-lookup and in particular longest prefix match. To perform longest prefix match efficiently in a set-associative array, we restrict routing table prefixes to a small number of lengths using a controlled prefix expansion technique. Since this inflates the routing tables, we use skewed associativity to increase the effective capacity of our devices. Compared to previous proposals, IPStash does not require any complicated routing table transformations but, more importantly, it makes incremental updates to the routing tables effortless. The proposed architecture is also easily expandable. Our simulations show that IPStash is both fast and power efficient compared to TCAMs. Specifically, IPStash devices, built in the same technology as TCAMs, can run at speeds in excess of 600 MHz, offer more than twice the search throughput (>200 Msps), and consume up to 35% less power (for the same throughput) than the best commercially available TCAMs when tested with real routing tables and IP traffic.
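
Controlled prefix expansion, as named above, can be sketched like this: each prefix is expanded to the next allowed length, and where expansions collide, the entry derived from the originally longer prefix wins so that longest-match semantics are preserved. The target lengths and the (bits, length) representation are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Prefix = Tuple[int, int]  # (leading bits as an integer, prefix length)


def expand(prefixes: Dict[Prefix, str], targets: List[int]) -> Dict[Prefix, str]:
    """Rewrite every prefix so that its length is one of `targets` (hypothetical set)."""
    targets = sorted(targets)
    out: Dict[Prefix, Tuple[int, str]] = {}  # expanded prefix -> (original length, hop)
    for (bits, length), hop in prefixes.items():
        target = next((t for t in targets if t >= length), None)
        if target is None:
            raise ValueError(f"/{length} is longer than every target length")
        extra = target - length
        for suffix in range(1 << extra):  # 2**extra expanded copies
            key = ((bits << extra) | suffix, target)
            # An expansion never overrides one derived from a longer original prefix.
            if key not in out or out[key][0] < length:
                out[key] = (length, hop)
    return {key: hop for key, (_, hop) in out.items()}
```

For example, with a single target length of 2, the prefixes `1*` (hop A) and `10*` (hop B) become `10*` → B and `11*` → A. The growth in entries (2 to the power of the added bits per prefix) is the table inflation the abstract offsets with skewed associativity.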

international conference on systems | 2009

Instruction-based reuse-distance prediction for effective cache management

Pavlos Petoumenos; Georgios Keramidas; Stefanos Kaxiras

The effect of caching is fully determined by program locality, or data reuse, and several cache management techniques try to base their decisions on the prediction of temporal locality in programs. However, prior work reports only rough techniques, which either try to predict when a cache block loses its temporal locality or try to categorize cache items as highly or poorly temporal. In this work, we quantify the temporal characteristics of a cache block at run time by predicting its reuse distance (measured in intervening cache accesses), based on the access patterns of the instructions (PCs) that touch the cache blocks. We show that an instruction-based reuse-distance predictor is very accurate and allows approximation of optimal replacement decisions, since we can “see” the future. We experimentally evaluate our prediction scheme in L2 caches of various sizes using a subset of the most memory-intensive SPEC2000 benchmarks. Our proposal obtains an IPC improvement over traditional LRU of up to 130.6% (17.2% on average), and it also outperforms the previous state-of-the-art proposal (namely, the Dynamic Insertion Policy, or DIP) by up to 80.7% (15.8% on average).
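
The instruction-based idea can be illustrated with a single cache set: the PC that last touched a line is trained with the reuse distance the line then exhibits, and on a miss the victim is the line whose predicted next use lies furthest ahead. This is a minimal sketch under stated assumptions: distances are counted as accesses to this one set since the previous use, the predictor simply remembers the last distance per PC, and `ReusePredictor`/`ReuseSet` are hypothetical names rather than the paper's design.

```python
from typing import Dict, Tuple


class ReusePredictor:
    """Hypothetical PC-indexed table holding the last reuse distance seen per PC."""

    def __init__(self, default: int = 1_000_000):
        self.table: Dict[int, int] = {}
        self.default = default  # untrained PCs are assumed to reuse far in the future

    def predict(self, pc: int) -> int:
        return self.table.get(pc, self.default)

    def train(self, pc: int, distance: int) -> None:
        self.table[pc] = distance


class ReuseSet:
    """One cache set that evicts the line predicted to be reused furthest away."""

    def __init__(self, ways: int, predictor: ReusePredictor):
        self.ways, self.pred = ways, predictor
        self.clock = 0  # accesses to this set so far
        # tag -> (PC of last access, time of last access, predicted next-use time)
        self.lines: Dict[int, Tuple[int, int, int]] = {}

    def access(self, tag: int, pc: int) -> bool:
        self.clock += 1
        hit = tag in self.lines
        if hit:
            last_pc, last_time, _ = self.lines[tag]
            # Assumption: credit the set accesses since the previous use to the PC that made it.
            self.pred.train(last_pc, self.clock - last_time)
        elif len(self.lines) >= self.ways:
            victim = max(self.lines, key=lambda t: self.lines[t][2])
            del self.lines[victim]
        # Record this access and the new prediction for the line's next use.
        self.lines[tag] = (pc, self.clock, self.clock + self.pred.predict(pc))
        return hit
```

In a 2-way set, once line A has been reused by PC 1, a streaming line B from an untrained PC is predicted to be reused far away, so the next miss evicts B even though LRU would have evicted A.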

high performance embedded architectures and compilers | 2007

Applying decay to reduce dynamic power in set-associative caches

Georgios Keramidas; Polychronis Xekalakis; Stefanos Kaxiras

In this paper, we propose a novel approach to reduce dynamic power in set-associative caches that leverages a leakage-saving proposal, namely Cache Decay. We thus open the possibility to unify dynamic and leakage management in the same framework. The main intuition is that in a decaying cache, dead lines in a set need not be searched. Thus, rather than trying to predict which cache way holds a specific line, we predict, for each way, whether the line could be live in it. We access all the ways that possibly contain the live line, and we call this way-selection. In contrast to way-prediction, way-selection cannot be wrong: the line is either in the selected ways or not in the cache. The important implication is that we have a fixed hit time -- indispensable for both performance and ease-of-implementation reasons. In order to achieve high accuracy, in terms of total ways accessed, we use Decaying Bloom filters to track only the live lines in ways -- dead lines are automatically purged. We offer efficient implementations of such autonomously Decaying Bloom filters, using novel quasi-static cells. Our prediction approach gives us high accuracy in narrowing the choice of ways for hits, as well as the ability to predict misses -- a known weakness of way-prediction.
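
The "cannot be wrong" property rests on one invariant: a live line's filter entries were refreshed no earlier than the line itself, so a line that is still live is never filtered out, while stale or shared entries only cost extra probes. The sketch below models this in software, assuming timestamp-based decay per filter entry in place of the paper's quasi-static cells, Python's `hash` as a stand-in for hardware hash functions, and hypothetical class names.

```python
from typing import List, Optional, Tuple


class DecayingBloom:
    """Per-way filter sketch: an entry reads as set only if it was written
    less than `decay` cycles ago, so stale entries purge themselves."""

    def __init__(self, size: int, decay: int, hashes: int = 2):
        self.size, self.decay, self.hashes = size, decay, hashes
        self.stamp: List[Optional[int]] = [None] * size  # last write time per entry

    def _slots(self, tag: int) -> List[int]:
        return [hash((tag, i)) % self.size for i in range(self.hashes)]

    def set(self, tag: int, now: int) -> None:
        for s in self._slots(tag):
            self.stamp[s] = now

    def maybe_live(self, tag: int, now: int) -> bool:
        return all(self.stamp[s] is not None and now - self.stamp[s] < self.decay
                   for s in self._slots(tag))


class WaySelectSet:
    """One decaying cache set that probes only the ways its filters select."""

    def __init__(self, ways: int, decay: int, filter_size: int = 64):
        self.decay = decay
        self.tags: List[Optional[int]] = [None] * ways
        self.last: List[int] = [0] * ways
        # Filters decay with the same interval as lines, so live lines are never missed.
        self.filters = [DecayingBloom(filter_size, decay) for _ in range(ways)]

    def _live(self, w: int, now: int) -> bool:
        # A line not accessed for `decay` cycles has decayed (been switched off).
        return self.tags[w] is not None and now - self.last[w] < self.decay

    def access(self, tag: int, now: int) -> Tuple[bool, int]:
        """Return (hit, ways probed); zero selected ways means a predicted miss."""
        selected = [w for w in range(len(self.tags)) if self.filters[w].maybe_live(tag, now)]
        for w in selected:
            if self._live(w, now) and self.tags[w] == tag:
                self.last[w] = now
                self.filters[w].set(tag, now)
                return True, len(selected)
        # Miss: reuse a decayed or empty way first, otherwise the least recently used one.
        victim = min(range(len(self.tags)), key=lambda w: (self._live(w, now), self.last[w]))
        self.tags[victim], self.last[victim] = tag, now
        self.filters[victim].set(tag, now)
        return False, len(selected)
```

A line touched at cycle 10 is found by probing a single way at cycle 20, but after it decays, its filter entries have expired too, so the next access probes no ways at all: a miss is predicted without searching the set.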

international symposium on low power electronics and design | 2005

A simple mechanism to adapt leakage-control policies to temperature

Stefanos Kaxiras; Polychronis Xekalakis; Georgios Keramidas

Leakage power reduction in cache memories continues to be a critical area of research because of the promise of a significant pay-off. Various techniques have been developed so far that can be broadly categorized into state-preserving (e.g., drowsy caches) and non-state-preserving (e.g., cache decay). Decay saves more leakage but also incurs dynamic power overhead in the form of induced misses. Previous work has shown that, depending on the leakage vs. dynamic power trade-off, one or the other technique can be better. Several factors, such as cache architecture, technology parameters and temperature, affect this trade-off. Our work proposes the first mechanism, to the best of our knowledge, that takes temperature into account in adjusting the leakage control policy at run time. At very low temperatures, leakage is relatively weak, so the need to tightly control it is not as important as the need to minimize extra dynamic power (e.g., decay-induced misses) or performance loss. We use a hybrid decay+drowsy policy where the main benefit comes from decaying cache lines, while the drowsy mode is used to save leakage in long decay intervals. To adapt the decay mode to temperature, we propose a simple triggering mechanism that is based on the principles of decaying 4T thermal sensors and, as such, tied to temperature. The hotter the cache is, the faster cache lines are decayed, since it is beneficial to do so with very high leakage currents. Conversely, when the cache temperature is low, our mechanism defers putting cache lines in decay mode to avoid dynamic power overhead, but still saves a significant amount of leakage using the drowsy mode. Our study shows that across a wide range of temperatures, the simple adaptability of our proposal yields consistently better results than either the decay mode or the drowsy mode alone, improving over the best by as much as 33%.
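
The policy can be sketched as a per-line state function of idle time and temperature. The exponential form, the 10 °C halving step, and every default below are hypothetical parameters chosen only to mimic the qualitative behavior described above (a hotter cache decays lines sooner); the paper instead derives the interval from a decaying 4T sensor.

```python
def decay_interval(temp_c: float, base_cycles: float = 64_000.0,
                   ref_c: float = 25.0, halving_c: float = 10.0) -> float:
    """Hypothetical model: the decay interval halves every `halving_c` degrees
    above `ref_c`, standing in for a 4T sensor whose charge leaks faster when hot."""
    return base_cycles * 2.0 ** (-(temp_c - ref_c) / halving_c)


def line_state(idle_cycles: int, temp_c: float, drowsy_after: int = 2_000) -> str:
    """Hybrid policy: drowsy after a fixed idle time, decayed after a temperature-dependent one."""
    if idle_cycles >= decay_interval(temp_c):
        return "decayed"  # non-state-preserving: contents are lost
    if idle_cycles >= drowsy_after:
        return "drowsy"  # state-preserving low-leakage mode
    return "awake"
```

With these made-up parameters, a line idle for 5,000 cycles stays drowsy at 25 °C but is decayed at 85 °C, where the interval shrinks to 1,000 cycles.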

applied reconfigurable computing | 2015

Robots in Assisted Living Environments as an Unobtrusive, Efficient, Reliable and Modular Solution for Independent Ageing: The RADIO Perspective

Christos P. Antonopoulos; Georgios Keramidas; Nikolaos S. Voros; Michael Hübner; Diana Göhringer; Maria Dagioglou; Theodoros Giannakopoulos; Stasinos Konstantopoulos; Vangelis Karkaletsis

Demographic and epidemiologic transitions in Europe have brought a new health care paradigm where life expectancy is increasing as well as the need for long-term care. To meet the resulting challenge, European healthcare systems need to take full advantage of new opportunities offered by technical advancements in ICT. The RADIO project explores a novel approach to user acceptance and unobtrusiveness: an integrated smart home/assistant robot system where health monitoring equipment is an obvious and accepted part of the user’s daily life. By using the smart home/assistant robot as sensing equipment for health monitoring, we mask the functionality of the sensors rather than the sensors themselves. In this manner, sensors do not need to be discrete and distant or masked and cumbersome to install; they do however need to be perceived as a natural component of the smart home/assistant robot functionalities.

computing frontiers | 2010

Where replacement algorithms fail: a thorough analysis

Georgios Keramidas; Pavlos Petoumenos; Stefanos Kaxiras

Cache placement and eviction, especially at the last level of the memory hierarchy, have received a flurry of research activity recently. The common perception that LRU is a well-performing algorithm has recently been discredited: many researchers have turned their attention to more sophisticated algorithms that are able to substantially improve cache performance. In this paper, we thoroughly examine four recently proposed replacement policies: the Dynamic Insertion Policy (DIP), the Shepherd Cache (SC), the MLP-aware replacement, and the Instruction-based Reuse Distance Prediction (IbRDP) replacement policy. Our experimental studies show that there is a great inconsistency between the number of misses saved by each mechanism and the resulting improvement in IPC. This is particularly true for the DIP and SC approaches and indeed attests to the fact that these algorithms do not take into account the relative cost of each miss (i.e., whether it is an isolated or parallel miss). Their aim is to blindly lower the total number of misses. On the other hand, the MLP-aware replacement, although miss-cost-aware, cannot efficiently handle workloads that display LRU-hostile behavior and thus fails to reduce execution time even when there are ample opportunities to reduce cache misses. The IbRDP replacement policy shows both the ability to deal with non-LRU access patterns and MLP friendliness, leading to greater consistency between the reduction of misses and the corresponding increase in performance, and thus the largest IPC improvement among the studied mechanisms. So, what are the appropriate characteristics of a replacement algorithm targeting the lower levels of the memory hierarchy? This paper sheds some light on this question.
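
Why misses saved and IPC gained can diverge is visible with a simple MLP-based cost accounting, similar to the one used by MLP-aware replacement: in each cycle, every outstanding miss is charged 1/N, where N is the number of misses outstanding in that cycle. The interval representation and the function name are assumptions, and latencies are idealized.

```python
from fractions import Fraction
from typing import List, Tuple


def mlp_costs(misses: List[Tuple[int, int]]) -> List[Fraction]:
    """Charge each miss 1/N per cycle, N = misses outstanding in that cycle.

    `misses` holds (issue_cycle, completion_cycle) pairs, treated as half-open.
    """
    costs = [Fraction(0)] * len(misses)
    if not misses:
        return costs
    start = min(s for s, _ in misses)
    end = max(e for _, e in misses)
    for cycle in range(start, end):
        active = [i for i, (s, e) in enumerate(misses) if s <= cycle < e]
        for i in active:
            costs[i] += Fraction(1, len(active))
    return costs
```

Two 100-cycle misses issued back to back cost 200 cycles in total, while the same two misses fully overlapped cost 100, so a policy that mainly removes parallel misses saves less time than its miss count suggests.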