Damien Hardy
University of Rennes
Publications
Featured research published by Damien Hardy.
euromicro conference on real-time systems | 2014
Jaume Abella; Damien Hardy; Isabelle Puaut; Eduardo Quiñones; Francisco J. Cazorla
Timing validation is a critical step in the design of real-time systems, which requires the estimation of Worst-Case Execution Times (WCET) for tasks. A number of different methods have been proposed, such as Static Deterministic Timing Analysis (SDTA). The advent of Probabilistic Timing Analysis, both Measurement-Based (MBPTA) and Static (SPTA), offers different design points between the tightness of WCET estimates, the hardware that can be analyzed, and the information the user must provide to carry out the analysis. The lack of comparison among these techniques makes it difficult to select the most appropriate one for a given system. This paper makes a first attempt at comprehensively comparing SDTA, SPTA and MBPTA, both qualitatively and quantitatively, under different cache configurations implementing LRU and random replacement. We identify the strengths and limitations of each technique depending on the characteristics of the program under analysis and the hardware platform, thus providing users with guidance on which approach to choose for their target application and hardware platform.
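As a concrete illustration of the kind of per-access bound SPTA relies on for random replacement, the sketch below computes a commonly cited lower bound on the hit probability of an evict-on-miss random-replacement cache. The function names, trace format, and the simplified treatment of cold accesses are assumptions made for this example, not the paper's evaluation setup.

```python
# Illustrative sketch (a deliberately simplified building block, not the
# paper's evaluation framework): a per-access lower bound on the hit
# probability for an evict-on-miss random-replacement cache,
# P(hit) >= ((N-1)/N)^k, where N is the associativity and k is the number
# of intervening accesses to the same cache set since the last access to
# the block. First accesses are conservatively treated as misses.

def spta_hit_probability_bounds(trace, associativity, num_sets, line_size):
    """trace: sequence of byte addresses; returns one lower bound per access."""
    last_seen = {}   # block id -> index of its most recent access
    sets = []        # set index of each access seen so far
    bounds = []
    for i, addr in enumerate(trace):
        block = addr // line_size
        set_index = block % num_sets
        if block not in last_seen:
            bounds.append(0.0)  # cold access: no guarantee of a hit
        else:
            k = sum(1 for s in sets[last_seen[block] + 1:] if s == set_index)
            bounds.append(((associativity - 1) / associativity) ** k)
        sets.append(set_index)
        last_seen[block] = i
    return bounds

# Example: 4-way cache, 16 sets, 32-byte lines, a tiny synthetic trace.
print(spta_hit_probability_bounds([0, 64, 0, 2048, 0], 4, 16, 32))
```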
IEEE Micro | 2012
Boris Grot; Damien Hardy; Pejman Lotfi-Kamran; Babak Falsafi; Chrysostomos Nicopoulos; Yiannakis Sazeides
Performance and total cost of ownership (TCO) are key optimization metrics in large-scale data centers. According to these metrics, data centers designed with conventional server processors are inefficient. Recently introduced processors based on low-power cores can improve both throughput and energy efficiency compared to conventional server chips. However, a specialized Scale-Out Processor (SOP) architecture maximizes on-chip computing density to deliver the highest performance per TCO and performance per watt at the data-center level.
euromicro conference on real-time systems | 2007
Isabelle Puaut; Damien Hardy
Conventionally, the use of virtual memory in real-time systems has been avoided, the main reason being the difficulties it poses to timing analysis. However, there is a trend towards systems where different functions are implemented by concurrent processes. Such systems need spatial separation between processes, which can be easily implemented via the memory management unit (MMU) of commercial processors. In addition, some systems have a limited amount of physical memory available. So far, attempts to provide real-time address spaces have focused on the predictability of virtual-to-physical address translation and do not implement demand paging. In this paper we propose a compiler approach that introduces a predictable form of paging, in which page-in and page-out points are selected at compile time. The problem under study can be formulated as a graph coloring problem, as in register allocation within compilers. Since the graph coloring problem is NP-complete for more than three colors, we define a heuristic which, in contrast to those used for register allocation, aims at minimizing worst-case performance instead of average-case performance. Experimental results on task code show that predictability does not come at the price of performance loss compared to standard (dynamic) demand paging.
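To make the graph-coloring formulation more tangible, the following sketch shows a generic greedy coloring heuristic over a page interference graph, ordered by an estimated worst-case spill cost; this is an assumed simplification for illustration, not the heuristic defined in the paper.

```python
# Illustrative sketch (assumed, not the paper's exact heuristic): greedy
# coloring of an interference graph where nodes are memory pages competing
# for a fixed number of physical page frames ("colors"). Pages are colored
# in decreasing order of an estimated worst-case cost of spilling them, so
# the pages that would hurt the WCET most are kept in memory first.

def color_pages(interference, num_frames, wcet_cost):
    """interference: dict page -> set of pages live at the same time;
    num_frames: number of physical page frames (colors);
    wcet_cost: dict page -> estimated WCET penalty if the page is paged out.
    Returns (coloring, spilled) where coloring maps page -> frame index."""
    coloring, spilled = {}, []
    for page in sorted(interference, key=lambda p: wcet_cost[p], reverse=True):
        used = {coloring[n] for n in interference[page] if n in coloring}
        free = [c for c in range(num_frames) if c not in used]
        if free:
            coloring[page] = free[0]
        else:
            spilled.append(page)   # page-in/page-out points needed for this page
    return coloring, spilled
```

Ordering pages by their worst-case spill cost, rather than by an average-case estimate, mirrors the paper's stated goal of minimizing worst-case rather than average-case performance.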
Journal of Systems Architecture | 2011
Damien Hardy; Isabelle Puaut
With the advent of increasingly complex hardware in real-time embedded systems (processors with performance-enhancing features such as pipelines, caches and multiple cores), most embedded processors use a hierarchy of caches. While much research has been devoted to the prediction of Worst-Case Execution Times (WCETs) in the presence of a single level of cache (instruction caches, data caches, impact of cache replacement policies), very little research has focused on WCET estimation in the presence of cache hierarchies. In this paper, we propose a safe static instruction cache analysis method for multi-level caches. Variations of the method are presented to model different cache hierarchy management policies between cache levels: non-inclusive, inclusive and exclusive hierarchies. The method supports multiple replacement policies. The proposed method is evaluated on medium-size benchmarks and a larger application. We show that the method is tight for non-inclusive and exclusive cache hierarchies, provided that all cache levels use the Least Recently Used (LRU) replacement policy. We further evaluate the additional pessimism when inclusion is enforced or when a non-LRU replacement policy is used.
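A minimal sketch of the propagation idea underlying multi-level instruction cache analysis, assuming three access classifications per level (always-hit, always-miss, not classified); the function names and the flat latency model are illustrative simplifications, not the paper's abstract-interpretation-based analysis.

```python
# Illustrative sketch (simplified from the multi-level approach described
# above): only references that may miss at level L are fed to the analysis
# of level L+1. "AH" = always-hit, "AM" = always-miss, "NC" = not classified.

def references_for_next_level(classification):
    """classification: dict reference -> 'AH' | 'AM' | 'NC' at level L.
    Returns the subset of references that must be analyzed at level L+1:
    'AM' references always access L+1, 'NC' references may access it."""
    return {ref: cls for ref, cls in classification.items() if cls != "AH"}

def worst_case_latency(classification_per_level, hit_latency, memory_latency):
    """Upper bound on the latency of one reference across the hierarchy:
    pay the hit latency of every level the reference may access, plus the
    memory latency if it may still miss in the last level."""
    latency, may_reach = 0, True
    for level, cls in enumerate(classification_per_level):
        if not may_reach:
            break
        latency += hit_latency[level]
        may_reach = cls != "AH"        # an always-hit stops the access here
    if may_reach:
        latency += memory_latency
    return latency
```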
international symposium on microarchitecture | 2012
Damien Hardy; Isidoros Sideris; Nikolas Ladas; Yiannakis Sazeides
This paper presents a first-order analytical model for determining the performance degradation caused by permanently faulty cells in architectural and non-architectural arrays. We refer to this degradation as the performance vulnerability factor (PVF). The study assumes a future where cache blocks with faulty cells are disabled, resulting in less cache capacity and extra misses, while faulty predictor cells are still used but cause additional mispredictions. For a given program run, random probability of permanent cell failure, and processor configuration, the model can rapidly provide the expected PVF as well as lower and upper bounds on the PVF probability distribution for an individual array or a combination of arrays. The model is used to predict the PVF for the three predictors and the last-level cache used in this study, for a wide range of cell failure rates. The analysis reveals that for cell failure rates up to 1.5e-6 the expected PVF is very small. For higher failure rates the expected PVF grows noticeably, mostly due to the extra misses in the last-level cache. The expected PVF of the predictors remains small even at high failure rates, but the PVF distribution reveals cases of significant performance degradation with non-negligible probability. These results suggest that designers of future processors can leverage trade-offs between PVF and reliability to sustain area, performance and energy scaling. The paper demonstrates this approach by exploring the implications of different cell sizes on yield and PVF.
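A minimal sketch of the kind of first-order fault arithmetic such a model builds on, assuming independent cell failures and block-level disabling; the 512-bit block payload and the binomial treatment are assumptions for illustration, not the paper's full PVF model.

```python
# Illustrative sketch (assumed formulation, not the paper's full model):
# with a per-cell permanent failure probability p, a cache block of B bits
# is disabled with probability q = 1 - (1 - p)**B, and the number of
# disabled blocks in an N-block cache follows Binomial(N, q).

import math

def block_fault_probability(p_cell, bits_per_block):
    return 1.0 - (1.0 - p_cell) ** bits_per_block

def expected_disabled_blocks(p_cell, bits_per_block, num_blocks):
    return num_blocks * block_fault_probability(p_cell, bits_per_block)

def prob_at_most_k_disabled(p_cell, bits_per_block, num_blocks, k):
    """P(at most k blocks disabled) -- a building block for distribution bounds."""
    q = block_fault_probability(p_cell, bits_per_block)
    return sum(math.comb(num_blocks, i) * q**i * (1 - q)**(num_blocks - i)
               for i in range(k + 1))

# Example: 2 MB last-level cache, 64-byte blocks (512 data bits per block,
# ignoring tags), per-cell failure probability of 1.5e-6 as discussed above.
num_blocks = (2 * 1024 * 1024) // 64
print(expected_disabled_blocks(1.5e-6, 512, num_blocks))
```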
international conference on computer design | 2012
Dragomir Milojevic; Sachin Satish Idgunji; Djordje Jevdjic; Emre Özer; Pejman Lotfi-Kamran; Andreas Panteli; Andreas Prodromou; Chrysostomos Nicopoulos; Damien Hardy; Babak Falsafi; Yiannakis Sazeides
We propose a power-efficient many-core server-on-chip system with 3D-stacked Wide I/O DRAM targeting cloud workloads in datacenters. The integration of 3D-stacked Wide I/O DRAM on top of a logic die increases the available memory bandwidth by using dense and fast Through-Silicon Vias (TSVs) instead of off-chip I/Os, enabling faster data transfers at much lower energy per bit. We demonstrate a methodology that includes full-system microarchitectural modeling and rapid virtual physical prototyping, with emphasis on thermal analysis. Our findings show that while executing CPU-centric benchmarks (e.g. SPECInt and Dhrystone), the temperature in the server-on-chip (logic+DRAM) is in the range of 175-200°C at a power consumption of less than 20W, exceeding reliable operating bounds without any cooling solution, even with embedded cores. However, with real cloud workloads, the power density in the server-on-chip remains much lower, keeping temperatures well below those reached by the CPU-centric workloads, as a result of the much lower power burnt by memory-intensive cloud workloads. We show that such a server-on-chip system is feasible with a low-cost passive heat sink, eliminating the need for a high-cost active heat sink with an attached fan and creating an opportunity for overall cost and energy savings in datacenters.
international symposium on performance analysis of systems and software | 2013
Damien Hardy; Marios Kleanthous; Isidoros Sideris; Ali G. Saidi; Emre Özer; Yiannakis Sazeides
In this paper, we present EETCO: an estimation and exploration tool that provides qualitative assessment of data center design decisions on Total-Cost-of-Ownership (TCO) and environmental impact. It can capture the implications of many parameters, including server performance, power, cost, and Mean-Time-To-Failure (MTTF). The tool includes a model for estimating the spares needed to account for server failures and performance variability. The paper describes the tool's model and implementation, and presents experiments that explore the tradeoffs offered by different server configurations, performance variability, MTTF, 2D vs. 3D-stacked processors, and ambient temperature. These experiments reveal, for the data center configurations used in this study, several opportunities for profit and optimization in the datacenter ecosystem: (i) servers with different computing performance and power consumption merit exploration to minimize TCO and environmental impact, (ii) performance variability is desirable if it comes with a drastic cost reduction, (iii) a shorter processor MTTF is beneficial if it comes with a moderate processor cost reduction, (iv) increasing the ambient datacenter temperature by a few degrees reduces the environmental impact with a minor increase in the TCO, and (v) a 3D-stacked processor with higher cost, shorter MTTF and higher power consumption can be preferred over a conventional 2D processor if it offers a moderate performance increase.
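As a rough illustration of how spare estimation can feed into a TCO figure, here is a deliberately simplified sketch; the availability formula, safety margin and cost breakdown are assumptions for this example and are much coarser than EETCO's actual model.

```python
# Illustrative sketch (assumed, much simpler than the tool's model): estimate
# the spare servers needed to sustain a target capacity when each server
# fails independently with a given MTTF and takes MTTR hours to replace,
# then fold spares into a rough yearly TCO figure.

import math

def expected_unavailable_fraction(mttf_hours, mttr_hours):
    # steady-state probability that a given server is out of service
    return mttr_hours / (mttf_hours + mttr_hours)

def spares_needed(num_servers, mttf_hours, mttr_hours, margin=2.0):
    # expected number of concurrently failed servers, padded by a safety margin
    expected_down = num_servers * expected_unavailable_fraction(mttf_hours, mttr_hours)
    return math.ceil(margin * expected_down)

def yearly_tco(num_servers, spares, server_price, server_lifetime_years,
               power_per_server_w, electricity_price_per_kwh, pue=1.5):
    # amortized server purchases (including spares) plus electricity
    capex = (num_servers + spares) * server_price / server_lifetime_years
    energy_kwh = num_servers * power_per_server_w * 24 * 365 / 1000.0
    opex = energy_kwh * pue * electricity_price_per_kwh
    return capex + opex

# Example: 10,000 servers, 7-year MTTF, 48-hour replacement time.
spares = spares_needed(10_000, 7 * 365 * 24, 48)
print(spares, yearly_tco(10_000, spares, 2_000, 4, 300, 0.10))
```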
euromicro conference on real-time systems | 2008
Damien Hardy; Isabelle Puaut
There is a need for using virtual memory in real-time applications: virtual addressing provides isolation between concurrent processes; in addition, paging allows the execution of applications whose size is larger than the main memory capacity, which is useful in embedded systems where main memory is expensive and thus scarce. However, virtual memory is generally avoided when developing real-time and embedded applications due to predictability issues. In this paper we propose a predictable paging system in which the page loading and page eviction points are selected at compile time. The contents of main memory are selected using an Integer Linear Programming (ILP) formulation. Our approach is applied to the code, static data and stack regions of individual tasks. We show that the time required for selecting memory contents is reasonable for all applications, including the largest ones, demonstrating the scalability of our approach. Experimental results compare our approach with a previous one based on graph coloring, and show that the quality of page allocation is generally improved, with an average improvement of 30% over the previous approach. Another comparison with a state-of-the-art demand-paging system shows that predictability does not come at the price of performance loss.
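A minimal sketch of an ILP-based selection of resident pages, written with the PuLP modelling library; this knapsack-style formulation, the cost model and the function names are assumptions for illustration and omit the page-in/page-out point selection and the per-region handling described in the paper.

```python
# Illustrative sketch (assumed, not the paper's actual formulation): pick
# which pages stay resident in main memory so as to minimize the worst-case
# paging penalty, subject to the number of available page frames.

import pulp

def select_resident_pages(page_penalty, capacity_frames):
    """page_penalty: dict page_id -> estimated WCET penalty if the page is
    not kept resident; capacity_frames: number of physical page frames."""
    prob = pulp.LpProblem("predictable_paging", pulp.LpMinimize)
    resident = {p: pulp.LpVariable(f"resident_{p}", cat="Binary")
                for p in page_penalty}
    # objective: total penalty of the pages left out of main memory
    prob += pulp.lpSum(cost * (1 - resident[p])
                       for p, cost in page_penalty.items())
    # at most `capacity_frames` pages can be resident at the same time
    prob += pulp.lpSum(resident.values()) <= capacity_frames
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [p for p in page_penalty if resident[p].value() > 0.5]

# Example: four pages competing for two frames.
print(select_resident_pages({"p0": 120, "p1": 400, "p2": 35, "p3": 250}, 2))
```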
Real-time Systems | 2015
Damien Hardy; Isabelle Puaut
Semiconductor technology evolution suggests that permanent failure rates will increase dramatically with scaling, in particular for SRAM cells. While well-known approaches such as error correcting codes exist to recover from failures and provide fault-free chips, they will not be affordable in the future due to their growing cost. Consequently, other approaches like fine-grained disabling and reconfiguration of hardware elements (e.g. individual functional units or cache blocks) will become economically necessary. This fine-grained disabling degrades performance compared to a fault-free execution. To the best of our knowledge, all static worst-case execution time (WCET) estimation methods assume fault-free processors; their results are no longer safe when fine-grained disabling of hardware components is used. In this paper we provide the first method that statically calculates a probabilistic WCET bound in the presence of permanent faults in instruction caches. The proposed method derives a probabilistic WCET bound for a given program, cache configuration, and probability of cell failure. As our method relies on static analysis to bound the longest path, its probabilistic nature stems only from the probability that faults actually occur. Our method is computationally tractable because it does not require an exhaustive enumeration of all possible combinations of faulty cache blocks. Experimental results show that it provides WCET estimates very close to, but never below, those of a method that derives probabilistic WCETs by enumerating all possible locations of faulty cache blocks. The proposed method not only quantifies the impact of permanent faults on WCET estimates but, most importantly, can also be used in architectural exploration frameworks to select the most appropriate fault management mechanisms and design parameters for current and future chip designs.
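A minimal sketch of how a fault-probability distribution can be turned into a probabilistic WCET bound without enumerating fault combinations, assuming a monotone bound on the WCET as a function of the number of faulty blocks; this is an illustrative simplification, not the static analysis actually proposed in the paper.

```python
# Illustrative sketch (assumed, not the paper's method): combine a monotone
# WCET bound wcet_bound(k), valid when at most k cache blocks are faulty,
# with the binomial distribution of the number of faulty blocks.

import math

def faulty_block_tail(num_blocks, q, k):
    """P(number of faulty blocks > k) when each of num_blocks cache blocks
    is faulty independently with probability q (binomial tail)."""
    return 1.0 - sum(math.comb(num_blocks, i) * q**i * (1 - q)**(num_blocks - i)
                     for i in range(k + 1))

def probabilistic_wcet(wcet_bound, num_blocks, q, target=1e-9):
    """Return a WCET bound exceeded with probability at most `target`."""
    for k in range(num_blocks + 1):
        if faulty_block_tail(num_blocks, q, k) <= target:
            return wcet_bound(k)
    return wcet_bound(num_blocks)

# Example: 256-block instruction cache, per-block fault probability 1e-4,
# and a toy linear penalty of 500 cycles per disabled block.
print(probabilistic_wcet(lambda k: 100_000 + 500 * k, 256, 1e-4))
```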
design, automation, and test in europe | 2016
Damien Hardy; Isabelle Puaut; Yiannakis Sazeides
Fine-grained disabling and reconfiguration of hardware elements (functional units, cache blocks) will become economically necessary to recover from permanent failures, whose rate is expected to increase dramatically in the near future. This fine-grained disabling leads to degraded performance compared to a fault-free execution. Until recently, all static worst-case execution time (WCET) estimation methods assumed fault-free processors, resulting in unsafe estimates in the presence of faults. The first static WCET estimation technique dealing with permanent faults in instruction caches was proposed in [1]. That study probabilistically quantified the impact of permanent faults on WCET estimates and demonstrated that the probabilistic WCET (pWCET) estimates of tasks increase rapidly with the probability of faults as compared to fault-free WCET estimates. In this paper, we show that very simple reliability mechanisms can mitigate the impact of faulty cache blocks on pWCETs. Two mechanisms that make part of the cache resilient to faults are analyzed. Experiments show that the gains in pWCET for these two mechanisms are on average 48% and 40%, compared to an architecture with no reliability mechanism.