Santiago Pagani
Karlsruhe Institute of Technology
Publication
Featured research published by Santiago Pagani.
international conference on hardware/software codesign and system synthesis | 2014
Santiago Pagani; Heba Khdr; Waqaas Munawar; Jian-Jia Chen; Muhammad Shafique; Minming Li; Jörg Henkel
Chip manufacturers provide the Thermal Design Power (TDP) for a specific chip. The cooling solution is designed to dissipate this power level. But because TDP is not necessarily the maximum power that can be applied, chips are operated with Dynamic Thermal Management (DTM) techniques. To avoid excessive triggers of DTM, system designers usually also use TDP as the power constraint. However, using a single and constant value as the power constraint, e.g., TDP, can result in large performance losses in many-core systems. Having better power budgeting techniques is a major step towards dealing with the dark silicon problem. This paper presents a new power budget concept, called Thermal Safe Power (TSP), which is an abstraction that provides safe power constraint values as a function of the number of simultaneously operating cores. Executing cores at any power consumption below TSP ensures that DTM is not triggered. TSP can be computed offline for the worst cases, or online for a particular mapping of cores. Our simulations show that using TSP as the power constraint results in 50.5% and 14.2% higher average performance, compared to using constant power budgets (both per-chip and per-core) and a boosting technique, respectively. Moreover, TSP results in dark silicon estimations which are more optimistic than estimations using constant power budgets.
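The idea of a power budget that depends on which cores are active can be illustrated with a minimal sketch. It is not the TSP algorithm from the paper: it assumes a simple linear steady-state model in which core i heats up by `influence[i][j]` kelvin per watt dissipated on core j, ignores the power of inactive cores, and uses a made-up 4-core influence matrix. The names `safe_power_per_core` and `INFLUENCE` are hypothetical.

```python
# Hypothetical thermal influence matrix in K/W for 4 cores in a row:
# entry [i][j] is the steady-state temperature rise on core i per watt on core j.
INFLUENCE = [
    [2.0, 0.5, 0.2, 0.1],
    [0.5, 2.0, 0.5, 0.2],
    [0.2, 0.5, 2.0, 0.5],
    [0.1, 0.2, 0.5, 2.0],
]


def safe_power_per_core(influence, active, t_crit, t_amb):
    """Largest uniform per-core power (W) that keeps every core at or below t_crit.

    Assumes T_i = t_amb + sum_j influence[i][j] * P_j, with P_j equal on all
    active cores and zero on inactive ones (a simplification).
    """
    if not active:
        raise ValueError("at least one core must be active")
    headroom = t_crit - t_amb
    # The hottest core is the one with the largest summed influence from active cores.
    worst_heating = max(sum(row[j] for j in active) for row in influence)
    return headroom / worst_heating


if __name__ == "__main__":
    for mapping in ([0], [0, 3], [0, 1], [0, 1, 2, 3]):
        budget = safe_power_per_core(INFLUENCE, mapping, t_crit=80.0, t_amb=45.0)
        print(mapping, round(budget, 2))
```

Under these assumptions the per-core budget shrinks as more cores are switched on, and for the same number of active cores a spread-out mapping such as `[0, 3]` tolerates more power than an adjacent one such as `[0, 1]`.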
design automation conference | 2015
Jörg Henkel; Heba Khdr; Santiago Pagani; Muhammad Shafique
This paper presents new trends in dark silicon reflecting, among others, the deployment of FinFETs in recent technology nodes and the impact of voltage/frequency scaling, which lead to new, less conservative predictions. The focus is on dark silicon from a thermal perspective: we show that it is not simply the chip's total power budget, e.g., the Thermal Design Power (TDP), that leads to the dark silicon problem, but instead it is the power density and related thermal effects. We therefore propose to use Thermal Safe Power (TSP) as a more efficient power budget. It is also shown that sophisticated spatio-temporal mapping decisions result in improved thermal profiles with reduced peak temperatures. Moreover, we discuss the implications of Near-Threshold Computing (NTC) and the employment of boosting techniques in dark silicon systems.
design automation conference | 2015
Heba Khdr; Santiago Pagani; Muhammad Shafique; Jörg Henkel
In dark silicon chips, a significant amount of on-chip resources cannot be simultaneously powered on and need to stay dark, i.e., power gated, in order to avoid thermal emergencies. This paper presents a resource management technique, called DsRem, that selects the number of active cores jointly with their voltage/frequency (v/f) levels, considering the high Instruction Level Parallelism (ILP) or Thread Level Parallelism (TLP) nature of different applications, in order to maximize the overall system performance. DsRem leverages the positioning of dark cores to efficiently dissipate the heat generated by the active cores. This facilitates increasing the v/f level of the active cores, which leads to further performance improvement. Compared to state-of-the-art thermal-aware task application mapping, DsRem achieves up to 46% performance gain, while avoiding any thermal emergencies. Additionally, DsRem outperforms the boosting technique by 26%.
design, automation, and test in europe | 2015
Santiago Pagani; Jian-Jia Chen; Muhammad Shafique; Jörg Henkel
In many-core systems, run-time scheduling decisions, such as task migration, core activation/deactivation, and voltage/frequency scaling, are typically used to optimize resource usage. Such run-time decisions change the power consumption, which can in turn result in transient temperatures much higher than those of any steady-state scenario. Therefore, to be thermally safe, it is important to evaluate the transient peaks before making resource management decisions. This paper presents a method for computing these transient peaks in just a few milliseconds, which is suited for run-time usage. This technique works for any compact thermal model consisting of a system of first-order differential equations, for example, RC thermal networks. Instead of using regular numerical methods, our algorithm is based on analytically solving the differential equations using matrix exponentials and linear algebra. This results in a mathematical expression which can easily be analyzed and differentiated to compute the maximum transient temperatures. Moreover, our method can also be used to efficiently compute all transient temperatures for any given time resolution without accuracy losses. We implement our solution as an open-source tool called MatEx. Our experimental evaluations show that the execution time of MatEx for peak temperature computation can be bounded to no more than 2.5 ms for systems with 76 thermal nodes, and to no more than 26.6 ms for systems with 268 thermal nodes, which is three orders of magnitude faster than the state-of-the-art for the same settings.
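The closed-form idea behind this abstract can be sketched for a tiny RC thermal network. The sketch is not the MatEx tool and does not perform its peak search. It assumes the common model A·dT/dt + B·T = P + T_amb·G with constant power, whose solution is T(t) = T_steady + exp(C·t)·(T_init − T_steady), where C = −A⁻¹·B and T_steady = B⁻¹·(P + T_amb·G). The two-node network (core and heat sink), all component values, and the helper names `expm`, `solve2`, and `temperatures` are hypothetical.

```python
import math

# Hypothetical 2-node RC network: node 0 = core, node 1 = heat sink.
CAP = [0.03, 0.5]                  # thermal capacitances in J/K (diagonal of A)
B = [[2.0, -2.0], [-2.0, 3.0]]     # conductance matrix in W/K
G = [0.0, 1.0]                     # conductance from each node to ambient in W/K


def matmul(a, b):
    """Plain dense matrix product for small lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def expm(m, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def solve2(m, rhs):
    """Solve a 2x2 linear system with Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det,
            (m[0][0] * rhs[1] - m[1][0] * rhs[0]) / det]


def temperatures(t, power, t_init, t_amb):
    """Node temperatures at time t (s) for constant power (W), evaluated analytically."""
    t_steady = solve2(B, [p + t_amb * g for p, g in zip(power, G)])
    c_t = [[-B[i][j] / CAP[i] * t for j in range(2)] for i in range(2)]
    decay = expm(c_t)
    offset = [ti - ts for ti, ts in zip(t_init, t_steady)]
    return [ts + sum(decay[i][j] * offset[j] for j in range(2))
            for i, ts in enumerate(t_steady)]


if __name__ == "__main__":
    for t in (0.0, 0.01, 0.1, 1.0, 10.0):
        print(t, temperatures(t, [10.0, 0.0], [45.0, 45.0], 45.0))
```

Because the expression is evaluated directly at the requested time, no time stepping is needed. With the values above, the core settles at 60 °C and the heat sink at 55 °C.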
ACM Transactions on Embedded Computing Systems | 2014
Santiago Pagani; Jian-Jia Chen
Energy-efficient design is an important issue in computing systems. This paper studies the energy efficiency of a simple, linear-time strategy, called the Single Frequency Approximation (SFA) scheme, for periodic real-time tasks on multi-core systems with a shared supply voltage in a voltage island. The strategy executes all the cores at a single frequency to just meet the timing constraints. SFA has been adopted in the literature after task partitioning, but the worst-case performance of SFA, in terms of energy consumption, is an open problem. We provide comprehensive analysis for SFA to derive the cycle utilization distribution for its worst-case behaviour for energy minimization. Our analysis shows that the energy consumption of SFA for task execution is at most 1.53 (1.74, 2.10, 2.69, respectively) times the energy consumption of the optimal voltage/frequency scaling, when the dynamic power consumption is a cubic function of the frequency and the voltage island has up to 4 (8, 16, 32, respectively) cores. The analysis shows that SFA is indeed an effective scheme under practical settings, even though it is not optimal. Furthermore, since all the cores run at a single frequency and no frequency alignment for Dynamic Voltage and Frequency Scaling (DVFS) between cores is needed, any uni-core dynamic power management technique for reducing the energy consumption for idling can be easily incorporated individually on each core in the voltage island. This paper also provides the analysis of energy consumption for SFA, combined with procrastination for Dynamic Power Management (DPM). Furthermore, we also extend our analysis to derive the approximation factor of SFA for a multi-core system with multiple voltage islands.
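The SFA rule described here (run every core of the island at the one frequency that just meets the timing constraints) can be sketched as follows. This is an illustration under simplifying assumptions, not the paper's analysis: each core is summarized by its cycle utilization (cycles required per second), dynamic power is `alpha * f**3`, and idle and static power are ignored. The function names and the example utilizations are hypothetical.

```python
def sfa_frequency(core_utilizations):
    """Single island frequency: the highest cycle utilization among the cores."""
    return max(core_utilizations)


def sfa_dynamic_energy(core_utilizations, alpha=1.0, interval=1.0):
    """Dynamic energy over `interval` seconds when all cores share the SFA frequency.

    A core with utilization u is busy for (u / f) * interval seconds and draws
    alpha * f**3 while busy; idle energy is ignored (a simplification).
    """
    f = sfa_frequency(core_utilizations)
    if f == 0:
        return 0.0
    return sum(alpha * f ** 3 * (u / f) * interval for u in core_utilizations)


if __name__ == "__main__":
    island = [0.2, 0.5, 1.0, 0.8]   # normalized cycle utilizations, hypothetical
    print(sfa_frequency(island), sfa_dynamic_energy(island))
```

The sketch makes the source of SFA's inefficiency visible: lightly loaded cores still execute at the frequency dictated by the most heavily loaded core in the island.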
design automation conference | 2013
Janmartin Jahn; Santiago Pagani; Sebastian Kobbe; Jian-Jia Chen; Jörg Henkel
Efficiently utilizing the computational resources of many-core systems is one of the most prominent challenges. The problem worsens when resource requirements vary unpredictably and applications may be started/stopped at any time. To address this challenge, we propose two schemes that calculate and adapt task mappings at runtime: a centralized, optimal mapping scheme and a distributed, hierarchical mapping scheme that trades optimality for a high degree of scalability. Experiments on Intel's 48-core Single-Chip Cloud Computer and in a many-core simulator show that a significant improvement in system performance can be achieved over the current state-of-the-art.
real-time systems symposium | 2013
Santiago Pagani; Jian-Jia Chen
Energy efficiency is a major concern in modern computing systems. For such systems, the presence of multiple voltage islands, where the voltage of each island can change independently and all cores in an island share the same supply voltage at any given time, is an expected compromise between global and per-core Dynamic Voltage and Frequency Scaling (DVFS). This paper focuses on energy minimization for a set of periodic tasks assigned to a voltage island. We present a simple and practical solution that assigns the tasks onto cores in the island and then applies a DVFS schedule, particularly the Single Frequency Approximation (SFA) scheme. Furthermore, we provide thorough theoretical analysis of our solution, in terms of energy efficiency, against the optimal task partitioning and optimal DVFS schedule, especially for state-of-the-art designs that have a small number of cores per voltage island. The analysis shows that our task partitioning scheme combined with SFA is a good and practical solution for energy efficiency. Particularly, when the number of cores in each voltage island is limited, the approximation factor is at most 2.01 (2.29, 2.55, 2.80, respectively) when the dynamic power consumption is a cubic function of the frequency and the islands have up to 4 (8, 16, 32, respectively) cores. Moreover, with non-negligible overhead for sleeping, further combination with any uni-core procrastination algorithm that consumes no more energy than keeping a core idle when it has no workload in its ready queue increases the approximation factor by at most 1.
embedded and real-time computing systems and applications | 2013
Santiago Pagani; Jian-Jia Chen
Energy-efficient design is an important issue in computing systems. This paper studies the energy efficiency of a simple, linear-time strategy, called the Single Frequency Approximation (SFA) scheme, for periodic real-time tasks on multi-core systems with a shared supply voltage in a voltage island. The strategy executes all the cores at a single frequency to just meet the timing constraints. SFA has been adopted in the literature after task partitioning, but the worst-case performance of SFA, in terms of energy consumption, is an open problem. We provide comprehensive analysis for SFA to derive the cycle utilization distribution for its worst-case behaviour for energy minimization. Our analysis shows that the energy consumption of SFA for task execution is at most 1.53 (1.74, 2.10, 2.69, respectively) times the energy consumption of the optimal voltage/frequency scaling, when the dynamic power consumption is a cubic function of the frequency and the voltage island has up to 4 (8, 16, 32, respectively) cores. The analysis shows that SFA is indeed an effective scheme under practical settings, even though it is not optimal. Furthermore, since all the cores run at a single frequency and no frequency alignment for Dynamic Voltage and Frequency Scaling (DVFS) between cores is needed, any uni-core dynamic power management technique for reducing the energy consumption for idling can be easily incorporated individually on each core in the voltage island. This paper also provides the analysis of energy consumption for SFA, combined with procrastination for Dynamic Power Management (DPM). Furthermore, we also extend our analysis to derive the approximation factor of SFA for a multi-core system with multiple voltage islands.
IEEE Transactions on Parallel and Distributed Systems | 2015
Santiago Pagani; Jian-Jia Chen; Minming Li
Efficient and effective system-level power management for multi-core systems with multiple voltage islands is necessary for next-generation computing systems. This paper considers energy efficiency for such systems, in which the cores in the same voltage island have to be operated at the same supply voltage level. We explore how to map given task sets onto cores, so that each task set is assigned to and executed on one core and the energy consumption is minimized. Due to the restriction to operate at the same supply voltage in a voltage island, different mappings will result in different energy consumptions. By using the simple Single Frequency Approximation (SFA) scheme to decide the voltages and frequencies of individual voltage islands, this paper presents the approximation factor analysis (in terms of energy consumption) for simple heuristic algorithms, and develops a dynamic programming algorithm, which derives optimal mapping solutions for energy minimization when using SFA. We experimentally evaluate the running time and energy consumption performance of these algorithms on Intel's Single-chip Cloud Computer (SCC). Moreover, we conduct simulations for hypothetical platforms with different numbers of voltage islands and cores per island, also considering different task partitioning policies.
IEEE Transactions on Computers | 2017
Heba Khdr; Santiago Pagani; Ericles Rodrigues Sousa; Vahid Lari; Anuj Pathania; Frank Hannig; Muhammad Shafique; Jürgen Teich; Jörg Henkel
Increasing power densities have led to the dark silicon era, for which heterogeneous multicores with different power and performance characteristics are promising architectures. This paper focuses on maximizing the overall system performance under a critical temperature constraint for heterogeneous tiled multicores, where all cores or accelerators inside a tile share the same voltage and frequency levels. For such architectures, we present a resource management technique that introduces power density as a novel system-level constraint, in order to avoid thermal violations. The proposed technique then assigns applications to tiles by choosing their degree of parallelism and the voltage/frequency levels of each tile, such that the power density constraint is satisfied. Moreover, our technique adapts the power density constraint at runtime according to the characteristics of the executed applications and reacts to workload changes. Thus, the available thermal headroom is exploited to maximize the overall system performance.