Torsten Wilde
Bavarian Academy of Sciences and Humanities
Publication
Featured research published by Torsten Wilde.
high performance computing systems and applications | 2014
Torsten Wilde; Axel Auweter; Michael K. Patterson; Hayk Shoukourian; Herbert Huber; Arndt Bode; Detlef Labrenz; Carlo Cavazzoni
To determine whether a High-Performance Computing (HPC) data center is energy efficient, various aspects have to be taken into account: the data center's power distribution and cooling infrastructure, the HPC system itself, the influence of the system management software, and the HPC workloads; all can contribute to the overall energy efficiency of the data center. Currently, two well-established metrics are used to determine energy efficiency for HPC data centers and systems: Power Usage Effectiveness (PUE) and FLOPS per Watt (as defined by the Green500 in their ranking list). PUE evaluates the overhead for running a data center, and FLOPS per Watt characterizes the energy efficiency of a system running the High-Performance Linpack (HPL) benchmark, i.e. the floating point operations per second achieved with 1 watt of electrical power. Unfortunately, under closer examination, even the combination of both metrics does not characterize the overall energy efficiency of an HPC data center. First, HPL does not constitute a representative workload for most of today's HPC applications, and the rev 0.9 Green500 run rules for power measurements allow for excluding subsystems (e.g. networking, storage, cooling). Second, even a combination of PUE with the FLOPS per Watt metric neglects that the total energy efficiency of a system can vary with the characteristics of the data center in which it is operated. This is due to the different cooling technologies implemented in HPC systems and the difference in the cost incurred by the data center to remove the heat using these technologies. To address these issues, this paper introduces the metrics system PUE (sPUE) and Data center Workload Power Efficiency (DWPE). sPUE calculates the overhead for operating a given system in a certain data center. DWPE is then calculated by determining the energy efficiency of a specific workload and dividing it by the sPUE. DWPE can thus be used to define the energy efficiency of running a given workload on a specific HPC system in a specific data center and is currently the only fully-integrated metric suitable for rating an HPC data center's energy efficiency. In addition, DWPE allows for predicting the energy efficiency of different HPC systems in existing HPC data centers, making it an ideal approach for guiding HPC system procurement. This paper concludes with a demonstration of the application of DWPE using a set of representative HPC workloads.
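A minimal sketch, in Python, of how the two metrics compose as described above (DWPE is the workload energy efficiency divided by sPUE); the function name `dwpe` and all numeric values are illustrative and not taken from the paper.

```python
# Illustrative sketch of how DWPE composes from sPUE and workload efficiency,
# following the definitions in the abstract. All numbers are hypothetical.

def dwpe(workload_perf_per_watt: float, spue: float) -> float:
    """DWPE = workload energy efficiency (e.g. GFLOPS/W at the system) / sPUE,
    where sPUE >= 1 captures the infrastructure overhead of operating this
    particular system (with its cooling technology mix) in this data center."""
    return workload_perf_per_watt / spue

# Hypothetical example: a workload achieving 2.5 GFLOPS/W on a system whose
# cooling mix yields an sPUE of 1.15 in data center A vs. 1.35 in data center B.
print(dwpe(2.5, 1.15))  # ~2.17 GFLOPS/W -> more efficient placement
print(dwpe(2.5, 1.35))  # ~1.85 GFLOPS/W
```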
international conference on high performance computing and simulation | 2015
Hayk Shoukourian; Torsten Wilde; Axel Auweter; Arndt Bode
Efficient scheduling is crucial for the time- and cost-effective utilization of compute resources, especially for high-end systems. A variety of factors need to be considered in scheduling decisions. Power variation across the compute resources of homogeneous large-scale systems has not been considered so far. This paper discusses the impact of this power variation on parallel application scheduling. It addresses the problem of finding the optimal resource configuration for a given application that minimizes the amount of consumed energy under pre-defined constraints on application execution time and instantaneous average power consumption. The paper presents an efficient algorithm to do so, which also considers the power diversity among the compute nodes of a given homogeneous High Performance Computing system, including how that diversity changes at different operating CPU frequencies. Based on this algorithm, the paper presents a plug-in, referred to as Configuration Adviser, which operates on top of a given resource management and scheduling system and advises on the energy-wise optimal resource configuration for a given application such that its execution adheres to the specified execution time and power consumption constraints. The main goal of this plug-in is to enhance current resource management and scheduling tools to support power capping for future Exascale systems, where a data center might not be able to provide cooling or electrical power for the system's peak consumption but only for the expected power bands.
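A brute-force sketch of the configuration search the abstract describes, assuming hypothetical `time_model` and `node_power` callables; the paper's actual algorithm and its treatment of node power variation are not reproduced here.

```python
# Illustrative brute-force variant of the constrained configuration search.
# The runtime and per-node power models are hypothetical placeholders.

from itertools import product

def pick_configuration(node_counts, frequencies, time_model, node_power,
                       time_limit, power_limit):
    """Return the (energy, nodes, freq) triple minimizing estimated energy
    while respecting the execution-time and average-power constraints."""
    best = None
    for n, f in product(node_counts, frequencies):
        t = time_model(n, f)                          # predicted runtime [s]
        # sum per-node power to account for node-to-node power variation
        p = sum(node_power(i, f) for i in range(n))   # predicted avg power [W]
        if t > time_limit or p > power_limit:
            continue
        energy = p * t                                # [J]
        if best is None or energy < best[0]:
            best = (energy, n, f)
    return best  # None if no configuration satisfies the constraints

# Hypothetical toy models for demonstration only:
nodes = range(16, 129, 16)
freqs = [1.8, 2.1, 2.4, 2.7]                                   # GHz
toy_time  = lambda n, f: 3600 * 64 / n * 2.7 / f               # ideal scaling
toy_power = lambda i, f: 200 + 5 * (i % 7) + 60 * (f - 1.8)    # node variation
print(pick_configuration(nodes, freqs, toy_time, toy_power, 3600, 30000))
```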
Journal of Parallel and Distributed Computing | 2017
Hayk Shoukourian; Torsten Wilde; Herbert Huber; Arndt Bode
SuperMUC, deployed at the Leibniz Supercomputing Centre, is the first High-Temperature (ASHRAE W4, chiller-less) Direct Liquid Cooled (HT-DLC) Petascale supercomputer installed worldwide. Chiller-less direct liquid cooling can save data centers a substantial amount of energy by reducing data center cooling overheads. An essential question remains unanswered: how to determine an optimal operational environment that balances scientific discovery with the energy consumption of both the supercomputer and the cooling infrastructure? This paper shows, for the first time, how the new technologies (HT-DLC and chiller-less cooling) influence the performance and energy/power efficiency of large-scale HPC applications, and how different inlet temperatures affect the overall system power consumption and the HT-DLC efficiency.
Archive | 2017
Tanja Clees; Nils Hornung; Detlef Labrenz; Michael Schnell; Horst Schwichtenberg; Hayk Shoukourian; Inna Torgovitskaia; Torsten Wilde
Cooling as well as heating circuits can be modeled as a network of elements that obey mass, momentum, and energy balance laws. Typical elements in such circuits are pipes, regulated pumps, regulated (multi-way) valves, and energy exchangers. Since cooling or heating can require a lot of energy, one is interested in understanding, reducing, and re-using energy flows. Supercomputing centers provide one important class of applications here. This article provides a detailed case study of a real system for which measurements and technical data are available. We briefly discuss our overall MYNTS framework for modeling, simulation, and optimization of such circuits. In more detail, we explain by means of a case study how we obtain and combine the network topology, element characteristics, and measurement data in order to set up and validate simulation models. Numerical results are presented and discussed. The case study is complemented in Clees et al. (Cooling Circuit Simulation I: Modeling. Springer, Berlin, 2017) (see pages 61–79 in this book) by a general introduction to the underlying physical model and its numerical treatment.
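A minimal sketch, not MYNTS itself, of the kind of balance relations such circuit models enforce: mass conservation at a junction and the energy balance Q = m_dot * cp * dT across a water-carrying element; all names and example values are hypothetical.

```python
# Toy balance checks of the kind a cooling-circuit model imposes per element.

def mass_residual(inflows_kg_s, outflows_kg_s):
    """Mass balance at a junction: total inflow minus total outflow (~0 if consistent)."""
    return sum(inflows_kg_s) - sum(outflows_kg_s)

def exchanger_heat_kw(mass_flow_kg_s, t_in_c, t_out_c, cp_kj_per_kg_k=4.186):
    """Heat picked up by a water circuit element: Q = m_dot * cp * (T_out - T_in)."""
    return mass_flow_kg_s * cp_kj_per_kg_k * (t_out_c - t_in_c)

# Example: 3 kg/s of water warmed from 40 C to 48 C absorbs ~100 kW of heat.
print(mass_residual([1.5, 1.5], [3.0]))     # 0.0 -> junction is consistent
print(exchanger_heat_kw(3.0, 40.0, 48.0))   # ~100.5 kW
```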
2017 33rd Thermal Measurement, Modeling & Management Symposium (SEMI-THERM) | 2017
Torsten Wilde; Michael Ott; Axel Auweter; Ingmar Meijer; Patrick Ruch; Markus Hilger; Steffen Kuhnert; Herbert Huber
In High Performance Computing (HPC), chiller-less cooling has replaced mechanical-chiller-supported cooling for a significant part of the HPC system, resulting in lower cooling costs. Still, other IT components and IT systems remain that require air or cold-water cooling. This work introduces CooLMUC-2, a high-temperature direct-liquid cooled (HT-DLC) HPC system which uses a heat-recovery scheme to drive an adsorption refrigeration process. Using an adsorption chiller is at least two times more efficient than a mechanical chiller for producing the needed cold water. To date, this is the only installation of adsorption chillers in a data center combining a Top500 production-level HPC system with adsorption refrigeration. This prototype installation is one more step towards a 100% mechanical-chiller-free data center. After optimization of the operational parameters of the system, the adsorption chillers of CooLMUC-2 consume just over 6 kW of electrical power to not only remove 95 kW of heat from the supercomputer but also to produce more than 50 kW of cold water. This paper presents initial measurements characterizing the heat-recovery performance of CooLMUC-2 at different operating conditions.
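A back-of-the-envelope reading of the figures quoted above; the 6 kW, 95 kW, and 50 kW values come from the abstract, while the mechanical-chiller reference COP of about 4 is an assumption of this sketch, not a number from the paper.

```python
# Rough electrical efficiency of the cold-water production reported above.
electrical_input_kw = 6.0    # adsorption chillers' electrical draw
cold_water_kw      = 50.0    # cold water produced from recovered waste heat
heat_removed_kw    = 95.0    # heat taken off the supercomputer's HT-DLC loop

adsorption_electrical_cop = cold_water_kw / electrical_input_kw   # ~8.3
mechanical_chiller_cop    = 4.0                                   # assumed typical value
print(adsorption_electrical_cop / mechanical_chiller_cop)         # ~2.1x, consistent
# with the "at least two times more efficient" statement in the abstract.
```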
intersociety conference on thermal and thermomechanical phenomena in electronic systems | 2017
Michael Ott; Torsten Wilde; Herbert Huber
Since mechanical chillers account for a large fraction of the electricity spent on cooling, data centers are looking for ways to reduce their usage. In High Performance Computing, a popular approach is to use readily available high-temperature direct liquid cooling (HT-DLC), which allows for mechanical-chiller-free cooling of compute components. Additionally, it provides a means to re-use their waste heat. A potential application is to use the waste heat to produce still-needed cold water via adsorption refrigeration. This paper analyses the first production-level installation of adsorption technology in a data center in terms of energy flows, Return on Investment (ROI), and Total Cost of Ownership (TCO).
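A generic payback/TCO sketch of the kind of analysis the abstract mentions; all functions and numbers here are hypothetical and do not come from the paper.

```python
# Simple, undiscounted payback and TCO helpers for a cooling-technology comparison.

def simple_payback_years(extra_capex_eur, annual_energy_savings_eur):
    """Years until the added investment is recovered through energy savings."""
    return extra_capex_eur / annual_energy_savings_eur

def tco_eur(capex_eur, annual_opex_eur, years):
    """Undiscounted total cost of ownership over the evaluation period."""
    return capex_eur + annual_opex_eur * years

# Hypothetical: 120k EUR extra for adsorption chillers, 30k EUR/year saved.
print(simple_payback_years(120_000, 30_000))   # 4.0 years
print(tco_eur(120_000, 10_000, 10))            # 220,000 EUR over 10 years
```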
international parallel and distributed processing symposium | 2017
Hayk Shoukourian; Torsten Wilde; Detlef Labrenz; Arndt Bode
Power consumption remains a critical aspect for High Performance Computing (HPC) data centers. It becomes even more crucial for Exascale computing, since scaling today's fastest system to the Exaflop level would consume more than 168 MW of power, which is more than 8 times the 20 MW power consumption goal set, at the time of this publication, by the US Department of Energy. This naturally leads to a need for energy efficiency improvements that encompass the full chain of power consumers, starting from the data center infrastructure, including cooling overheads and electrical losses, up to compute resource scheduling and application scaling. In this paper, a machine learning approach is proposed to model the Coefficient of Performance (COP) of an HPC data center's hot-water cooling loop. The suggested model is validated on operational data obtained at the Leibniz Supercomputing Centre (LRZ). The paper shows how this COP model can help to improve the energy efficiency of modern HPC data centers.
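A generic regression sketch of the modeling task described above, learning a cooling loop's COP from operational data; the feature set, the synthetic data, and the choice of a random-forest model are assumptions of this sketch, not the paper's method.

```python
# Fit a regression model that predicts cooling-loop COP from operational features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical features: [inlet temp C, outdoor wet-bulb C, IT load kW, flow rate l/s]
rng = np.random.default_rng(0)
X = rng.random((1000, 4)) * [10, 20, 3000, 50] + [35, 5, 500, 10]
y = 15 + 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(size=1000)  # synthetic COP values

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
```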
international conference on high performance computing and simulation | 2017
Matthias Maiterth; Torsten Wilde; David K. Lowenthal; Barry Rountree; Martin Schulz; Jonathan M. Eastep; Dieter Kranzlmüller
Power and energy consumption are seen as one of the most critical design factors for any next-generation large-scale HPC system. The price centers have to pay for energy is shifting budgets from investment to operating costs, leading to scenarios in which the sizes of systems will be determined by their power needs rather than by the initial hardware cost. As a consequence, virtually all funding agencies for HPC projects around the world have set aggressive goals for peak power requirements in future machines. Yet, with today's HPC architectures and systems, these goals are still far out of reach: they will only be achievable through a complex set of mechanisms at all levels of hardware and software, from buildings and infrastructure to software control and all the way to microarchitectural solutions. All of these mechanisms will ultimately impact the application developer. On future HPC systems, running a code efficiently (as opposed to purely with high performance) will be a major requirement for every user. This work accompanies the tutorial “Power Aware High Performance Computing: Challenges and Opportunities for Application and System Developers” and captures the key aspects discussed. We review existing literature to discuss the challenges caused by power and energy constraints, present available approaches in hardware and software, highlight impacts on HPC center and infrastructure design as well as operations, and ultimately show how this shift in paradigm from “cycle awareness” to “power awareness” will impact application development.
international conference on supercomputing | 2014
Axel Auweter; Arndt Bode; Matthias Brehm; Luigi Brochard; Nicolay Hammer; Herbert Huber; Raj Panda; Francois Thomas; Torsten Wilde
Computer Science - Research and Development | 2014
Torsten Wilde; Axel Auweter; Hayk Shoukourian