Fulya Kaplan | Researchain

Publication

Featured research published by Fulya Kaplan.

2013 International Green Computing Conference Proceedings | 2013

Optimizing communication and cooling costs in HPC data centers via intelligent job allocation

Fulya Kaplan; Jie Meng; Ayse Kivilcim Coskun

Nearly half of the energy in the computing clusters today is consumed by the cooling infrastructure. It is possible to reduce the cooling cost by allowing the data center temperatures to rise; however, component reliability constraints impose thermal thresholds as failure rates are exponentially dependent on the processor temperatures. Existing thermally-aware job allocation policies optimize the cooling costs by minimizing the peak inlet temperatures of the server nodes. An important constraint in high performance computing (HPC) data centers, however, is performance. Specifically, HPC data centers run multi-threaded applications with significant communication among the threads. Thus, performance of such applications is strongly affected by the job allocation decisions. This paper proposes a novel job allocation methodology to jointly minimize communication cost of an HPC application while also reducing the cooling energy. The proposed method also considers temperature-dependent hardware reliability as part of the optimization.

International Conference on Computer Design | 2014

Modeling and analysis of Phase Change Materials for efficient thermal management

Fulya Kaplan; Charlie De Vivero; Samuel Howes; Manish Arora; Houman Homayoun; Wayne Burleson; Dean M. Tullsen; Ayse Kivilcim Coskun

Direct placement of Phase Change Materials (PCMs) on the chip has been recently explored as a passive temperature management solution. PCMs provide the ability to store large amounts of heat at a close-to-constant temperature during the phase change (solid to liquid and vice versa). This latent heat capacity can be used to provide higher performance while reducing hot spots. Detailed modeling of the phase change behavior is essential for the design and evaluation of systems with PCM. This paper proposes an accurate phase change model that is integrated into the commonly used thermal simulation tool, HotSpot. It also provides validation of the proposed model by carrying out computational fluid dynamics (CFD) simulations using COMSOL Multiphysics®. This paper also explores the impact of PCM properties on the thermal profile of a processor, and demonstrates that PCM material choices can affect peak temperatures by up to 20.1°C. Experimental results show that dynamic policy decisions change dramatically when using the proposed detailed phase change model, as prior simpler PCM models can substantially over/under-estimate temperature and PCM melting duration. The proposed model helps design more effective dynamic management policies and enables realistic evaluation of systems with PCM.
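To illustrate the near-constant-temperature heat storage described in the abstract above, here is a minimal lumped-enthalpy sketch. It is not the HotSpot-integrated model from the paper. The function name, the parameters, and the assumption of equal specific heat in the solid and liquid phases are all hypothetical simplifications.

```python
def pcm_temperature(enthalpy_j, mass_kg, c_p, t_melt, latent_heat):
    """Temperature (deg C) of a lumped PCM block for a given stored enthalpy.

    Assumptions (illustrative only): enthalpy is measured relative to solid
    PCM at 0 deg C, c_p (J/kg/K) is the same in both phases, and melting
    happens at a single temperature t_melt. latent_heat is in J/kg.
    """
    sensible_to_melt = mass_kg * c_p * t_melt  # heat needed to reach t_melt
    latent_total = mass_kg * latent_heat       # heat absorbed while melting
    if enthalpy_j <= sensible_to_melt:
        # Fully solid: temperature rises linearly with stored heat.
        return enthalpy_j / (mass_kg * c_p)
    if enthalpy_j <= sensible_to_melt + latent_total:
        # Melting: added heat goes into the phase change, not temperature.
        return t_melt
    # Fully liquid: temperature rises linearly again.
    excess = enthalpy_j - sensible_to_melt - latent_total
    return t_melt + excess / (mass_kg * c_p)
```

With 1 g of material, `c_p = 2000`, `t_melt = 50` and `latent_heat = 200000`, the block stays at the melting temperature while its stored enthalpy grows from about 100 J to about 300 J, which is the plateau that sprinting-style policies rely on.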

Journal of Parallel and Distributed Computing | 2016

Communication and cooling aware job allocation in data centers for communication-intensive workloads

Jie Meng; Eduard Llamosí; Fulya Kaplan; Chulian Zhang; Jiayi Sheng; Martin C. Herbordt; Gunar Schirner; Ayse Kivilcim Coskun

Energy consumption is an increasingly important concern in data centers. Today, nearly half of the energy in data centers is consumed by the cooling infrastructure. Existing policies on thermally-aware workload allocation do not consider applications that include many tasks (or threads) running on a large set of nodes with significant communication among the tasks. Such jobs, however, constitute most of the cycles in the high performance computing (HPC) domain, and have started to appear in other data centers as well. Job allocation strongly affects the performance of such communication-intensive applications. Communication-aware job allocation methods exist, but they focus solely on performance and do not consider cooling energy. This paper proposes a novel job allocation methodology to jointly minimize communication cost and cooling energy consumption in data centers. We formulate and solve the joint optimization problem using binary quadratic programming. Our joint optimization algorithm reduces cooling energy by 16.4% on average with only a 2.66% average increase in application running time compared to solely performance-aware allocations. To further optimize the communication cost, we develop a Charm++ based framework that extracts the communication behavior of applications. We then integrate our job allocation policy with a recursive coordinate bisection (RCB) based task mapping method to place highly-communicating tasks in close proximity. Experimental results show that task mapping further decreases the communication cost by up to 20.9% compared to assuming all-to-all communication, a popular assumption in much of the prior work.

Highlights: We jointly optimize the cooling and communication costs via job allocation. Our joint allocation strategy saves 16.4% cooling energy on average. We design a framework to extract the communication patterns of HPC applications. Combining joint allocation with task mapping reduces communication costs by 20.9%.
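The joint objective described above can be sketched at toy scale. The example below is not the paper's formulation, which is not reproduced on this page. It assumes all-to-all communication within a job, a hypothetical per-node cooling cost, and a hypothetical trade-off parameter `weight`, and it replaces a real binary quadratic programming solver with exhaustive search. The pairwise distance term is what makes the objective quadratic in the binary node-selection variables.

```python
from itertools import combinations


def joint_cost(nodes, distance, cooling_cost, weight):
    """Weighted sum of pairwise communication distance and per-node cooling cost.

    distance[a][b] and cooling_cost[n] are hypothetical inputs; weight in
    [0, 1] trades communication (1.0) against cooling (0.0).
    """
    comm = sum(distance[a][b] for a, b in combinations(nodes, 2))
    cool = sum(cooling_cost[n] for n in nodes)
    return weight * comm + (1.0 - weight) * cool


def best_allocation(num_nodes, job_size, distance, cooling_cost, weight):
    """Try every node subset of the requested size (feasible only for tiny inputs)."""
    return min(
        combinations(range(num_nodes), job_size),
        key=lambda nodes: joint_cost(nodes, distance, cooling_cost, weight),
    )


# Four nodes on a line; the two middle nodes are the most expensive to cool.
line_distance = [[abs(i - j) for j in range(4)] for i in range(4)]
node_cooling = [1.0, 5.0, 5.0, 1.0]
```

Calling `best_allocation(4, 2, line_distance, node_cooling, 0.0)` picks the two cheap-to-cool end nodes even though they are far apart, while a weight of 0.9 favors an adjacent pair. That tension is what the joint optimization is meant to balance.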

Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems | 2017

Fast thermal modeling of liquid, thermoelectric, and hybrid cooling

Fulya Kaplan; Sherief Reda; Ayse Kivilcim Coskun

Localized hot spots result in elevated on-chip temperatures, significantly limiting the performance, energy efficiency and reliability of today's processors. For efficient removal of hot spots, using superlattice-based thermoelectric coolers (TECs) in cooperation with microchannel liquid cooling has recently been explored. For the design and evaluation of such hybrid cooling solutions, fast and accurate modeling is essential. In this paper, we present a modeling methodology to account for the complex thermal behavior of TECs and liquid microchannels using compact thermal modeling (CTM). CTM provides a desirable tradeoff between accuracy and speed; thus, it is usually preferred over computationally heavy multi-physics simulations when designing and evaluating thermal management techniques. In this paper, we first describe how to jointly model liquid microchannels and TECs for a hybrid cooling design. We then validate the accuracy of our models by comparing the temperatures obtained from them against the temperatures from COMSOL and 3D-ICE. In comparison to COMSOL, the proposed model provides an average error of 2.07°C for TECs and 0.36°C for liquid microchannels, while providing four orders of magnitude faster simulation time. Compared to 3D-ICE, the proposed liquid cooling model achieves 0.02°C average error. We point out challenges related to integrating a new cooling model into an existing compact thermal simulator and show that modeling decisions can affect the reported temperature by up to 20°C. We conclude our paper by demonstrating an example use case of our proposed model, where we compare the cooling performance of a hybrid cooling design involving microchannels and TECs against a design that adopts liquid cooling only.

2012 IEEE/IFIP 20th International Conference on VLSI and System-on-Chip (VLSI-SoC) | 2012

Topology-aware reliability optimization for multiprocessor systems

Jie Meng; Fulya Kaplan; Ming-yu Hsieh; Ayse Kivilcim Coskun

High on-chip temperatures adversely affect the reliability of processors, and reliability has become a serious concern as high performance computing moves towards exascale. While dynamic thermal management techniques can effectively constrain the chip temperature, most prior work has focused on temperature and reliability optimization of a single processor. In this work, we propose a topology-aware workload allocation policy to optimize the reliability of multi-chip multicore systems at runtime. Our results show that the proposed policy improves the system reliability by up to 123.3% compared to existing temperature balancing policies when systems have medium to high utilization. We also demonstrate that the policy is scalable to larger systems and its performance overhead is minimal.

International Symposium on Low Power Electronics and Design | 2018

Design Optimization of 3D Multi-Processor System-on-Chip with Integrated Flow Cell Arrays

Artem Aleksandrovich Andreev; Fulya Kaplan; Marina Zapater; Ayse Kivilcim Coskun; David Atienza

Integrated flow cell array (FCA) is an emerging technology, targeting the cooling and power delivery challenges of modern 2D/3D Multi-Processor Systems-on-Chip (MPSoCs). In FCA, electrolytic solutions are pumped through microchannels etched in the silicon of the chips, removing heat from the system, while, at the same time, generating power on-chip. In this work, we explore the impact of FCA system design on various 3D architectures and propose a methodology to optimize a 3D MPSoC with integrated FCA to run a given workload in the most energy-efficient way. Our results show that an optimized configuration can save up to 50% energy with respect to sub-optimal 3D MPSoC configurations.

IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2017

Unveiling the Interplay Between Global Link Arrangements and Network Management Algorithms on Dragonfly Networks

Fulya Kaplan; Ozan Tuncer; Vitus J. Leung; K. Scott Hemmert; Ayse Kivilcim Coskun

Network messaging delay historically constitutes a large portion of the wall-clock time for High Performance Computing (HPC) applications, as these applications run on many nodes and involve intensive communication among their tasks. Dragonfly network topology has emerged as a promising solution for building exascale HPC systems owing to its low network diameter and large bisection bandwidth. Dragonfly includes local links that form groups and global links that connect these groups via high bandwidth optical links. Many aspects of the dragonfly network design are yet to be explored, such as the performance impact of the connectivity of the global links, i.e., global link arrangements, the bandwidth of the local and global links, or the job allocation algorithm. This paper first introduces a packet-level simulation framework to model the performance of HPC applications in detail. The proposed framework is able to simulate known MPI (message passing interface) routines as well as applications with custom-defined communication patterns for a given job placement algorithm and network topology. Using this simulation framework, we investigate the coupling between global link bandwidth and arrangements, communication pattern and intensity, job allocation and task mapping algorithms, and routing mechanisms in dragonfly topologies. We demonstrate that by choosing the right combination of system settings and workload allocation algorithms, communication overhead can be decreased by up to 44%. We also show that circulant arrangement provides up to 15% higher bisection bandwidth compared to the other arrangements, but for realistic workloads, the performance impact of link arrangements is less than 3%.

International Symposium on Low Power Electronics and Design | 2015

Adaptive sprinting: How to get the most out of Phase Change based passive cooling

Fulya Kaplan; Ayse Kivilcim Coskun

CMOS scaling trends lead to elevated on-chip temperatures, which substantially limit the performance of today's processors. To improve thermal efficiency, Phase Change Materials (PCMs) have recently been used as passive cooling solutions. PCMs store large amounts of heat at near-constant temperature during phase change, allowing strategies such as computational sprinting. While existing sprinting methods allow short performance boosts, there is significant unexplored potential in improving performance on systems with PCM-enhanced cooling. To this end, this paper proposes a novel runtime management policy driven by observations that are not captured by prior techniques: (i) PCM melts non-uniformly due to spatially heterogeneous on-chip heat distribution; (ii) power consumption during sprinting is highly application dependent and assuming a fixed sprinting power leads to lower thermal efficiency; (iii) if we monitor the remaining PCM energy at various locations, we can utilize the PCM heat storage capability much more efficiently. The proposed Adaptive Sprinting policy exploits these observations to extend sprinting duration for increased performance gains. Our policy monitors the remaining PCM energy corresponding to each core at runtime, and using this information, it decides on the number, the location and the voltage-frequency (V/f) setting of the sprinting cores. Experimental evaluation including a detailed phase change thermal model demonstrates 29% performance improvement, 22% energy savings, and 43% energy-delay product (EDP) reduction on average, compared to prior strategies.
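The per-core monitoring idea in the abstract above can be sketched as follows. This is a hypothetical greedy rule, not the Adaptive Sprinting policy itself. The function names, the `sustainable_power_w` and `min_energy_j` thresholds, and the simplification that PCM only drains and never refreezes are all assumptions made for illustration. V/f selection is left out.

```python
def update_remaining_energy(remaining_j, core_power_w, sustainable_power_w, dt_s):
    """Drain each core's PCM budget by the heat above a sustainable power level.

    Hypothetical model: only power above sustainable_power_w melts PCM, and
    refreezing is ignored. Budgets are clamped at zero.
    """
    return [
        max(0.0, energy - max(0.0, power - sustainable_power_w) * dt_s)
        for energy, power in zip(remaining_j, core_power_w)
    ]


def choose_sprinting_cores(remaining_j, min_energy_j, max_cores):
    """Pick up to max_cores cores, preferring those with the most unmelted PCM."""
    eligible = [i for i, energy in enumerate(remaining_j) if energy >= min_energy_j]
    eligible.sort(key=lambda i: remaining_j[i], reverse=True)
    return sorted(eligible[:max_cores])


# Remaining PCM energy (J) attributed to each of four cores.
budgets = [10.0, 2.0, 6.0, 8.0]
sprinters = choose_sprinting_cores(budgets, min_energy_j=5.0, max_cores=2)
budgets_after = update_remaining_energy(budgets, [5.0, 1.0, 3.0, 5.0], 2.0, 1.0)
```

Because the budgets are tracked per core, the selection can move sprinting to wherever PCM is still solid, which reflects observation (i) that PCM melts non-uniformly across the chip.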

ASME 2015 International Technical Conference and Exhibition on Packaging and Integration of Electronic and Photonic Microsystems collocated with the ASME 2015 13th International Conference on Nanochannels, Microchannels, and Minichannels | 2015

Experimental Validation of a Detailed Phase Change Model on a Hardware Testbed

Charlie De Vivero; Fulya Kaplan; Ayse Kivilcim Coskun

Continued CMOS scaling accompanied by a stall in voltage scaling has led to high on-chip power densities. High on-chip power densities elevate the temperatures, substantially limiting the performance and reliability of computing systems. The use of Phase Change Materials (PCMs) has been explored as a passive cooling method to manage excessive chip temperatures. The thermal properties of PCMs allow a large amount of heat to be stored at near-constant temperature during the phase transition. This heat storage capability of PCM can be leveraged during periods of intense computation. For systems with PCM, development of new management strategies is essential to maximize the benefits of PCM. In order to design and evaluate these management strategies, it is necessary to have an accurate PCM thermal model. In our recent work, we proposed a detailed phase change thermal model, which we integrated into a compact thermal simulation tool, HotSpot. In this paper, we build a hardware testbed incorporating a PCM unit on top of the chip package. We then validate the accuracy of our previously proposed thermal model by comparing the HotSpot simulation results against the measurements on the testbed. We observe that the error between the measured and simulated temperatures is less than 4°C with 0.65 probability. Finally, we implement a soft PCM capacity sensor that monitors the remaining PCM latent heat capacity to be used for development of thermal management policies. We evaluate a set of thermal management policies on the testbed. We compare policies that adjust the sprinting frequency based on current temperature against the policies that take action based on the remaining PCM capacity.

Sustainable Computing: Informatics and Systems | 2015