Publication


Featured research published by Xin Zhan.


Design Automation Conference | 2013

Techniques for energy-efficient power budgeting in data centers

Xin Zhan; Sherief Reda

We propose techniques for power budgeting in data centers, where a large power budget is allocated among the servers and the cooling units such that the aggregate performance of the entire center is maximized. Maximizing the performance for a given power budget automatically maximizes the energy efficiency. We first propose a method to partition the total power budget among the cooling and computing units in a self-consistent way, where the cooling power is sufficient to extract the heat of the computing power. Given the computing power budget, we devise an optimal computing budgeting technique based on knapsack-solving algorithms to determine the power caps for the individual servers. The optimal computing budgeting technique leverages a proposed on-line throughput predictor based on performance counter measurements to estimate the change in throughput of heterogeneous workloads as a function of allocated server power caps. We set up a simulation environment for a data center, where we simulate the air flow and heat transfer within the center using computational fluid dynamic simulations to derive accurate cooling estimates. The power estimates for the servers are derived from measurements on a real server executing heterogeneous workload sets. Our budgeting method delivers good improvements over previous power budgeting techniques.
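
As a rough illustration of the knapsack-style computing-budget step described above, the sketch below picks one discrete power cap per server so that the sum of predicted throughputs is maximized under a total computing power budget. The cap levels, throughput estimates, and function names are hypothetical placeholders, not the paper's measured data or actual implementation.

```python
# Hypothetical sketch of knapsack-style power-cap allocation: each server
# picks exactly one discrete power cap, maximizing the sum of predicted
# throughputs under a total computing power budget.

def allocate_power_caps(budget_w, servers):
    """servers: list of per-server option lists of (cap_watts, predicted_throughput)."""
    # dp[used_watts] = (best_total_throughput, chosen_caps) over servers seen so far
    dp = {0: (0.0, [])}
    for options in servers:
        nxt = {}
        for used, (tput, caps) in dp.items():
            for cap, pred in options:
                u = used + cap
                if u > budget_w:
                    continue
                cand = (tput + pred, caps + [cap])
                if u not in nxt or cand[0] > nxt[u][0]:
                    nxt[u] = cand
        dp = nxt
    if not dp:
        return None  # budget too small to give every server a cap
    return max(dp.values(), key=lambda v: v[0])

# Example: two servers, three cap levels each (watts, throughput estimate).
servers = [
    [(100, 0.6), (150, 0.8), (200, 0.9)],
    [(100, 0.5), (150, 0.75), (200, 0.85)],
]
print(allocate_power_caps(300, servers))  # picks caps summing to <= 300 W
```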


IEEE Transactions on Computers | 2015

Power Budgeting Techniques for Data Centers

Xin Zhan; Sherief Reda

The development of cloud computing and data science has resulted in rapid growth in the number and scale of data centers. Because of cost and sustainability concerns, energy efficiency has become a major goal for data center architects. By focusing on reducing cooling power and making full use of the available computing power, power budgeting has become an increasingly important requirement for data center operation. In this paper, we present a power budgeting framework that considers both computing power and cooling power in data centers to maximize the system normalized performance (SNP) of the entire center under a total power budget. Maximizing the SNP for a given power budget is equivalent to maximizing the energy efficiency. We propose a method to partition the total power budget among the cooling and computing infrastructure in a self-consistent way, where the cooling power is sufficient to extract the heat of the computing power. In tandem, we devise an optimal computing power budgeting technique based on a dynamic programming algorithm to determine the optimal power caps for the individual servers such that the available power is efficiently translated into performance improvements. The optimal computing budgeting technique leverages a proposed online throughput predictor based on performance counter measurements to estimate the change in throughput of heterogeneous workloads as a function of the allocated server power caps. We demonstrate that our proposed power budgeting method outperforms previous methods by 3-4 percent in terms of SNP using our data center simulation environment. While maintaining the SNP improvement, our method improves fairness by up to 57 percent. We also evaluate the performance of our method in a power-saving scenario and a dynamic power budgeting case.
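
The self-consistent split between computing and cooling power lends itself to a small fixed-point illustration: cooling must be able to extract the heat produced by the servers, so the split can be iterated until it stabilizes. The coefficient-of-performance model below is an assumed constant, not the paper's CFD-derived cooling model.

```python
# Hypothetical sketch of the self-consistent split between computing and
# cooling power: cooling must remove the heat produced by the computing
# power, so the split is found by fixed-point iteration.

def split_budget(total_w, cop=3.0, iters=50):
    """cop: assumed coefficient of performance of the cooling units
    (watts of heat removed per watt of cooling power); illustrative constant."""
    compute = total_w  # optimistic start: all power to servers
    for _ in range(iters):
        cooling = compute / cop          # cooling needed for this much heat
        compute = total_w - cooling      # remainder goes to the servers
    return compute, cooling

compute_w, cooling_w = split_budget(100_000)  # 100 kW total budget
print(f"computing: {compute_w:.0f} W, cooling: {cooling_w:.0f} W")
```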


IEEE International Symposium on Workload Characterization | 2015

How Good Are Low-Power 64-Bit SoCs for Server-Class Workloads?

Reza Azimi; Xin Zhan; Sherief Reda

Emerging system-on-a-chip (SoC)-based microservers promise higher energy efficiency by drastically reducing power consumption, albeit at the expense of some performance loss. In this paper we thoroughly evaluate the performance and energy efficiency of two 64-bit eight-core ARM and x86 SoCs on a number of parallel scale-out benchmarks and high-performance computing benchmarks. We characterize the workloads on these servers and elaborate on the impact of the SoC architecture, memory hierarchy, and system design on the performance and energy efficiency outcomes. We also contrast the results against those of standard x86 servers.


International Symposium on Low Power Electronics and Design | 2014

Thermal-aware layout planning for heterogeneous datacenters

Reza Azimi; Xin Zhan; Sherief Reda

Cooling power represents a significant portion of total power consumption in datacenters. Heterogeneous datacenters deploy clusters of servers with different hardware configurations, each offering its own performance and power characteristics. We observe that heterogeneous datacenters offer a unique opportunity to reduce cooling power through appropriate layout planning. In this paper we formulate the rack-layout planning problem for heterogeneous datacenters, where the goal is to identify the best locations for server racks with different hardware capabilities so as to improve the supply temperatures of the CRAC units and reduce the total cooling power. We provide optimal solutions that take into account the impact of varying datacenter utilizations and job scheduling methods. Using state-of-the-art thermal modeling tools, we show that our methods lead to datacenter layouts with significant reductions in cooling power, between 15.5% and 38.5% depending on datacenter utilization, with an average of 28.3%, and without any negative side effects.
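
For intuition, the toy sketch below brute-forces the rack-placement idea on a tiny example: assign rack types to slots so that an assumed cooling-cost model is minimized. The heat values and slot weights are invented placeholders standing in for the CFD-based thermal models used in the paper.

```python
# Illustrative brute-force version of the rack-layout question: assign rack
# types to slots so that an assumed cooling-cost model is minimized.

from itertools import permutations

# Hypothetical per-rack heat output (kW) by hardware type.
rack_heat = {"high_density": 12.0, "mid": 8.0, "low_power": 5.0}

# Hypothetical slot weights: slots whose hot air recirculates less are
# cheaper places to put hot racks.
slot_weight = [1.0, 1.2, 1.5]

def cooling_cost(layout):
    return sum(w * rack_heat[r] for w, r in zip(slot_weight, layout))

racks = ["high_density", "mid", "low_power"]
best = min(permutations(racks), key=cooling_cost)
print(best, cooling_cost(best))  # hottest rack lands in the cheapest slot
```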


High-Performance Computer Architecture | 2017

Fast Decentralized Power Capping for Server Clusters

Reza Azimi; Masoud Badiei; Xin Zhan; Na Li; Sherief Reda

Power capping is a mechanism to ensure that the power consumption of clusters does not exceed the provisioned resources. A fast power capping method allows for a safe over-subscription of the rated power distribution devices, provides equipment protection, and enables large clusters to participate in demand-response programs. However, current methods have a slow response time with a large actuation latency when applied across a large number of servers, as they rely on hierarchical management systems. We propose a fast decentralized power capping (DPC) technique that reduces the actuation latency by localizing power management at each server. The DPC method is based on a maximum-throughput optimization formulation that takes into account workload priorities as well as the capacity of circuit breakers. Therefore, DPC significantly improves cluster performance compared to alternative heuristics. We implement the proposed decentralized power management scheme on a real computing cluster. Compared to state-of-the-art hierarchical methods, DPC reduces the actuation latency by 72% to 86% depending on the cluster size. In addition, DPC improves system throughput by 16%, while using only 0.02% of the available network bandwidth. We describe how to reduce the overhead of each local DPC agent to a negligible amount. We also quantify the traffic and fault resilience of our decentralized power capping approach.
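
The following is a deliberately simplified, hypothetical take on the "local actuation, tiny shared state" idea behind decentralized capping: each server agent sees only the breaker-level overshoot and adjusts its own cap in proportion to its workload priority. The actual DPC method solves a maximum-throughput optimization; this sketch only illustrates the decentralized control loop.

```python
# Toy decentralized capping loop: every server runs the same local update,
# using only one shared signal (total power vs. breaker limit) and its own
# workload priority. Gains and priorities are illustrative.

def dpc_step(caps, priorities, measured_total, breaker_limit, gain=0.1):
    """One local update applied independently by every server agent."""
    overshoot = measured_total - breaker_limit
    new_caps = []
    for cap, prio in zip(caps, priorities):
        # low-priority servers shed (or reclaim) power faster
        adjust = gain * overshoot / prio
        new_caps.append(max(0.0, cap - adjust))
    return new_caps

caps = [200.0, 200.0, 200.0]        # watts per server
priorities = [1.0, 2.0, 4.0]        # higher = more latency-critical
for _ in range(20):
    caps = dpc_step(caps, priorities, sum(caps), breaker_limit=480.0)
print([round(c) for c in caps])     # total drifts toward the 480 W limit
```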


IEEE Computer Architecture Letters | 2017

CARB: A C-State Power Management Arbiter for Latency-Critical Workloads

Xin Zhan; Reza Azimi; Svilen Kanev; David M. Brooks; Sherief Reda

Latency-critical workloads in datacenters have tight response time requirements to meet service-level agreements (SLAs). Sleep states (c-states) enable servers to reduce their power consumption during idle times; however, entering and exiting c-states is not instantaneous, which increases transaction latency. In this paper we propose a c-state arbitration technique, CARB, that minimizes response time while simultaneously realizing the power savings that could be achieved from enabling c-states. CARB adapts to incoming request rates and processing times and activates the smallest number of cores needed to process the current load. CARB reshapes the distribution of c-states and minimizes the latency cost of sleep by avoiding entering deep sleep states too often. We quantify the improvements from CARB with memcached running on an 8-core Haswell-based server.
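
A minimal sketch of the core-count decision that this kind of c-state arbitration implies: keep just enough cores awake to serve the offered load and park the rest in a deep sleep state. The utilization target and the request model below are assumptions for illustration, not parameters from the paper.

```python
# Keep just enough cores active for the offered load; the rest can be
# parked in a deep c-state. Assumes a simple offered-load model.

import math

def cores_to_activate(request_rate_per_s, service_time_s,
                      total_cores=8, target_util=0.7):
    offered_load = request_rate_per_s * service_time_s   # in "busy cores"
    needed = math.ceil(offered_load / target_util)
    return max(1, min(total_cores, needed))

# Example: 20k requests/s at 100 us average processing time is about
# 2 cores' worth of work, so ~3 cores stay awake at 70% target utilization.
print(cores_to_activate(20_000, 100e-6))
```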


IEEE International Symposium on Workload Characterization | 2016

Power-aware characterization and mapping of workloads on CPU-GPU processors

Kapil Dev; Xin Zhan; Sherief Reda

Modern CPU-GPU processors enable workloads to run on both CPU and GPU devices. Current scheduling practices mainly use the characteristics of kernel workloads to decide the CPU/GPU mapping. We observe that runtime conditions such as power and CPU load also affect the mapping decision. Consequently, in this paper, we propose techniques to characterize OpenCL kernel workloads at runtime and map them to the appropriate device under time-varying physical (i.e., chip power limit) and CPU load conditions, in particular the number of CPU cores available to the OpenCL kernel. We implement our Power-Aware Scheduler (PAS) on a real CPU-GPU processor and evaluate it using various OpenCL benchmarks. Compared to the state-of-the-art kernel-level scheduling method, the proposed scheduler provides average improvements of 31% and 4% in runtime and energy, respectively.
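
A hypothetical decision rule in the spirit of the power-aware scheduler described above: pick the device whose estimated kernel runtime is lower given the current chip power limit and the number of free CPU cores. The runtime estimates and power threshold are placeholders rather than the paper's trained predictors.

```python
# Placeholder device-mapping rule: compare estimated kernel runtimes on GPU
# vs. the currently free CPU cores, and fall back to CPU when the chip power
# limit is too tight for the GPU (illustrative threshold).

def choose_device(kernel_gpu_time_s, kernel_cpu_time_1core_s,
                  free_cpu_cores, chip_power_limit_w, gpu_min_power_w=45.0):
    if chip_power_limit_w < gpu_min_power_w:
        return "CPU"
    # Assume near-linear CPU scaling over the cores that are actually free.
    cpu_time = kernel_cpu_time_1core_s / max(1, free_cpu_cores)
    return "GPU" if kernel_gpu_time_s < cpu_time else "CPU"

print(choose_device(kernel_gpu_time_s=0.8, kernel_cpu_time_1core_s=4.0,
                    free_cpu_cores=2, chip_power_limit_w=65.0))   # -> GPU
print(choose_device(kernel_gpu_time_s=0.8, kernel_cpu_time_1core_s=4.0,
                    free_cpu_cores=8, chip_power_limit_w=30.0))   # -> CPU
```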


Cluster Computing and the Grid | 2016

DiBA: Distributed Power Budget Allocation for Large-Scale Computing Clusters

Masoud Badiei; Xin Zhan; Reza Azimi; Sherief Reda; Na Li

Power management has become a central issue in large-scale computing clusters, where a considerable amount of energy is consumed and a large operational cost is incurred annually. Traditional power management techniques have a centralized design that creates challenges for the scalability of computing clusters. In this work, we develop a framework for distributed power budget allocation that maximizes the utility of computing nodes subject to a total power budget constraint. To eliminate the role of the central coordinator in the primal-dual technique, we propose a distributed power budget allocation algorithm (DiBA) which maximizes the combined performance of a cluster subject to a power budget constraint in a distributed fashion. Specifically, DiBA is a consensus-based algorithm in which each server determines its optimal power consumption locally by communicating its state with neighbors (connected nodes) in a cluster. We characterize a synchronous primal-dual technique to obtain a benchmark for comparison with the distributed algorithm that we propose. We demonstrate numerically that DiBA is a scalable algorithm that outperforms the conventional primal-dual method on large-scale clusters in terms of convergence time. Further, DiBA eliminates the communication bottleneck in the primal-dual method. We thoroughly evaluate the characteristics of DiBA through simulations of large-scale clusters. Furthermore, we provide results from a proof-of-concept implementation on a real experimental cluster.
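
A clearly simplified, consensus-style sketch of the distributed allocation idea: each server keeps a local power "price", averages it with its neighbors, nudges it using its own share of the budget constraint, and then picks the power level that maximizes its local utility at that price. The log utilities, step size, and ring topology are illustrative assumptions, not DiBA's exact formulation.

```python
# Consensus-flavored dual-subgradient sketch: nodes exchange only a scalar
# "price" with neighbors; at the fixed point the total power meets the budget.

def distributed_budget_sketch(utility_weights, budget_w, neighbours,
                              p_max=300.0, step=0.0005, rounds=1000):
    n = len(utility_weights)
    price = [0.05] * n
    power = [0.0] * n
    for _ in range(rounds):
        # local primal step: maximize a_i*log(1+p) - price_i*p  ->  p = a_i/price_i - 1
        power = [min(p_max, max(0.0, a / lam - 1.0))
                 for a, lam in zip(utility_weights, price)]
        # consensus averaging plus a local dual step on the price
        new_price = []
        for i in range(n):
            nbrs = neighbours[i] + [i]
            avg = sum(price[j] for j in nbrs) / len(nbrs)
            new_price.append(max(1e-4, avg + step * (power[i] - budget_w / n)))
        price = new_price
    return power, price

# Four servers on a ring, 600 W total budget.
neighbours = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
power, price = distributed_budget_sketch([8.0, 10.0, 12.0, 14.0], 600.0, neighbours)
print([round(p) for p in power], round(sum(power)))  # total converges near 600 W
```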


Cluster Computing and the Grid | 2016

Creating Soft Heterogeneity in Clusters Through Firmware Re-configuration

Xin Zhan; Mohammed Shoaib; Sherief Reda

Customizing server hardware to adapt to its workload has the potential to improve both runtime and energy efficiency. In a cluster that caters to diverse workloads, employing servers with customized hardware components leads to heterogeneity, which is not scalable. In this paper, we seek to create soft heterogeneity from existing servers with homogeneous hardware components by customizing the firmware configuration. We demonstrate that firmware configurations have a large impact on the runtime, power, and energy efficiency of workloads. Since the cost of finding the firmware configuration that minimizes runtime and/or energy grows exponentially with the number of firmware settings, we propose a methodology called FXplore that completes the exploration with quadratic time complexity. Furthermore, FXplore enables system administrators to manage the degree of heterogeneity by deriving firmware configurations for sub-clusters that can cater to multiple workloads with similar characteristics. Thus, during online operation, incoming workloads can be mapped to appropriate sub-clusters with pre-configured firmware settings. FXplore also finds the best firmware settings when co-runners share the same server. We validate our methodology on a fully instrumented cluster under a large range of parallel workloads representative of both high-performance computing clusters and datacenters. Compared to enabling all firmware options, our method reduces average runtime and energy consumption by 11% and 15%, respectively.
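
One plausible way to keep such an exploration quadratic (not necessarily the exact FXplore procedure) is greedy coordinate search: starting from a baseline configuration, try toggling each firmware setting in turn, keep the single best toggle, and repeat. For n binary settings this needs at most O(n^2) benchmark runs instead of the 2^n runs of exhaustive search; run_benchmark below is a stand-in for a real runtime or energy measurement.

```python
# Greedy coordinate exploration over binary firmware settings. The cost
# function is a toy stand-in for a measured runtime on the cluster.

def greedy_explore(settings, run_benchmark):
    config = {s: False for s in settings}        # baseline: everything off
    best_cost = run_benchmark(config)
    for _ in range(len(settings)):               # at most n passes -> O(n^2) runs
        improved = False
        for s in settings:
            trial = dict(config, **{s: not config[s]})
            cost = run_benchmark(trial)
            if cost < best_cost:
                config, best_cost, improved = trial, cost, True
        if not improved:
            break
    return config, best_cost

# Toy cost model standing in for measured runtime (seconds).
def fake_benchmark(cfg):
    base = 100.0
    base -= 8.0 if cfg["prefetcher"] else 0.0
    base -= 5.0 if cfg["turbo"] else 0.0
    base += 6.0 if cfg["hyperthreading"] and not cfg["turbo"] else 0.0
    return base

print(greedy_explore(["prefetcher", "turbo", "hyperthreading"], fake_benchmark))
```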


Journal of Low Power Electronics | 2017

Scheduling on CPU + GPU Processors Under Dynamic Conditions

Kapil Dev; Xin Zhan; Sherief Reda

Collaboration


Dive into Xin Zhan's collaborations.

Top Co-Authors

Na Li (Harvard University)