Publication


Featured research published by Chang-Hong Hsu.


high-performance computer architecture | 2015

Adrenaline: Pinpointing and reining in tail queries with quick voltage boosting

Chang-Hong Hsu; Yunqi Zhang; Michael A. Laurenzano; David Meisner; Thomas F. Wenisch; Jason Mars; Lingjia Tang; Ronald G. Dreslinski

Reducing the long tail of the query latency distribution in modern warehouse-scale computers is critical for improving the performance and quality of service of workloads such as Web Search and Memcached. Traditional turbo boost increases a processor's voltage and frequency during a coarse-grained sliding window, boosting all queries that are processed during that window. However, the inability of such a technique to pinpoint tail queries for boosting limits its tail-reduction benefit. In this work, we propose Adrenaline, an approach that leverages finer-granularity voltage boosting, on the order of tens of nanoseconds, to effectively rein in the tail latency with query-level precision. Two key insights underlie this work. First, emerging finer-granularity voltage/frequency boosting is an enabling mechanism for intelligent allocation of the power budget to precisely boost only the queries that contribute to the tail latency; and second, per-query characteristics can be used to design indicators for proactively pinpointing these queries, triggering boosting accordingly. Based on these insights, Adrenaline effectively pinpoints and boosts queries that are likely to increase the tail distribution and that can reap more benefit from the voltage/frequency boost. By evaluating under various workload configurations, we demonstrate the effectiveness of our methodology. We achieve up to a 2.50x tail latency improvement for Memcached and up to a 3.03x improvement for Web Search over coarse-grained DVFS given a fixed boosting power budget. When optimizing for energy reduction, Adrenaline achieves up to a 1.81x improvement for Memcached and up to a 1.99x improvement for Web Search over coarse-grained DVFS.
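
As a rough illustration of the query-level selection idea (not the paper's actual mechanism), the Python sketch below boosts only queries whose per-query indicator predicts a tail-bound service time, subject to a boosting budget. The indicator, threshold, and budget are hypothetical simplifications.

```python
# Minimal sketch of a query-level boosting decision in the spirit of
# Adrenaline. The indicator (a predicted service time), its threshold, and
# the per-window boost budget are illustrative assumptions, not the
# paper's policy or hardware mechanism.
from dataclasses import dataclass

@dataclass
class Query:
    qid: int
    predicted_service_us: float  # per-query indicator, e.g. derived from request type/size

TAIL_THRESHOLD_US = 200.0   # queries predicted above this are likely tail queries
BOOST_BUDGET = 100          # max queries boosted per control window

def plan_boosts(queries):
    """Select which queries should run at the boosted voltage/frequency point."""
    boosted, budget = set(), BOOST_BUDGET
    for q in queries:
        if budget > 0 and q.predicted_service_us > TAIL_THRESHOLD_US:
            boosted.add(q.qid)   # spend the power budget only on likely tail contributors
            budget -= 1
    return boosted

if __name__ == "__main__":
    qs = [Query(i, 400.0 if i % 10 == 0 else 50.0) for i in range(1000)]
    print(len(plan_boosts(qs)), "of", len(qs), "queries selected for boosting")
```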


international symposium on computer architecture | 2016

Dynamo: Facebook's data center-wide power management system

Qiang Wu; Qingyuan Deng; Lakshmi Ganesh; Chang-Hong Hsu; Yun Jin; Sanjeev Kumar; Bin Li; Justin Meza; Yee Jiun Song

Data center power is a scarce resource that often goes underutilized due to conservative planning, because the penalty for overloading the data center power delivery hierarchy and tripping a circuit breaker is very high, potentially causing long service outages. Recently, dynamic server power capping, which limits the amount of power consumed by a server, has been proposed and studied as a way to reduce this penalty, enabling more aggressive utilization of provisioned data center power. However, no real at-scale solution for data center-wide power monitoring and control has been presented in the literature. In this paper, we describe Dynamo, a data center-wide power management system that monitors the entire power hierarchy and makes coordinated control decisions to safely and efficiently use provisioned data center power. Dynamo has been developed and deployed across all of Facebook's data centers for the past three years. Our key insight is that in real-world data centers, different power and performance constraints at different levels of the power hierarchy necessitate coordinated data center-wide power management. We make three main contributions. First, to understand the design space of Dynamo, we characterize power variation in data centers running a diverse set of modern workloads, using fine-grained power samples collected from tens of thousands of servers over a period of more than six months. Second, we present the detailed design of Dynamo, which addresses several key issues not covered by previous simulation-based studies. Third, the proposed techniques and design have been deployed and evaluated in large-scale data centers serving billions of users. We present production results showing that Dynamo has prevented 18 potential power outages caused by unexpected power surges in the past six months, that Dynamo enables optimizations leading to a 13% performance boost for a production Hadoop cluster and a nearly 40% performance increase for a search cluster, and that Dynamo has already enabled an 8% increase in the power capacity utilization of one of our data centers, with more aggressive power subscription measures underway.
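
To make the idea of coordinated, hierarchical power capping concrete, here is a minimal sketch: when a level of the power hierarchy exceeds its budget, it shrinks its children's budgets and pushes the caps downward. The proportional splitting policy and the numbers are illustrative assumptions, not Facebook's production algorithm.

```python
# Toy model of coordinated power capping over a power-delivery hierarchy,
# in the spirit of Dynamo. The proportional cap-splitting policy and all
# values are hypothetical; a real system would also actuate per-server
# caps (e.g. via RAPL) at the leaves.
class PowerNode:
    def __init__(self, name, budget_w, children=None):
        self.name = name
        self.budget_w = budget_w
        self.children = children or []
        self.measured_w = 0.0        # filled in by monitoring

    def total_measured(self):
        if not self.children:
            return self.measured_w
        return sum(c.total_measured() for c in self.children)

    def enforce(self):
        """If this level is over budget, shrink child budgets proportionally."""
        used = self.total_measured()
        if used > self.budget_w and self.children:
            scale = self.budget_w / used
            for c in self.children:
                c.budget_w = min(c.budget_w, c.total_measured() * scale)
                c.enforce()          # push the caps down toward the servers

# Example: one breaker feeding two racks of two servers each.
servers = [PowerNode(f"server{i}", budget_w=400) for i in range(4)]
for s in servers:
    s.measured_w = 380
rack_a = PowerNode("rackA", 1500, servers[:2])
rack_b = PowerNode("rackB", 1500, servers[2:])
breaker = PowerNode("breaker", 1400, [rack_a, rack_b])
breaker.enforce()
print([round(s.budget_w, 1) for s in servers])   # each server capped below 380 W
```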


asia and south pacific design automation conference | 2011

A robust ECO engine by resource-constraint-aware technology mapping and incremental routing optimization

Shao-Lun Huang; Chi-An Wu; Kai-Fu Tang; Chang-Hong Hsu; Chung-Yang Huang

ECO re-mapping is a key step in functional ECO tools: it implements a given patch function on a layout database with a limited spare-cell resource. Previous ECO re-mapping algorithms are based on existing technology mappers. However, these mappers are not designed to consider the resource limitation, so the corresponding ECO results are generally not good enough, and can become much worse when spare cells are sparse. In this paper, we propose a new solution for ECO re-mapping. It includes a robust resource-constraint-aware technology mapper and a fast incremental router for wire-length optimization. Moreover, we adopt a Pseudo-Boolean solver to search for feasible solutions when spare cells are sparse. Our experimental results show that our ECO engine outperforms the previous tool in both runtime and routing cost. We also demonstrate the robustness of our tool by performing ECOs under various spare-cell limitations.
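
The resource constraint can be illustrated with a deliberately simple sketch: assign each patch gate to a free spare cell of the matching type, preferring nearby cells to keep added wire length small. The paper's mapper and its Pseudo-Boolean fallback are far more sophisticated; all names and data below are hypothetical.

```python
# Illustrative greedy assignment of patch gates to spare cells. A real
# resource-constraint-aware mapper co-optimizes mapping and routing; this
# sketch only shows the spare-cell resource constraint and a wire-length
# heuristic.
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def map_patch(patch_gates, spare_cells):
    """patch_gates: [(gate_type, desired_xy)]; spare_cells: [(cell_type, xy)]."""
    free = list(spare_cells)
    mapping = {}
    for gid, (gtype, target) in enumerate(patch_gates):
        candidates = [(manhattan(xy, target), i)
                      for i, (ctype, xy) in enumerate(free) if ctype == gtype]
        if not candidates:
            return None                      # out of matching spare cells
        _, best = min(candidates)            # nearest spare cell of the right type
        mapping[gid] = free.pop(best)
    return mapping

gates = [("NAND2", (3, 4)), ("INV", (5, 1))]
spares = [("INV", (6, 1)), ("NAND2", (0, 0)), ("NAND2", (4, 4))]
print(map_patch(gates, spares))
```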


computer and communications security | 2014

Verifying Curve25519 Software

Yu-Fang Chen; Chang-Hong Hsu; Hsin-Hung Lin; Peter Schwabe; Ming-Hsien Tsai; Bow-Yaw Wang; Bo-Yin Yang; Shang-Yi Yang

This paper presents results on formal verification of high-speed cryptographic software. We consider speed-record-setting hand-optimized assembly software for Curve25519 elliptic-curve key exchange presented by Bernstein et al. at CHES 2011. Two versions for different microarchitectures are available. We successfully verify the core part of the computation, and reproduce detection of a bug in a previously published edition. An SMT solver supporting array and bit-vector theories is used to establish almost all properties. Remaining properties are verified in a proof assistant with simple rewrite tactics. We also exploit the compositionality of Hoare logic to address the scalability issue. Essential differences between both versions of the software are discussed from a formal-verification perspective.
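
As a tiny, self-contained taste of the SMT-based style of reasoning, the following sketch uses the z3 Python bindings to check one bit-vector property of the radix-2^51 limb representation used by Curve25519 implementations: the sum of two reduced limbs stays below 2^52 and therefore cannot overflow a 64-bit register. This is not the authors' toolchain or the verified assembly, just an analogous micro-example (z3 assumed installed).

```python
# Verify a small bit-vector property with an SMT solver: if two 64-bit
# limbs are both reduced (below 2^51), their sum is below 2^52.
from z3 import BitVec, BitVecVal, Not, Solver, ULT, unsat

a, b = BitVec("a", 64), BitVec("b", 64)
s = Solver()
s.add(ULT(a, BitVecVal(1 << 51, 64)))           # assume limb a is reduced
s.add(ULT(b, BitVecVal(1 << 51, 64)))           # assume limb b is reduced
s.add(Not(ULT(a + b, BitVecVal(1 << 52, 64))))  # look for a violation of the bound

if s.check() == unsat:
    print("property verified: reduced limbs never exceed the 2^52 bound")
else:
    print("counterexample:", s.model())
```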


IEEE Internet Computing | 2017

Thermal Time Shifting: Decreasing Data Center Cooling Costs with Phase-Change Materials

Matt Skach; Manish Arora; Chang-Hong Hsu; Qi Li; Dean M. Tullsen; Lingjia Tang; Jason Mars

As data centers increase in size and computational capacity, their growth comes at a cost: an increasing thermal load that must be removed to prevent overheating. Here, the authors propose using phase-change materials (PCMs) to shape a data center's thermal load, absorbing and releasing heat when it's advantageous. They evaluate three important opportunities for cost savings. They find that in a data center, PCM can reduce the necessary cooling system size by up to 12 percent without impacting peak throughput, or increase the number of servers by up to 14.6 percent without increasing the cooling load. In a thermally constrained setting, PCM can increase peak throughput by up to 69 percent while delaying the onset of thermal limits by over 3 hours, and a wax-aware scheduler enables up to an 11 percent reduction in peak cooling load when batch jobs are added, increasing average daily throughput by 36-52 percent.


international symposium on computer architecture | 2015

Thermal time shifting: leveraging phase change materials to reduce cooling costs in warehouse-scale computers

Matt Skach; Manish Arora; Chang-Hong Hsu; Qi Li; Dean M. Tullsen; Lingjia Tang; Jason Mars

Datacenters, or warehouse-scale computers, are rapidly increasing in size and power consumption. However, this growth comes at the cost of an increasing thermal load that must be removed to prevent overheating and server failure. In this paper, we propose to use phase change materials (PCM) to shape the thermal load of a datacenter, absorbing and releasing heat when it is advantageous to do so. We present and validate a methodology to study the impact of PCM on a datacenter, and evaluate two important opportunities for cost savings. We find that in a datacenter with full cooling system subscription, PCM can reduce the necessary cooling system size by up to 12% without impacting peak throughput, or increase the number of servers by up to 14.6% without increasing the cooling load. In a thermally constrained setting, PCM can increase peak throughput by up to 69% while delaying the onset of thermal limits by over 3 hours.
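
The core "thermal time shifting" intuition can be captured in a toy time-stepped model: a PCM reservoir absorbs heat generated above the cooling system's capacity during the daily peak and hands it back to the cooling system off-peak, flattening the cooling load. All constants and the hourly profile below are illustrative assumptions; the paper's validated thermal model is far more detailed.

```python
# Toy hourly model of thermal time shifting with phase change material.
COOLING_CAPACITY_KW = 80.0   # heat the cooling system can remove per hour
PCM_CAPACITY_KWH = 200.0     # latent-heat storage available in the PCM (assumed)

def cooling_load_with_pcm(server_heat_kw):
    stored_kwh, load = 0.0, []
    for q in server_heat_kw:                          # one-hour time steps
        if q > COOLING_CAPACITY_KW and stored_kwh < PCM_CAPACITY_KWH:
            absorb = min(q - COOLING_CAPACITY_KW, PCM_CAPACITY_KWH - stored_kwh)
            stored_kwh += absorb
            load.append(q - absorb)                   # PCM melts, soaking up excess heat
        elif q < COOLING_CAPACITY_KW and stored_kwh > 0:
            release = min(COOLING_CAPACITY_KW - q, stored_kwh)
            stored_kwh -= release
            load.append(q + release)                  # PCM re-solidifies; cooling removes stored heat
        else:
            load.append(q)
    return load

# Diurnal heat profile (kW): cool night, hot 8-hour peak, cool evening.
profile = [60.0] * 8 + [100.0] * 8 + [60.0] * 8
print("peak cooling load:", max(cooling_load_with_pcm(profile)), "kW with PCM vs",
      max(profile), "kW without")
```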


international conference on computer aided design | 2010

Formal deadlock checking on high-level SystemC designs

Chun-Nan Chou; Chang-Hong Hsu; Yueh-Tung Chao; Chung-Yang Huang

One of the main purposes of using SystemC in system development is to perform system-level verification in the early design stages. However, simulation is still by far the only available solution for high-level SystemC design verification, because traditional formal verification techniques, which rely on translating the design under verification into a logic netlist, cannot easily be adopted here due to the concurrent/asynchronous nature and the abundant synthesis flexibility of high-level designs. In this paper, we propose a multi-layer modeling approach to represent high-level SystemC designs. By representing different aspects of the design with different structures (simulation kernel, predictive synchronization dependence graph (PSDG), and extended Petri net (extPN)), our modeling can be very concise while faithfully capturing the original design semantics. We develop a formal verification engine on this modeling for deadlock checks. With several novel ideas that enable symbolic simulation, bounded model checking (BMC), and invariant checking to work at the high level, our experimental results demonstrate the robustness and effectiveness of formal deadlock checking on high-level SystemC designs.
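
The underlying deadlock notion (a reachable state with no enabled transition before the design has terminated) is easy to show on a toy example. The sketch below does an explicit-state search over a two-process, two-lock system that acquires the locks in opposite orders; the paper's engine instead works symbolically on the multi-layer SystemC model described above, so this is only an illustration of the property being checked.

```python
# Explicit-state deadlock detection on a tiny two-process transition system.
def successors(state):
    """state = (pc1, pc2, lock1_holder, lock2_holder); return next states."""
    pc1, pc2, l1, l2 = state
    nxt = []
    # Process 1: acquire lock1, then lock2, then release both.
    if pc1 == 0 and l1 is None: nxt.append((1, pc2, 1, l2))
    if pc1 == 1 and l2 is None: nxt.append((2, pc2, l1, 1))
    if pc1 == 2:                nxt.append((3, pc2, None, None))
    # Process 2: acquire lock2, then lock1 (opposite order -> cyclic wait possible).
    if pc2 == 0 and l2 is None: nxt.append((pc1, 1, l1, 2))
    if pc2 == 1 and l1 is None: nxt.append((pc1, 2, 2, l2))
    if pc2 == 2:                nxt.append((pc1, 3, None, None))
    return nxt

def find_deadlocks(initial):
    seen, stack, deadlocks = {initial}, [initial], []
    while stack:
        s = stack.pop()
        nxt = successors(s)
        finished = s[0] == 3 and s[1] == 3     # both processes terminated
        if not nxt and not finished:
            deadlocks.append(s)                # stuck without terminating
        for n in nxt:
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return deadlocks

print(find_deadlocks((0, 0, None, None)))  # -> [(1, 1, 1, 2)]: each process holds one lock, waits for the other
```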


design, automation, and test in europe | 2014

ArChiVED: Architectural checking via event digests for high performance validation

Chang-Hong Hsu; Debapriya Chatterjee; Ronny Morad; Raviv Gal; Valeria Bertacco

Simulation-based techniques play a key role in validating the functional correctness of microprocessor designs. A common approach for validating microprocessors (called instruction-by-instruction, or IBI, checking) consists of running an RTL simulation and an architectural simulation in lock-step, comparing the processor's architectural state at each instruction retirement. This solution, however, cannot be deployed on long regression tests because of the limited performance of RTL simulators. Acceleration platforms have the performance to overcome this issue, but are not amenable to the deployment of an IBI checking methodology. Indeed, validation on these platforms requires logging activity on-platform and then checking it against a golden model off-platform. Unfortunately, an IBI checking approach following this paradigm entails a large slowdown for the acceleration platform, because of the sizable amount of data that must be transferred off-platform for comparison against the golden model. In this work we propose a sequence-by-sequence (SBS) checking approach that is efficient and practical for acceleration platforms. Our solution validates the test execution over sequences of instructions (instead of individual ones), greatly reducing the amount of data transferred for off-platform checking. We found that SBS checking delivers the same bug-detection accuracy as traditional IBI checking, while reducing the amount of traced data by more than 90%.
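
The data-reduction idea behind sequence-by-sequence checking can be sketched as follows: both sides condense a whole sequence of per-instruction architectural states into a short digest, and only the digests are compared. The state encoding, sequence length, and interfaces below are hypothetical placeholders, not ArChiVED's actual event-digest format.

```python
# Sketch of digest-based sequence-by-sequence (SBS) comparison.
import hashlib

SEQ_LEN = 1024  # instructions per checked sequence (illustrative)

def digest_of(states):
    h = hashlib.sha256()
    for s in states:                  # s: bytes encoding of PC + register state
        h.update(s)
    return h.hexdigest()

def sbs_check(platform_states, golden_states):
    """Compare digests sequence by sequence; report the first divergent block."""
    for i in range(0, len(platform_states), SEQ_LEN):
        d_platform = digest_of(platform_states[i:i + SEQ_LEN])
        d_golden = digest_of(golden_states[i:i + SEQ_LEN])
        if d_platform != d_golden:
            return f"mismatch in instructions [{i}, {i + SEQ_LEN})"
    return "all sequences match"

# Tiny usage example with fake state encodings and one injected bug.
golden = [bytes([i % 256]) for i in range(4096)]
buggy = list(golden)
buggy[2500] = b"\xff"
print(sbs_check(buggy, golden))       # -> mismatch in instructions [2048, 3072)
```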


international symposium on microarchitecture | 2017

DeftNN: addressing bottlenecks for DNN execution on GPUs via synapse vector elimination and near-compute data fission

Parker Hill; Animesh Jain; Mason Hill; Babak Zamirai; Chang-Hong Hsu; Michael A. Laurenzano; Scott A. Mahlke; Lingjia Tang; Jason Mars

Deep neural networks (DNNs) are key computational building blocks for emerging classes of web services that interact in real time with users via voice, image, and video inputs. Although GPUs have gained popularity as a key accelerator platform for deep learning workloads, the increasing demand for DNN computation leaves a significant gap between the compute capabilities of GPU-enabled datacenters and the compute needed to service demand. State-of-the-art techniques for improving DNN performance have significant limitations in bridging this gap on real systems. Current network pruning techniques remove computation, but the resulting networks map poorly to GPU architectures, yielding no performance benefit or even slowdowns. Meanwhile, current bandwidth optimization techniques focus on reducing off-chip bandwidth while overlooking on-chip bandwidth, a key DNN bottleneck. To address these limitations, this work introduces DeftNN, a GPU DNN execution framework that targets the key architectural bottlenecks of DNNs on GPUs to automatically and transparently improve execution performance. DeftNN is composed of two novel optimization techniques: (1) synapse vector elimination, a technique that identifies non-contributing synapses in the DNN and carefully transforms data, removing the computation and data movement of these synapses while fully utilizing the GPU to improve performance; and (2) near-compute data fission, a mechanism for scaling down the on-chip data movement requirements within DNN computations. Our evaluation of DeftNN spans 6 state-of-the-art DNNs. By applying both optimizations in concert, DeftNN is able to achieve an average speedup of
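
The essence of synapse vector elimination (as opposed to fine-grained pruning) is that whole weight columns are dropped and the remaining matrix is compacted, so the GPU still executes a dense, smaller GEMM. The numpy sketch below illustrates that idea only; DeftNN's actual selection criterion and GPU data transformation are more involved, and the norm-based threshold here is a hypothetical stand-in.

```python
# Illustrative column elimination followed by a dense, smaller matmul.
import numpy as np

def eliminate_synapse_vectors(W, X, keep_ratio=0.75):
    """W: (out, in) weights; X: (in, batch) activations."""
    col_norms = np.linalg.norm(W, axis=0)       # score each input column (synapse vector)
    k = int(W.shape[1] * keep_ratio)
    keep = np.argsort(col_norms)[-k:]           # keep the strongest columns
    return W[:, keep], X[keep, :]               # compact both operands consistently

rng = np.random.default_rng(0)
W, X = rng.standard_normal((256, 512)), rng.standard_normal((512, 64))
W_small, X_small = eliminate_synapse_vectors(W, X)
full, pruned = W @ X, W_small @ X_small
print(W_small.shape, "vs", W.shape,
      "relative error:", np.linalg.norm(full - pruned) / np.linalg.norm(full))
```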


ACM Transactions on Computer Systems | 2017

Reining in Long Tails in Warehouse-Scale Computers with Quick Voltage Boosting Using Adrenaline

Chang-Hong Hsu; Yunqi Zhang; Michael A. Laurenzano; David Meisner; Thomas F. Wenisch; Ronald G. Dreslinski; Jason Mars; Lingjia Tang


Collaboration


Dive into Chang-Hong Hsu's collaboration.

Top Co-Authors

Jason Mars
University of Michigan

Yunqi Zhang
University of Michigan

Matt Skach
University of Michigan

Qi Li
University of California