Chen-Yong Cher | Researchain

Publication

Featured research published by Chen-Yong Cher.

international symposium on low power electronics and design | 2007

Thermal-aware task scheduling at the system software level

Jeonghwan Choi; Chen-Yong Cher; Hubertus Franke; Hendrik F. Hamann; Alan J. Weger; Pradip Bose

Power-related issues have become important considerations in current-generation microprocessor design. One of these issues is that of elevated on-chip temperatures. This has an adverse effect on cooling cost and, if not addressed suitably, on chip reliability. In this paper we investigate the general trade-offs between temporal and spatial hot spot mitigation schemes and thermal time constants, workload variations and microprocessor power distributions. By leveraging spatial and temporal heat slacks, our schemes enable lowering of on-chip unit temperatures by changing the workload in a timely manner with operating system (OS) and existing hardware support.
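The idea of using spatial heat slack can be illustrated with a minimal sketch. It is not the scheduler from the paper: the function name `assign_tasks`, the per-core temperature readings, and the per-task heat scores are all hypothetical, and the policy shown (hottest task goes to the coolest core) is only one simple way an OS-level scheduler might act on temperature data.

```python
def assign_tasks(core_temps, task_heat):
    """Pair the hottest pending tasks with the coolest cores.

    core_temps: dict of core name -> current temperature (assumed sensor reading)
    task_heat:  dict of task name -> relative heat score (assumed estimate)
    Returns a dict of task name -> core name.
    """
    # Cores ordered coolest first, tasks ordered hottest first.
    cores = sorted(core_temps, key=core_temps.get)
    tasks = sorted(task_heat, key=task_heat.get, reverse=True)
    # zip stops at the shorter list, so surplus tasks simply stay unassigned.
    return dict(zip(tasks, cores))


# Hypothetical readings, for illustration only.
core_temps = {"core0": 71.0, "core1": 58.5, "core2": 64.0}
task_heat = {"fft": 9.0, "idle_poll": 1.0, "memcpy": 4.5}
placement = assign_tasks(core_temps, task_heat)
```

With these inputs the hottest task, `fft`, lands on the coolest core, `core1`. A temporal scheme would instead defer or interleave hot tasks on the same core over time.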

design automation conference | 2013

Quantitative evaluation of soft error injection techniques for robust system design

Hyungmin Cho; Shahrzad Mirkhani; Chen-Yong Cher; Jacob A. Abraham; Subhasish Mitra

Choosing the correct error injection technique is of primary importance in simulation-based design and evaluation of robust systems that are resilient to soft errors. Many low-level (e.g., flip-flop-level) error injection techniques are generally used for small systems due to long execution times and significant memory requirements. High-level error injections at the architecture or memory levels are generally fast but can be inaccurate. Unfortunately, there exists very little research literature on quantitative analysis of the inaccuracies associated with high-level error injection techniques. In this paper, we use simulation and emulation results to understand the accuracy tradeoffs associated with a variety of high-level error injection techniques. A detailed analysis of error propagation explains the causes of high degrees of inaccuracies associated with error injection techniques at higher levels of abstraction.
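A high-level (memory-level) error injection of the kind the abstract mentions can be sketched as a single bit flip in a stored word. This is an illustrative assumption, not the injection framework evaluated in the paper: the helpers `flip_bit` and `inject`, the 32-bit word width, and the list-of-integers memory model are all hypothetical.

```python
import random


def flip_bit(value, bit, width=32):
    """Return value with one bit inverted, truncated to the given word width."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside word width")
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def inject(memory, rng, width=32):
    """Flip one randomly chosen bit in one randomly chosen word.

    memory: list of non-negative integers modelling memory words (assumed model)
    rng:    a random.Random instance, so campaigns can be reproduced with a seed
    Returns (corrupted copy of memory, word index, bit index).
    """
    addr = rng.randrange(len(memory))
    bit = rng.randrange(width)
    corrupted = list(memory)
    corrupted[addr] = flip_bit(memory[addr], bit, width)
    return corrupted, addr, bit


memory = [0, 10, 255, 4096]
corrupted, addr, bit = inject(memory, random.Random(0))
```

Injecting at this level is fast because no flip-flop-level state is simulated, which is also why it may miss how a low-level fault would really manifest in memory.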

IBM Journal of Research and Development | 2015

Active Memory Cube: A processing-in-memory architecture for exascale systems

Ravi Nair; Samuel F. Antao; Carlo Bertolli; Pradip Bose; José R. Brunheroto; Tong Chen; Chen-Yong Cher; Carlos H. Andrade Costa; J. Doi; Constantinos Evangelinos; Bruce M. Fleischer; Thomas W. Fox; Diego S. Gallo; Leopold Grinberg; John A. Gunnels; Arpith C. Jacob; P. Jacob; Hans M. Jacobson; Tejas Karkhanis; Choon Young Kim; Jaime H. Moreno; John Kevin Patrick O'Brien; Martin Ohmacht; Yoonho Park; Daniel A. Prener; Bryan S. Rosenburg; Kyung Dong Ryu; Olivier Sallenave; Mauricio J. Serrano; Patrick Siegl

Many studies point to the difficulty of scaling existing computer architectures to meet the needs of an exascale system (i.e., capable of executing 10^18 floating-point operations per second), consuming no more than 20 MW in power, by around the year 2020. This paper outlines a new architecture, the Active Memory Cube, which reduces the energy of computation significantly by performing computation in the memory module, rather than moving data through large memory hierarchies to the processor core. The architecture leverages a commercially demonstrated 3D memory stack called the Hybrid Memory Cube, placing sophisticated computational elements on the logic layer below its stack of dynamic random-access memory (DRAM) dies. The paper also describes an Active Memory Cube tuned to the requirements of a scientific exascale system. The computational elements have a vector architecture and are capable of performing a comprehensive set of floating-point and integer instructions, predicated operations, and gather-scatter accesses across memory in the Cube. The paper outlines the software infrastructure used to develop applications and to evaluate the architecture, and describes results of experiments on application kernels, along with performance and power projections.

international solid-state circuits conference | 2010

A wire-speed Power™ processor: 2.3GHz 45nm SOI with 16 cores and 64 threads

Charles L. Johnson; David H. Allen; Jeff Brown; Steve Vanderwiel; Russ Hoover; Heather D. Achilles; Chen-Yong Cher; George A. May; Hubertus Franke; Jimi Xenedis; Claude Basso

An emerging data-center market merges network and server attributes into a single wire-speed processor SoC. These processors are not network endpoints that consume data, but inline processors that filter or modify data and send it on. Wire-speed processors merge attributes from 1) network processors: many threaded low power cores, accelerators, integrated network and memory I/O, smaller memory line sizes and low total power, and from 2) server processors: full ISA cores, standard programming models, OS and hypervisor support, full virtualization and server RAS & infrastructure. Typical applications are edge-of-network processing, intelligent I/O devices in servers, network attached appliances, distributed computing, and streaming applications.

international symposium on microarchitecture | 2009

Temperature Variation Characterization and Thermal Management of Multicore Architectures

Eren Kursun; Chen-Yong Cher

Increased variability affects the efficiency of dynamic power and thermal management. Existing on-chip sensor infrastructure can be used to improve the inherent thermal imbalances among cores in a multicore architecture. Experimental analysis based on live measurements on a special test chip shows reduced on-chip heating with no performance loss.

international conference on computer design | 2008

Variation-aware thermal characterization and management of multi-core architectures

Eren Kursun; Chen-Yong Cher

The accuracy and efficiency of dynamic power and thermal management are both affected by the increased levels of on-chip variation, mainly because dynamic thermal management schemes are oblivious to the variation characteristics of the underlying hardware. We propose a technique that utilizes the existing on-chip sensor infrastructure to improve the inherent thermal imbalances among different cores in a multi-core architecture. Thermal sensor readings are compiled to generate an on-chip variation map, which is provided to the system power/thermal management to effectively manage the existing on-chip variation. Experimental analysis based on live measurements on a special test-chip shows reduced on-chip heating with no performance loss, which improves the power/thermal efficiency of the chip at no cost.

ieee international conference on high performance computing data and analytics | 2015

Understanding the propagation of transient errors in HPC applications

Rizwan A. Ashraf; Roberto Gioiosa; Gokcen Kestor; Ronald F. DeMara; Chen-Yong Cher; Pradip Bose

Resiliency of exascale systems has quickly become an important concern for the scientific community. Despite its importance, much remains to be determined regarding how faults disseminate or at what rate they impact HPC applications. The understanding of where and how fast faults propagate could lead to more efficient implementation of application-driven error detection and recovery. In this work, we propose a fault propagation framework to analyze how faults propagate in MPI applications and to understand their vulnerability to faults. We employ a combination of compiler-level code transformation and instrumentation, along with a runtime checker. Using the information provided by our framework, we employ a machine learning technique to derive application fault propagation models that can be used to estimate the number of corrupted memory locations at runtime.

international conference on parallel architectures and compilation techniques | 2010

Power and thermal characterization of POWER6 system

Víctor Jiménez; Francisco J. Cazorla; Roberto Gioiosa; Mateo Valero; Carlos Boneti; Eren Kursun; Chen-Yong Cher; Canturk Isci; Alper Buyuktosunoglu; Pradip Bose

Controlling power consumption and temperature is of major concern for modern computing systems. In this work we characterize thermal behavior and power consumption of an IBM POWER6™-based system. We perform the characterization at several levels: application, operating system, and hardware level, both when the system is idle, and under load. At hardware level, we report a 25% reduction in total system power consumption by using the processor low power mode. We also study the effect of the hardware thread prioritization mechanism provided by POWER6 on different workloads and how this mechanism can be used to limit power consumption. At OS level, we analyze the power reduction techniques implemented in the Linux kernel, such as the tickless kernel and the CPU idle power manager. At application level, we characterize the power consumption and the temperature of two sets of benchmarks (METbench and SPEC CPU2006) and we study the effect of workload characteristics on power consumption and core temperature. From this characterization we derive a model based on performance counters that allows us to predict the total power consumption of the POWER6 system with an average error under 3% for CMP and 5% for SMT. To the best of our knowledge, this is the first power model of a system including CMP+SMT processors. Finally, we show that the static decision on whether to consolidate tasks into the same core/chip, as it is currently done in Linux, can be improved by dynamically considering the low-power capabilities of the underlying architecture and the characteristics of the application (up to 5X improvement in ED2P).

international symposium on vlsi design, automation and test | 2014

The resilience wall: Cross-layer solution strategies

Subhasish Mitra; Pradip Bose; Eric Cheng; Chen-Yong Cher; Hyungmin Cho; Rajiv V. Joshi; Young Moon Kim; Charles R. Lefurgy; Yanjing Li; Kenneth P. Rodbell; Kevin Skadron; James H. Stathis; Lukasz G. Szafaryn

design automation conference | 2012

An information-theoretic framework for optimal temperature sensor allocation and full-chip thermal monitoring

Huapeng Zhou; Xin Li; Chen-Yong Cher; Eren Kursun; Haifeng Qian; Shi-Chune Yao
