
Publication


Featured research published by Hyungmin Cho.


Design, Automation and Test in Europe | 2010

ERSA: error resilient system architecture for probabilistic applications

Larkhoon Leem; Hyungmin Cho; Jason Bau; Quinn Jacobson; Subhasish Mitra

There is a growing concern about the increasing vulnerability of future computing systems to errors in the underlying hardware. Traditional redundancy techniques are expensive for designing energy-efficient systems that are resilient to high error rates. We present Error Resilient System Architecture (ERSA), a low-cost robust system architecture for emerging killer probabilistic applications such as Recognition, Mining and Synthesis (RMS) applications. While resilience of such applications to errors in low-order bits of data is well-known, execution of such applications on error-prone hardware significantly degrades output quality (due to high-order bit errors and crashes). ERSA achieves high error resilience to high-order bit errors and control errors (in addition to low-order bit errors) using a judicious combination of three key ideas: (1) asymmetric reliability in many-core architectures, (2) error-resilient algorithms at the core of probabilistic applications, and (3) intelligent software optimizations. Error injection experiments on a multi-core ERSA hardware prototype demonstrate that, even at very high error rates of 20,000 errors/second/core or 2×10⁻⁴ errors/cycle/core (with errors injected in architecturally-visible registers), ERSA maintains 90% or better accuracy of output results, together with minimal impact on execution time, for probabilistic applications such as K-Means clustering, LDPC decoding and Bayesian networks. Moreover, we demonstrate the effectiveness of ERSA in tolerating high rates of static memory errors that are characteristic of emerging challenges such as Vccmin problems and erratic bit errors. Using the concept of configurable reliability, ERSA platforms may also be adapted for general-purpose applications that are less resilient to errors (but at higher costs).
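The low-order/high-order asymmetry the abstract describes can be seen in a minimal sketch (ours, not the paper's): flipping a low-order mantissa bit of an IEEE-754 double barely moves a K-Means centroid value, while flipping a high exponent bit destroys it.

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    """Flip one bit of the IEEE-754 double representation of x."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return flipped

centroid = 3.141592653589793
low = flip_bit(centroid, 2)    # low-order mantissa bit: negligible shift
high = flip_bit(centroid, 62)  # high-order exponent bit: value destroyed
```

This is why a K-Means computation tolerates low-order bit errors while a single high-order error, if uncorrected, can wreck output quality.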


Design Automation Conference | 2013

Quantitative evaluation of soft error injection techniques for robust system design

Hyungmin Cho; Shahrzad Mirkhani; Chen-Yong Cher; Jacob A. Abraham; Subhasish Mitra

Choosing the correct error injection technique is of primary importance in simulation-based design and evaluation of robust systems that are resilient to soft errors. Low-level (e.g., flip-flop-level) error injection techniques are generally limited to small systems due to their long execution times and significant memory requirements. High-level error injections at the architecture or memory levels are generally fast but can be inaccurate. Unfortunately, there exists very little research literature on quantitative analysis of the inaccuracies associated with high-level error injection techniques. In this paper, we use simulation and emulation results to understand the accuracy tradeoffs associated with a variety of high-level error injection techniques. A detailed analysis of error propagation explains the causes of the high degrees of inaccuracy associated with error injection techniques at higher levels of abstraction.
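A toy version of architecture-level injection (our illustration, not the paper's setup) shows one source of masking that any level of abstraction must capture correctly: a flip injected into dead state vanishes, while the same flip in a live register becomes silent data corruption.

```python
def run(n: int, flip_acc_at: int = -1, flip_dead_at: int = -1) -> int:
    """Compute the sum of 2*i for i = 1..n, optionally injecting a
    single-bit flip into a live register (acc) or a dead temporary (tmp)."""
    acc = 0
    for i in range(1, n + 1):
        tmp = 2 * i
        acc += tmp               # tmp is dead from this point on
        if i == flip_dead_at:
            tmp ^= 1 << 3        # error in dead state: always masked
        if i == flip_acc_at:
            acc ^= 1 << 3        # error in live state: propagates to output
    return acc

golden = run(100)
masked = run(100, flip_dead_at=50)
corrupted = run(100, flip_acc_at=50)
```

An injection campaign that cannot tell live from dead state at the moment of the flip will misestimate how many errors actually reach the output.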


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2012

ERSA: Error Resilient System Architecture for Probabilistic Applications

Hyungmin Cho; Larkhoon Leem; Subhasish Mitra

There is a growing concern about the increasing vulnerability of future computing systems to errors in the underlying hardware. Traditional redundancy techniques are expensive for designing energy-efficient systems that are resilient to high error rates. We present Error Resilient System Architecture (ERSA), a low-cost robust system architecture for emerging killer probabilistic applications such as Recognition, Mining and Synthesis (RMS) applications. While resilience of such applications to errors in low-order bits of data is well-known, execution of such applications on error-prone hardware significantly degrades output quality (due to high-order bit errors and crashes). ERSA achieves high error resilience to high-order bit errors and control errors (in addition to low-order bit errors) using a judicious combination of three key ideas: (1) asymmetric reliability in many-core architectures, (2) error-resilient algorithms at the core of probabilistic applications, and (3) intelligent software optimizations. Error injection experiments on a multi-core ERSA hardware prototype demonstrate that, even at very high error rates of 20,000 errors/second/core or 2×10⁻⁴ errors/cycle/core (with errors injected in architecturally-visible registers), ERSA maintains 90% or better accuracy of output results, together with minimal impact on execution time, for probabilistic applications such as K-Means clustering, LDPC decoding and Bayesian networks. Moreover, we demonstrate the effectiveness of ERSA in tolerating high rates of static memory errors that are characteristic of emerging challenges such as Vccmin problems and erratic bit errors. Using the concept of configurable reliability, ERSA platforms may also be adapted for general-purpose applications that are less resilient to errors (but at higher costs).


IEEE Transactions on Communications | 2013

Gallager B Decoder on Noisy Hardware

S. M. Sadegh Tabatabaei Yazdi; Hyungmin Cho; Lara Dolecek

Conventional communications theory assumes that the data transmission is noisy but the processing at the receiver is entirely error-free. Such assumptions may have to be revisited for advanced (silicon) technologies in which hardware failures are a major concern at the system level. Hence, it is important to characterize the performance of a communication system with both noisy processing components and noisy data transmission. Coding systems based on low-density parity check (LDPC) codes are widely used for a variety of applications. In this paper, we focus on probabilistic analysis of the LDPC Gallager B decoder built out of faulty components. Using the density evolution technique, we find approximations for the optimal threshold of the decoder and the symbol error rate (SER) of the decoded sequence as functions of both the channel error rate and error rates of the decoder components, for both binary and non-binary regular LDPC codes. Furthermore, we study the convergence of the output SER and the decoding threshold of the decoder for different ranges of error rates. We verify our results using MATLAB simulations and hardware emulation of noisy decoders. Results presented in this paper can serve as systematic design guidelines in resource allocation for noisy decoders. Informed resource allocation is of particular relevance to emerging data storage and processing applications that need to maintain high levels of reliability despite hardware errors in advanced technologies.
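The density-evolution recursion the analysis builds on can be sketched for a regular (dv, dc) binary code. The recursion below is the standard noiseless Gallager B update; the `eps` term, added as a crude stand-in for a faulty check-node message, is our illustrative simplification, not the paper's exact component-error model.

```python
from math import comb

def density_evolution(p0, dv=3, dc=6, b=2, eps=0.0, iters=100):
    """Track the message error rate of a Gallager B decoder on a BSC with
    crossover probability p0. eps crudely models the probability that
    faulty hardware flips a check-to-variable message."""
    p = p0
    for _ in range(iters):
        # Probability that a check-to-variable message is in error.
        q = (1.0 - (1.0 - 2.0 * p) ** (dc - 1)) / 2.0
        q = q * (1.0 - eps) + (1.0 - q) * eps  # noisy check hardware
        # A variable node flips its channel value when at least b of its
        # dv-1 incoming messages agree on the opposite value.
        flip_good = sum(comb(dv - 1, t) * q**t * (1 - q)**(dv - 1 - t)
                        for t in range(b, dv))      # corrupts a good bit
        fix_bad = sum(comb(dv - 1, t) * (1 - q)**t * q**(dv - 1 - t)
                      for t in range(b, dv))        # corrects a bad bit
        p = p0 * (1.0 - fix_bad) + (1.0 - p0) * flip_good
    return p
```

With (dv, dc) = (3, 6) and b = 2, a channel error rate below the decoding threshold drives the message error rate to zero when eps = 0, while a nonzero eps leaves a residual error floor, the qualitative behavior the paper quantifies.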


Languages, Compilers, and Tools for Embedded Systems | 2007

Dynamic data scratchpad memory management for a memory subsystem with an MMU

Hyungmin Cho; Bernhard Egger; Jaejin Lee; Heonshik Shin

In this paper, we propose a dynamic scratchpad memory (SPM) management technique for a horizontally-partitioned memory subsystem with an MMU. The memory subsystem consists of a relatively cheap direct-mapped data cache and SPM. Our technique loads required global data and stack pages into the SPM on demand when a function is called. A scratchpad memory manager loads/unloads the data pages and maintains a page table for the MMU. Our approach is based on post-pass analysis and optimization techniques, and it handles the whole program, including libraries. The data page mapping is determined by solving an integer linear programming (ILP) formulation that approximates our demand paging technique. The ILP model uses a dynamic call graph annotated with the number of memory accesses and/or cache misses obtained by profiling. We evaluate our technique on thirteen embedded applications. We compare the results to a reference system with a 4-way set-associative data cache and to the ideal case with the same 4-way cache and an SPM where all global and stack data is placed in the SPM. On average, our approach reduces total system energy consumption by 8.1% with no performance degradation. This is equivalent to exploiting 60% of the room available for energy reduction between the reference case and the ideal case.
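The core trade-off inside the ILP (which pages earn their keep in the SPM) reduces to a plain 0/1 knapsack once the call-graph and paging-cost constraints are dropped. The sketch below is that simplification, not the paper's formulation, and the page names and energy numbers are hypothetical.

```python
def choose_spm_pages(pages, spm_size):
    """0/1 knapsack: pick the set of data pages whose placement in SPM
    maximizes total energy savings without exceeding SPM capacity.
    pages: list of (name, size_in_bytes, energy_saving) tuples."""
    # dp[c] = (best saving, chosen page set) achievable with capacity c.
    dp = [(0.0, frozenset())] * (spm_size + 1)
    for name, size, saving in pages:
        for c in range(spm_size, size - 1, -1):  # descending: use each page once
            cand = dp[c - size][0] + saving
            if cand > dp[c][0]:
                dp[c] = (cand, dp[c - size][1] | {name})
    return dp[spm_size]

# Hypothetical profiling data: (page, size in bytes, energy units saved in SPM).
pages = [("stack_a", 1024, 9.0), ("glob_b", 2048, 10.0),
         ("glob_c", 1024, 5.0), ("stack_d", 2048, 8.0)]
best_saving, chosen = choose_spm_pages(pages, spm_size=4096)
```

The real ILP additionally weighs the cost of loading/unloading pages at call boundaries, which is what the annotated dynamic call graph feeds into.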


International Symposium on VLSI Design, Automation and Test | 2014

The resilience wall: Cross-layer solution strategies

Subhasish Mitra; Pradip Bose; Eric Cheng; Chen-Yong Cher; Hyungmin Cho; Rajiv V. Joshi; Young Moon Kim; Charles R. Lefurgy; Yanjing Li; Kenneth P. Rodbell; Kevin Skadron; James H. Stathis; Lukasz G. Szafaryn

Resilience to hardware failures is a key challenge for a large class of future computing systems that are constrained by the so-called power wall: from embedded systems to supercomputers. Today's mainstream computing systems typically assume that transistors and interconnects operate correctly during useful system lifetime. With enormous complexity and significantly increased vulnerability to failures compared to the past, future system designs cannot rely on such assumptions. At the same time, there is explosive growth in our dependency on such systems. To overcome this outstanding challenge, this paper advocates and examines a cross-layer resilience approach. Two major components of this approach are: (1) system- and software-level effects of circuit-level faults are considered from early stages of system design; and (2) resilience techniques are implemented across multiple layers of the system stack - from circuit and architecture levels to runtime and applications - such that they work together to achieve required degrees of resilience in a highly energy-efficient manner. Illustrative examples demonstrating key aspects of cross-layer resilience are discussed.


International Conference on Communications | 2012

Probabilistic analysis of Gallager B faulty decoder

S. M. Sadegh Tabatabaei Yazdi; Hyungmin Cho; Yifan Sun; Subhasish Mitra; Lara Dolecek

Today's mainstream electronic systems typically assume that transistors and interconnections operate correctly over their useful lifetime. For coming generations of silicon technologies, several causes of hardware failures, such as erratic bit errors, transient (soft) errors, and process variations, are becoming significant. In contrast to traditional redundancy-based reliability solutions, the aim of probabilistic design is to achieve high-quality results and efficiency using erroneous or imperfect components along with a judicious allocation of resources. In this paper, we focus on a probabilistic analysis of an LDPC Gallager B decoder made out of unreliable hardware components. Our analysis reveals the dependencies between the final BER at the output of the decoder and the errors in the components of the decoder. We demonstrate that a system design guided by our analysis can produce higher-quality results compared to an arbitrary resource allocation. This resource allocation is of particular relevance to emerging storage applications that need to maintain extremely high levels of reliability even as the underlying technology scales deep into the nano-regime.


Design Automation Conference | 2016

CLEAR: Cross-Layer Exploration for Architecting Resilience - Combining hardware and software techniques to tolerate soft errors in processor cores

Eric Cheng; Shahrzad Mirkhani; Lukasz G. Szafaryn; Chen-Yong Cher; Hyungmin Cho; Kevin Skadron; Mircea R. Stan; Klas Lilja; Jacob A. Abraham; Pradip Bose; Subhasish Mitra

We present a first-of-its-kind framework which overcomes a major challenge in the design of digital systems that are resilient to reliability failures: achieving desired resilience targets at minimal cost (energy, power, execution time, area) by combining resilience techniques across various layers of the system stack (circuit, logic, architecture, software, algorithm). This is also referred to as cross-layer resilience. In this paper, we focus on radiation-induced soft errors in processor cores. We address both single-event upsets (SEUs) and single-event multiple upsets (SEMUs) in terrestrial environments. Our framework automatically and systematically explores the large space of comprehensive resilience techniques and their combinations across various layers of the system stack (798 cross-layer combinations in this paper), derives cost-effective solutions that achieve resilience targets at minimal costs, and provides guidelines for the design of new resilience techniques. We demonstrate the practicality and effectiveness of our framework using two diverse designs: a simple, in-order processor core and a complex, out-of-order processor core. Our results demonstrate that a carefully optimized combination of circuit-level hardening, logic-level parity checking, and micro-architectural recovery provides a highly cost-effective soft error resilience solution for general-purpose processor cores. For example, a 50× improvement in silent data corruption rate is achieved at only 2.1% energy cost for an out-of-order core (6.1% for an in-order core) with no speed impact. However, selective circuit-level hardening alone, guided by a thorough analysis of the effects of soft errors on application benchmarks, provides a cost-effective soft error resilience solution as well (with ~1% additional energy cost for a 50× improvement in silent data corruption rate).
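The exploration the framework automates can be caricatured in a few lines: given a catalog of techniques, each with an SDC-rate improvement factor (assumed here to compose multiplicatively) and an energy cost (assumed additive), find the cheapest combination meeting a resilience target. The catalog entries and numbers below are hypothetical, not CLEAR's measured data.

```python
from itertools import combinations

# Hypothetical catalog: (technique, SDC improvement factor, energy cost in %).
TECHNIQUES = [("circuit_hardening", 25.0, 1.5),
              ("logic_parity", 4.0, 1.0),
              ("uarch_recovery", 2.0, 0.6),
              ("sw_checker", 3.0, 4.0)]

def cheapest_combo(target):
    """Exhaustively search technique combinations; return the lowest-cost
    combination whose combined improvement meets the target."""
    best = None
    for r in range(1, len(TECHNIQUES) + 1):
        for combo in combinations(TECHNIQUES, r):
            gain, cost = 1.0, 0.0
            for _, g, c in combo:
                gain *= g
                cost += c
            if gain >= target and (best is None or cost < best[1]):
                best = ([name for name, _, _ in combo], cost)
    return best

names, cost = cheapest_combo(50.0)  # e.g., a 50x SDC-rate improvement target
```

The real framework evaluates each of its 798 combinations against injection-derived resilience data rather than assuming improvements compose multiplicatively.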


Design Automation Conference | 2015

Understanding soft errors in uncore components

Hyungmin Cho; Chen-Yong Cher; Thomas Shepherd; Subhasish Mitra

The effects of soft errors in processor cores have been widely studied. However, little has been published about soft errors in uncore components, such as the memory subsystem and I/O controllers, of a System-on-a-Chip (SoC). In this work, we study how soft errors in uncore components affect system-level behaviors. We have created a new mixed-mode simulation platform that combines simulators at two different levels of abstraction, and achieves 20,000× speedup over RTL-only simulation. Using this platform, we present the first study of the system-level impact of soft errors inside various uncore components of a large-scale, multi-core SoC using the industrial-grade, open-source OpenSPARC T2 SoC design. Our results show that soft errors in uncore components can significantly impact system-level reliability. We also demonstrate that uncore soft errors can create major challenges for traditional system-level checkpoint recovery techniques. To overcome such recovery challenges, we present a new replay recovery technique for uncore components belonging to the memory subsystem. For the L2 cache controller and the DRAM controller components of OpenSPARC T2, our new technique reduces the probability that an application run fails to produce correct results due to soft errors by more than 100× with 3.32% and 6.09% chip-level area and power impact, respectively.
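The replay idea can be sketched abstractly: the controller holds in-flight requests in a buffer until they are known good, so a detected error is handled by reissuing them rather than rolling the whole system back to a checkpoint. The class below is our illustrative mock-up, not the OpenSPARC T2 implementation.

```python
class ReplayingController:
    """Toy memory-subsystem controller with replay recovery: writes stay
    in a replay buffer until acknowledged, and recover() reissues them
    after a soft error is detected inside the controller."""
    def __init__(self, memory):
        self.memory = memory
        self.replay_buffer = []           # in-flight (addr, value) writes

    def write(self, addr, value):
        self.replay_buffer.append((addr, value))
        self.memory[addr] = value

    def ack(self):
        self.replay_buffer.clear()        # requests confirmed good: retire

    def recover(self):
        for addr, value in self.replay_buffer:
            self.memory[addr] = value     # reissue in-flight requests

mem = {}
ctrl = ReplayingController(mem)
ctrl.write(0x40, 7)
mem[0x40] = 0xFF   # a soft error corrupts the in-flight write
ctrl.recover()     # replay restores it without a full system rollback
```

The attraction over checkpointing is locality: only the affected controller's in-flight state is redone, not the entire system's architectural state.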


International Conference on Computer-Aided Design | 2010

Cross-layer error resilience for robust systems

Larkhoon Leem; Hyungmin Cho; Hsiao-Heng Lee; Young Moon Kim; Yanjing Li; Subhasish Mitra

A large class of robust electronic systems of the future must be designed to perform correctly despite hardware failures. In contrast, today's mainstream systems typically assume error-free hardware. Classical fault-tolerant computing techniques are too expensive for this purpose. This paper presents an overview of new techniques that can enable a sea change in the design of cost-effective robust systems. These techniques utilize globally-optimized cross-layer approaches, i.e., across device, circuit, architecture, runtime, and application layers, to overcome hardware failures.

Collaboration

Dive into Hyungmin Cho's collaborations.

Top Co-Authors

Jacob A. Abraham
University of Texas at Austin

Shahrzad Mirkhani
University of Texas at Austin