Seokin Hong | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Seokin Hong is active.

Explore More

Publication

Featured researches published by Seokin Hong.

international symposium on microarchitecture | 2011

Residue cache: a low-energy low-area L2 cache architecture via compression and partial hits

Soontae Kim; Jesung Kim; Jongmin Lee; Seokin Hong

L2 cache memories are being adopted in the embedded systems for high performance, which, however, increases energy consumption due to their large sizes. We propose a low-energy low-area L2 cache architecture, which performs as well as the conventional L2 cache architecture with 53% less area and around 40% less energy consumption. This architecture consists of an L2 cache and a small cache called residue cache. L2 and residue cache lines are half sized of the conventional L2 cache lines. Well compressed conventional L2 cache lines are stored only in the L2 cache while other poorly compressed lines are stored in both the L2 and residue caches. Although many conventional L2 cache lines are not fully captured by the residue cache, most accesses to them do not incur misses because not all their words are needed immediately, which are termed as partial hits in this paper. The residue cache architecture consumes much lower energy and area than conventional L2 cache architectures, and can be combined synergistically with other schemes such as the line distillation and ZCA. The residue cache architecture is also shown to perform well on a 4-way superscalar processor typically used in high performance systems.

high-performance computer architecture | 2013

Macho: A failure model-oriented adaptive cache architecture to enable near-threshold voltage scaling

Tayyeb Mahmood; Soontae Kim; Seokin Hong

Recent interest in CMOS voltage scaling has produced a class of cache architectures which tolerate parametric SRAM failures at low voltage by substituting faulty words of one cache line with healthy words of another line. These caches rely on the fault maps (which grow reciprocally with smaller word sizes) for fault identification. Therefore, the benefits of cache voltage scaling must be rigorously investigated against the cost of their fault map overheads, especially in large caches. This paper reviews the word substitution caches and develops their parametric failure model. Our developed model leads to a non-intrusive and reconfigurable cache (Macho) which can be locally optimized (based on local fault density) by two graph-based algorithms. Specifically, our adaptive matching algorithm increases effective cache capacity by dynamically concentrating healthy cache blocks into active cache sets. Macho enables voltage scaling down to 400mV by tolerating high SRAM-failure rates (≥ 1%) and achieves better energy reduction (44%) than other substitution caches with similar area overheads.

high-performance computer architecture | 2013

Skinflint DRAM system: Minimizing DRAM chip writes for low power

Yebin Lee; Soontae Kim; Seokin Hong; Jongmin Lee

DRAMs are one of the main players of computer system energy consumption due to their large capacities and frequent accesses. Consequently, many schemes have been proposed to reduce DRAM power/energy consumption. Some of them propose new DRAM system and chip organizations, which are effective in reducing power consumption but intrusive. In contrast, we minimize DRAM write accesses at chip level with minimal modification of the conventional DRAM system organization and small addition to caches. When all data going to the same DRAM chips are not modified, the chips are not accessed. Consequently, chips are accessed selectively in our scheme while all chips are accessed simultaneously in the conventional DRAM system. Our chip-based selective DRAM write scheme is shown to reduce DRAM power and energy consumptions by 17% and 14%, respectively, on average. The overheads of our scheme are small in terms of performance, area, and energy consumption.

international conference on computer design | 2010

Lizard: Energy-efficient hard fault detection, diagnosis and isolation in the ALU

Seokin Hong; Soontae Kim

Digital circuits are expected to increasingly suffer from more hard faults due to technology scaling. Especially, a single hard fault in the ALU might lead to a total failure in the embedded systems. In addition, energy efficiency is critical in these systems. To address these increasingly important problems in the ALU, we propose a novel energy-efficient fault-tolerant ALU design called Lizard. Lizard utilizes two 16-bit ALUs to perform 32-bit computations with fault detection and diagnosis. By exploiting predictable operations, fault detection is performed in a single cycle. The 16-bit ALUs can be partitioned into two 8-bit ALUs. When a fault occurs in one of the four 8-bit ALUs, Lizard diagnoses and isolates a faulty 8-bit ALU for itself. After the faulty 8-bit ALU is isolated, Lizard continues its operation using the remaining three 8-bit ALUs, which can detect and isolate another fault. In this way, Lizard can survive faults on at most two sub-ALUs increasing its lifetime and fault tolerance. We conducted comparative evaluations with an unprotected ALU, triple modular redundancy ALU, and quadruple time redundancy ALU in terms of area, energy consumption, performance, and reliability. It is demonstrated that Lizard outperforms other ALU designs in most cases, especially in energy efficiency.

international conference on computer design | 2014

Ternary cache: Three-valued MLC STT-RAM caches

Seokin Hong; Jongmin Lee; Soontae Kim

Spin-transfer torque random access memory (STT-RAM) has become a promising non-volatile memory technology for cache memories. Recently, 2-bit multi-level cell (MLC) STT-RAM has been proposed to enhance data density, but it suffers from low reliability of its read and write operations. In this paper, we propose a novel cache design called Ternary cache. In Ternary cache, a memory cell can store three values (i.e., 0,1,2) while MLC STT-RAM can store four values. In this way, Ternary cache achieves much higher read stability than MLC STT-RAM-based caches. To enhance writability, a write operation is performed with high current and terminated as soon as the data is written. Evaluation results show that Ternary cache achieves the data density benefit of MLC STT-RAM and the reliability benefit of SLC STT-RAM.

ieee computer society annual symposium on vlsi | 2009

TEPS: Transient Error Protection Utilizing Sub-word Parallelism

Seokin Hong; Soontae Kim

Future microprocessors are expected to observe higher transient error rates in combinational logic due to technology scaling and dense integration. We propose a simple transient error protection mechanism for embedded systems exploiting frequent small operand values of instructions and frequently used shift operations. The conditions for applicable instructions for the proposed mechanism are explored, which account for 84% of total instructions executed on average. The operands of these instructions are replicated in ALU directly and other instructions are protected using time-redundant double execution. Our experimental results show that the proposed mechanism incurs 12% performance hit and 4% energy hit, on average, with a low impact on the chip area (7% of the execution unit area).

design, automation, and test in europe | 2013

AVICA: an access-time variation insensitive L1 cache architecture

Seokin Hong; Soontae Kim

Ever scaling process technology increases variations in transistors. The process variations cause large fluctuations in the access times of SRAM cells. Caches made of those SRAM cells cannot be accessed within the target clock cycle time, which reduces yield of processors. To combat these access time failures in caches, many schemes have been proposed, which are, however, limited in their coverage and do not scale well at high failure rates. We propose a new L1 cache architecture (AVICA) employing asymmetric pipelining and pseudo multi-banking. Asymmetric pipelining eliminates all access time failures in L1 caches. Pseudo multi-banking minimizes the performance impact of asymmetric pipelining. For further performance improvement, architectural techniques are proposed. Our experimental results show that our proposed L1 cache architecture incurs less than 1% performance hit compared to the conventional cache architecture with no access time failure. Our proposed architecture is not sensitive to access time failure rates and has low overheads compared to the previously proposed competitive schemes.

high-performance computer architecture | 2017

Partial Row Activation for Low-Power DRAM System

Yebin Lee; Hyeonggyu Kim; Seokin Hong; Soontae Kim

Owing to increasing demand of faster and larger DRAM system, the DRAM system accounts for a large portion of the total power consumption of computing systems. As memory traffic and DRAM bandwidth grow, the row activation and I/O power consumptions are becoming major contributors to total DRAM power consumption. Thus, reducing row activation and I/O power consumptions has big potential for improving the power and energy efficiency of the computing systems. To this end, we propose a partial row activation scheme for memory writes, in which DRAM is re-architected to mitigate row overfetching problem of modern DRAMs and to reduce row activation power consumption. In addition, accompanying I/O power consumption in memory writes is also reduced by transferring only a part of cache line data that must be written to partially opened rows. In our proposed scheme, partial rows ranging from a one-eighth row to a full row can be activated to minimize row activation granularity for memory writes and the full bandwidth of the conventional DRAM can be maintained for memory reads. Our partial row activation scheme is shown to reduce total DRAM power consumption by up to 32% and 23% on average, which outperforms previously proposed schemes in DRAM power saving with almost no performance loss.

international symposium on low power electronics and design | 2011

TLB index-based tagging for cache energy reduction

Jongmin Lee; Seokin Hong; Soontae Kim

Conventional cache tag matching is based on addresses to identify correct data in caches. However, this tagging scheme is not efficient because tag bits are unnecessarily large. From our observations, there are not many unique tag bits due to typically small working sets, which are conventionally captured by TLBs. To effectively exploit this fact, we propose TLB index-based cache tagging scheme. This new tagging scheme reduces required number of tag bits to one-fourth of the conventional tagging scheme. The reduced tag bits decrease tag bits array area by 72% and its energy consumption by 58%. From our experiments, our proposed new tagging scheme reduces instruction cache energy consumption by 13% for embedded systems.

IEEE Transactions on Computers | 2015

A Low-Cost Mechanism Exploiting Narrow-Width Values for Tolerating Hard Faults in ALU

Seokin Hong; Soontae Kim

Digital circuits are expected to increasingly suffer from more hard faults due to technology scaling. Especially, a single hard fault in ALU (Arithmetic Logic Unit) might lead to a total failure in processors or significantly reduce their performance. To address these increasingly important problems, we propose a novel cost-efficient fault-tolerant mechanism for the ALU, called LIZARD. LIZARD employs two half-word ALUs, instead of a single full-word ALU, to perform computations with concurrent fault detection. When a fault is detected, the two ALUs are partitioned into four quarter-word ALUs. After diagnosing and isolating a faulty quarter-word ALU, LIZARD continues its operation using the remaining ones, which can detect and isolate another fault. Even though LIZARD uses narrow ALUs for computations, it adds negligible performance overhead through exploiting predictability of the results in the arithmetic computations. We also present the architectural modifications when employing LIZARD for scalar as well as superscalar processors. Through comparative evaluation, we demonstrate that LIZARD outperforms other competitive fault-tolerant mechanisms in terms of area, energy consumption, performance and reliability.

Explore More