Jong Wook Kwak
Yeungnam University
Publication
Featured research published by Jong Wook Kwak.
Microprocessors and Microsystems | 2010
Hyun-min Kyung; Gi-Ho Park; Jong Wook Kwak; Tae-Jin Kim; Sung-Bae Park
With the rapid development of semiconductor technology, more complicated systems have been integrated into single chips. However, system performance does not increase in proportion to the gate count of the system, mainly because optimized design becomes more difficult as systems grow more complicated. Therefore, it is essential to understand the internal behavior of the system and utilize system resources effectively in System on Chip (SOC) design. In this paper, we design a Performance Analysis Unit (PAU) for monitoring the AMBA Advanced eXtensible Interface (AXI) bus as a mechanism to investigate the internal and dynamic behavior of an SOC, especially internal bus activities. A case study of the PAU with an H.264 decoder application is also presented to show how the PAU is utilized in an SOC platform. The PAU can measure major system performance metrics, such as bus latency, amount of bus traffic, contention between master/slave devices, and bus utilization for specific durations. This paper also presents a distributor and synchronization method for connecting multiple PAUs to monitor the multiple internal buses of a large SOC.
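The kind of metrics the PAU accumulates can be illustrated with a minimal software model. The class, method names, and cycle-based accounting below are assumptions for illustration only, not the paper's actual hardware interface:

```python
# Illustrative model of bus performance counters, in the spirit of a PAU
# snooping an on-chip bus: it tracks utilization and per-transaction latency.
class BusMonitor:
    def __init__(self):
        self.busy_cycles = 0      # cycles in which the bus carried traffic
        self.total_cycles = 0     # all observed cycles
        self.latencies = []       # completed-transaction latencies (cycles)

    def record_cycle(self, bus_busy):
        self.total_cycles += 1
        if bus_busy:
            self.busy_cycles += 1

    def record_transaction(self, issue_cycle, complete_cycle):
        self.latencies.append(complete_cycle - issue_cycle)

    def utilization(self):
        return self.busy_cycles / self.total_cycles if self.total_cycles else 0.0

    def average_latency(self):
        return sum(self.latencies) / len(self.latencies) if self.latencies else 0.0
```

In hardware these would be saturating counters sampled over a measurement window; the software model only conveys which quantities are collected.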
IET Computers & Digital Techniques | 2008
Jong Wook Kwak; Chu Shik Jhon
To achieve higher performance in embedded systems, recent embedded microprocessor cores have gradually adopted the technologies of general high-performance microprocessor cores. For branch prediction, embedded microprocessor cores have usually used simple bimodal branch predictors; that is, until now, most branch predictors in embedded processor cores have utilised the address of the branch instruction (program counter, PC), and recently some predictors in advanced embedded cores use a dynamic branch predictor with global branch history (GBH). The authors suggest branch direction history (BDH) as a new component of the input vector for branch prediction. Additionally, a new embedded branch predictor called the direction-gshare predictor, which utilises BDH information, is proposed as an implementation example. In the simulation section, a neural network with three branch prediction input vectors (PC, GBH and BDH) is modelled and their actual impact on branch prediction accuracy is analysed. Then the new embedded branch predictor, the direction-gshare predictor, is simulated. The simulation results show that aliasing in the pattern history table is reduced, by 48.9% on average, through the additional use of BDH information. Moreover, the direction-gshare predictor outperforms previous embedded branch predictors, such as the bimodal predictor, two-level adaptive predictor and gshare predictor, by up to 15.32%, 5.41% and 5.74%, respectively.
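A gshare-style predictor extended with a direction-history term can be sketched in a few lines. The indexing scheme below (XOR of PC, GBH, and BDH into a table of two-bit saturating counters) is an illustrative reading of the idea, not the paper's exact hardware organisation:

```python
# Toy model of a direction-gshare-style predictor: a pattern history table
# of 2-bit saturating counters indexed by PC XOR global history XOR
# branch direction history. All sizing choices here are assumptions.
class DirectionGshare:
    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.pht = [2] * (1 << bits)  # init weakly taken
        self.gbh = 0  # global branch history: taken / not-taken outcomes
        self.bdh = 0  # branch direction history: e.g. backward / forward targets

    def _index(self, pc):
        return (pc ^ self.gbh ^ self.bdh) & self.mask

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2  # True means "predict taken"

    def update(self, pc, taken, backward):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # shift the new outcome / direction bit into each history register
        self.gbh = ((self.gbh << 1) | int(taken)) & self.mask
        self.bdh = ((self.bdh << 1) | int(backward)) & self.mask
```

The intuition for the aliasing reduction is that two branches with identical PC and outcome history can still hash to different counters if their direction histories differ.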
Hawaii International Conference on System Sciences | 2010
Jong Wook Kwak; Ju Hee Choi
The filter cache has been introduced as one solution for reducing cache power consumption. When a filter cache is utilized in a memory system, it reduces cache power consumption by more than 50%; however, it also compromises performance by more than 20%. To minimize the performance loss of the filter cache, this paper proposes a new filter cache predictor model and its algorithm. In our scheme, a Mode Selection Bit (MSB) controls selective accesses to the filter cache and a Branch Target Buffer (BTB). The simulation results show that our solution improves the energy-delay product by up to about 9.1% compared to previous policies.
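One plausible reading of an MSB-style mechanism is a per-branch bit, kept alongside BTB entries, that steers the next fetch either to the small filter cache or directly to L1 when the filter cache is unlikely to hit. The sketch below is a hypothetical software model; the dictionary-based table and the update rule are assumptions, not the paper's design:

```python
# Illustrative filter cache predictor: one Mode Selection Bit per branch PC
# decides whether to access the filter cache or bypass it and go to L1.
class FilterCachePredictor:
    def __init__(self):
        self.msb = {}  # branch PC -> True (try filter cache) / False (go to L1)

    def choose_target(self, branch_pc):
        # Default to the filter cache until we learn it misses for this path.
        return 'filter' if self.msb.get(branch_pc, True) else 'L1'

    def update(self, branch_pc, filter_hit):
        # A filter-cache miss steers future fetches from this branch to L1,
        # avoiding the serialized filter-then-L1 access penalty.
        self.msb[branch_pc] = filter_hit
```

Bypassing on predicted misses is what recovers performance: the filter cache is only probed when it is likely to hit, so its power benefit is kept without the miss-detour latency.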
Research in Adaptive and Convergent Systems | 2014
Ju Hee Choi; Jong Wook Kwak; Seong Tae Jhang; Chu Shik Jhon
Cache compression has been studied as a way to increase the effective cache size by storing cache blocks in a compressed form. However, it also generates additional write operations during the compression and compaction of cache blocks. Since these extra writes increase dynamic energy consumption and shorten the lifetime of a Non-Volatile Memory (NVM) based Last-Level Cache (LLC), they must be balanced against the performance improvement. In this paper, we identify that cache compression is not always efficient for NVM-based LLCs. In light of this analysis, we propose Adaptive Cache Compression for NVM (ACCNVM), in which a cache block is compressed only when compression is efficient. The results show that our proposal achieves 16.4% energy savings and a 19.1% lifetime extension for the cache, compared to a cache that uses a state-of-the-art compression scheme.
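The core trade-off can be captured as a simple cost comparison: compress only when the energy saved by avoided misses outweighs the energy of the extra NVM writes. The function below is a hypothetical cost model in the spirit of this idea; the parameter names and the linear energy accounting are assumptions, not ACCNVM's actual decision logic:

```python
# Illustrative compression decision: weigh the energy of extra NVM write
# operations against the energy saved by misses avoided through the
# larger effective capacity. All inputs are per-interval estimates.
def should_compress(extra_writes, write_energy, misses_avoided, miss_penalty_energy):
    saved = misses_avoided * miss_penalty_energy   # benefit of compression
    cost = extra_writes * write_energy             # NVM write overhead
    return saved > cost
```

Because NVM writes are far more expensive than reads, the same compression scheme that is profitable on SRAM can lose energy on an NVM LLC, which is why an adaptive gate like this matters.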
International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation | 2005
Cheol Hong Kim; Sunghoon Shim; Jong Wook Kwak; Sung Woo Chung; Chu Shik Jhon
Microarchitects should consider energy consumption together with performance when designing instruction cache architecture, especially in embedded processors. This paper proposes a power-aware instruction cache architecture, named the Partitioned Instruction Cache (PI-Cache), to reduce dynamic energy consumption in the instruction cache. The proposed PI-Cache is composed of several small sub-caches. When the PI-Cache is accessed, only one sub-cache is accessed, by exploiting the locality of applications; the other sub-caches are not accessed, resulting in dynamic energy reduction. The PI-Cache also reduces energy consumption by eliminating the energy consumed in tag matching. Moreover, the performance loss is small, considering the physical cache access time. We evaluated the energy efficiency by running the cycle-accurate simulator SimpleScalar with power parameters obtained from CACTI. Simulation results show that the PI-Cache reduces dynamic energy consumption by 42%–59%.
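A partitioned cache needs a cheap way to pick the single sub-cache to activate for a fetch address. One natural sketch is to use address bits above the sub-cache size, so that sequential code stays within one sub-cache and the others remain idle. The sizes and bit positions below are assumptions for illustration, not the paper's parameters:

```python
# Illustrative PI-Cache-style sub-cache selection: bits of the fetch
# address above the sub-cache size pick one sub-cache, so instruction
# locality keeps consecutive fetches in the same (powered) partition.
def select_subcache(pc, num_subcaches=4, subcache_size=4096):
    return (pc // subcache_size) % num_subcaches
```

Addresses within the same 4 KB region map to the same sub-cache, so a hot loop activates only one partition while the rest contribute no dynamic access energy.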
IEICE Transactions on Information and Systems | 2005
Jong Wook Kwak; Ju Hwan Kim; Chu Shik Jhon
Most branch predictors use the PC of the branch instruction and its dynamic Global Branch History (GBH). In this letter, we suggest Branch Direction History (BDH) as a third component of branch prediction and analyze its impact on prediction accuracy. Additionally, we propose a new branch predictor, the direction-gshare predictor, which utilizes BDH combined with GBH. First, we model a neural network with (PC, GBH, BDH) and analyze their actual impact on branch prediction accuracy; then we simulate our new predictor, the direction-gshare predictor. The simulation results show that aliasing in the Pattern History Table (PHT) is significantly reduced by the additional use of BDH information. The direction-gshare predictor outperforms the bimodal predictor, two-level adaptive predictor and gshare predictor by up to 15.32%, 5.41% and 5.74%, respectively, without additional hardware cost.
Embedded and Ubiquitous Computing | 2004
Cheol Hong Kim; Jong Wook Kwak; Seong Tae Jhang; Chu Shik Jhon
This paper proposes methods for achieving high energy-delay efficiency in embedded systems. In particular, we present adaptive block management schemes for the victim cache to reduce the number of accesses to more power-consuming memory structures such as L2 caches. A victim cache is a memory element for reducing conflict misses in a direct-mapped L1 cache without affecting its access time. We investigate techniques to use the victim cache more efficiently by selecting the blocks to be loaded into it based on L1 cache history information. According to our simulations, the proposed schemes show better performance than the conventional victim cache scheme and also reduce power consumption.
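Selective admission can be sketched as a victim cache that consults a per-block history bit on L1 eviction and skips blocks that showed no reuse. The single-bit policy, FIFO replacement, and structure below are illustrative assumptions, not the paper's specific schemes:

```python
# Illustrative history-guided victim cache: an evicted L1 block is admitted
# only if its L1 history says it was re-referenced before, so cold blocks
# never pollute the victim cache (and never cost an insertion).
class SelectiveVictimCache:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.blocks = []  # ordered list of admitted block tags

    def on_l1_evict(self, tag, was_rereferenced):
        if not was_rereferenced:
            return  # bypass: block unlikely to be needed again soon
        if tag in self.blocks:
            self.blocks.remove(tag)
        self.blocks.append(tag)
        if len(self.blocks) > self.capacity:
            self.blocks.pop(0)  # FIFO eviction of the oldest victim

    def lookup(self, tag):
        return tag in self.blocks
```

Filtering admissions keeps the small victim cache full of blocks with demonstrated reuse, which is what reduces the fall-through accesses to the L2.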
Embedded and Ubiquitous Computing | 2004
Sunghoon Shim; Cheol Hong Kim; Jong Wook Kwak; Chu Shik Jhon
Cache sizes tend to grow in embedded processors as technology scales to smaller transistors and lower supply voltages. However, a larger cache demands more energy, so the ratio of cache energy consumption to total processor energy is growing. Many schemes have been proposed for reducing cache energy consumption, but these previous schemes address only one side: either dynamic cache energy or static cache energy. In this paper, we propose a hybrid scheme that reduces dynamic and static cache energy simultaneously. For this hybrid scheme, we adopt two existing techniques: the drowsy cache technique to reduce static cache energy, and the way-prediction technique to reduce dynamic cache energy. Additionally, we propose an early wakeup technique based on the instruction PC to reduce the penalty caused by applying these two schemes. We focus on the level 1 data cache. Our experimental evaluation shows that the extra cycles incurred by the drowsy cache scheme can be reduced by 29.6% on average through our early wakeup scheme, while the ratio of drowsy cache lines remains over 87%. The total dynamic energy of the processor can be reduced by 2.2% to 6.8%, and the energy-delay product of total dynamic processor energy is reduced by 3% on average versus a processor using the base cache scheme without any energy-reduction techniques.
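PC-based early wakeup can be sketched as a small table that remembers which cache line each load PC touched last time, so that line can be taken out of the drowsy (low-voltage) state before the access arrives. The table organisation and training rule below are assumptions for illustration, not the paper's mechanism:

```python
# Toy model of PC-indexed early wakeup for a drowsy data cache: a load's
# PC predicts the line it will touch, and that line is woken ahead of the
# access, hiding the drowsy-to-active transition penalty.
class EarlyWakeup:
    def __init__(self):
        self.pc_to_line = {}  # load PC -> last line index it accessed
        self.drowsy = set()   # line indices currently in drowsy mode

    def train(self, pc, line):
        # Record the line this load touched, for the next encounter.
        self.pc_to_line[pc] = line

    def on_fetch(self, pc):
        # Early in the pipeline: wake the predicted line if it is drowsy.
        line = self.pc_to_line.get(pc)
        if line is not None:
            self.drowsy.discard(line)
        return line
```

Because loads in loops tend to revisit the same lines, even this simple correlation hides much of the wakeup latency that the drowsy scheme would otherwise add.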
IEEE Transactions on Very Large Scale Integration Systems | 2017
Ju Hee Choi; Jong Wook Kwak
Spin-transfer-torque RAM (STT-RAM) is one of the emerging nonvolatile memories for the last-level cache (LLC), featuring high density and low leakage. However, the long latency of write operations, which stems from its nonvolatility, degrades performance when STT-RAM is employed as the LLC. To overcome this problem, we revisit the existing cache update policy and propose a new one that exploits the asymmetric write characteristics of STT-RAM. In our proposal, data are written into a fast-writeable block, regardless of the original position, when the block arrives at the LLC. This paper proves the efficiency of our update policy with analytical models and gives detailed information for implementing the policy. The experimental results show that our scheme reduces slow writes by 77.6%, which leads to a 31.1% reduction in write latency.
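The redirection idea can be sketched as a cache set in which some ways are fast-writeable and incoming data is steered to one of them instead of the way the replacement policy would have chosen. The set model and counters below are illustrative assumptions, not the paper's implementation:

```python
# Illustrative model of a write-asymmetry-aware update policy: on a fill
# into an STT-RAM LLC set, redirect the write to a fast-writeable way
# when one exists, falling back to the replacement policy's choice.
class AsymmetricWriteSet:
    def __init__(self, fast_ways, slow_ways):
        self.fast = set(fast_ways)   # ways with short write latency
        self.slow = set(slow_ways)   # ways with long write latency
        self.tags = {}               # way -> resident tag
        self.fast_writes = 0
        self.slow_writes = 0

    def write(self, tag, preferred_way):
        # Prefer any fast way over the replacement policy's preferred way.
        way = next(iter(self.fast), preferred_way)
        if way in self.fast:
            self.fast_writes += 1
        else:
            self.slow_writes += 1
        self.tags[way] = tag
```

A real design would also rotate which ways count as fast (for wear and conflict reasons); the sketch only shows the redirection step that converts slow writes into fast ones.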
Research in Adaptive and Convergent Systems | 2015
Min Kyu Kim; Ju Hee Choi; Jong Wook Kwak; Seong Tae Jhang; Chu Shik Jhon
Non-volatile memories (NVMs), such as STT-RAM and PCM, have recently become very competitive designs for last-level caches (LLCs). To avoid cache pollution caused by unnecessary write operations, many cache-bypassing methods have been introduced. Among them, SBAC (a statistics-based cache bypassing method for asymmetric-access caches) is the most recent approach for NVMs and shows the lowest cache access latency. However, SBAC only works on non-inclusive caches, so it is not practical for state-of-the-art processors that employ inclusive LLCs. To overcome this limitation, we propose a novel cache scheme for NVMs, called the inclusive bypass tag cache (IBTC). The proposed IBTC, designed around the characteristics of NVMs, is integrated into the LLC to maintain data coherence in an inclusive LLC under a bypass method, and an algorithm is introduced to handle the tag information of bypassed blocks with minimal storage overhead. Experiments show that IBTC cuts overall energy consumption by 17.4% and increases the cache hit rate by 5.1%.
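Why a tag-only side structure preserves inclusion can be shown with a minimal model: a bypassed block skips the expensive NVM data write but leaves its tag behind, so the LLC can still answer inclusion checks for copies held in upper-level caches. The structure and method names below are illustrative assumptions, not IBTC's actual organisation:

```python
# Illustrative model of an inclusive-bypass tag structure: bypassed blocks
# store no data in the NVM LLC, but their tags are tracked so the
# inclusion property (LLC knows every block the upper levels hold) survives.
class InclusiveBypassTagCache:
    def __init__(self):
        self.llc_data = {}        # tag -> data actually written to the LLC
        self.bypass_tags = set()  # tags of bypassed blocks (tag only, no data)

    def fill(self, tag, data, bypass):
        if bypass:
            self.bypass_tags.add(tag)  # record tag only; skip the NVM write
        else:
            self.llc_data[tag] = data

    def inclusion_holds(self, tag):
        # An upper-level cache may hold `tag` only if the LLC tracks it.
        return tag in self.llc_data or tag in self.bypass_tags
```

Without the tag record, bypassing a block would silently violate inclusion whenever an L1/L2 copy exists, which is exactly why schemes like SBAC were limited to non-inclusive hierarchies.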