Wongyu Shin | Researchain

Publication

Featured research published by Wongyu Shin.

high-performance computer architecture | 2014

NUAT: A non-uniform access time memory controller

Wongyu Shin; Jeongmin Yang; Jungwhan Choi; Lee-Sup Kim

With the rapid development of microprocessors, off-chip memory access has become a system bottleneck. DRAM, the main memory in most computers, has concentrated only on capacity and bandwidth for decades to achieve high-performance computing. However, DRAM access latency should also be considered to keep up the development trend in the multi-core era. Therefore, we propose NUAT, a new memory controller that focuses on reducing memory access latency without any modification of the existing DRAM structure. We exploit only an intrinsic phenomenon of DRAM: electric charge variation in DRAM cell capacitors. Given the cost-sensitive DRAM market, this is a big advantage in terms of actual implementation. NUAT gives a score to every memory access request, and the request with the highest score obtains priority. For scoring, we introduce two new concepts: Partitioned Bank Rotation (PBR) and PBR Page Mode (PPM). First, PBR is a mechanism that draws information about access speed from refresh timing and position; the request that has the faster access speed gains a higher score. Second, PPM selects the better page mode between open- and close-page modes based on the information from PBR. Evaluations show that NUAT decreases memory access latency significantly for various environments.
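The score-based selection described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' controller: the `Request` type, the 64 ms refresh window, the four speed partitions, the assumption that a more recently refreshed row is accessed faster, and the tie-break on arrival order are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

REFRESH_PERIOD_MS = 64.0  # assumed refresh window (hypothetical value)
NUM_PARTITIONS = 4        # hypothetical number of access-speed classes


@dataclass
class Request:
    name: str
    arrival: int                     # arrival order, smaller is older
    elapsed_since_refresh_ms: float  # time since the target row was refreshed


def score(req: Request) -> int:
    """Give a higher score to rows refreshed more recently (assumed faster)."""
    fraction = min(req.elapsed_since_refresh_ms / REFRESH_PERIOD_MS, 1.0)
    partition = min(int(fraction * NUM_PARTITIONS), NUM_PARTITIONS - 1)
    return NUM_PARTITIONS - 1 - partition


def pick_next(queue: list[Request]) -> Request:
    """Select the highest-scoring request; ties go to the oldest request."""
    return max(queue, key=lambda r: (score(r), -r.arrival))


queue = [
    Request("a", arrival=0, elapsed_since_refresh_ms=60.0),
    Request("b", arrival=1, elapsed_since_refresh_ms=5.0),
    Request("c", arrival=2, elapsed_since_refresh_ms=5.0),
]
chosen = pick_next(queue)
print(chosen.name)
```

Here requests "b" and "c" land in the fastest partition, and "b" is selected because it arrived first. A real controller would combine such a score with other scheduling criteria, which this sketch leaves out.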

international symposium on computer architecture | 2015

Multiple clone row DRAM: a low latency and area optimized DRAM

Jungwhan Choi; Wongyu Shin; Jaemin Jang; Jinwoong Suh; Yongkee Kwon; Youngsuk Moon; Lee-Sup Kim

Several previous works have changed DRAM bank structure to reduce memory access latency and have shown performance improvement. However, changes in the area-optimized DRAM bank can incur large area-overhead. To solve this problem, we propose Multiple Clone Row DRAM (MCR-DRAM), which uses existing DRAM bank structure without any modification.

international solid-state circuits conference | 2013

All-digital hybrid temperature sensor network for dense thermal monitoring

Seungwook Paek; Wongyu Shin; Jae-Young Lee; Hyo-Eun Kim; Jun-Seok Park; Lee-Sup Kim

Technology scaling and many-core design trends demand detailed information regarding the spatial temperature distribution, which is essential for dynamic thermal management [1,2]. The number of on-chip temperature sensors in high-performance processors is increasing, with state-of-the-art commercial processors embedding up to 44 on-chip sensors [3] and the number is likely to increase in the future (Fig. 14.7.1(a)). We observe two significant challenges in on-chip temperature sensing: 1) the increasing number of sensors, and 2) placing them in a regular manner (not solely on the potential hotspots). The number of sensors is mostly constrained by their area. Indeed, the sensor area is difficult to shrink since large delay lines or a BJT with a large ADC, and digital circuits are required to generate a proportional-to-absolute-temperature (PTAT) signal [2,5,6]. Many-core processor architectures give rise to the second challenge, namely, the hotspot locations within many-core processors are difficult to predict since we cannot determine the task allocation (and heat) profile at design time [2]. Consequently, an area-efficient dense thermal monitoring technique is desirable for next-generation processors.

IEEE Transactions on Computers | 2016

Q-DRAM: Quick-Access DRAM with Decoupled Restoring from Row-Activation

Wongyu Shin; Jungwhan Choi; Jaemin Jang; Jinwoong Suh; Yongkee Kwon; Youngsuk Moon; Hong-Sik Kim; Lee-Sup Kim

The relatively high latency of DRAM is mostly caused by the long row-activation time which in fact consists of sensing and restoring time. Memory controllers cannot distinguish between them since they are performed consecutively by a single row-activation command. If these two steps are separated, the restoring can be delayed until DRAM access is uncongested. Hence, we propose Quick-Access DRAM (Q-DRAM) which discriminates between sensing and restoring. Our approach is to allow destructive access (i.e., only sensing is performed without restoring by a row-activation command) using per-bank multiple row-buffers. We call the destructive access and per-bank multiple row-buffers quick-access and quick-buffers (q-buffers) respectively. In addition, we propose Quick-access Trigger (Q-TRIGGER) and RESTORER to utilize Q-DRAM. Q-TRIGGER makes a decision whether quick-access is required or not, and RESTORER decides when to restore the data at the destructed cell. Specifically, RESTORER detects the proper timing to hide restoring time by predicting data bus occupation and by exploiting bank-level locality. Evaluations show that Q-DRAM significantly improved performance for both single- and multi-core systems.

IEEE Journal of Solid-state Circuits | 2015

Hybrid Temperature Sensor Network for Area-Efficient On-Chip Thermal Map Sensing

Seungwook Paek; Wongyu Shin; Jae-Young Lee; Hyo-Eun Kim; Jun-Seok Park; Lee-Sup Kim

The spatial thermal distribution of a chip is essential information for dynamic thermal management. To obtain a rich thermal map, the sensor area must be reduced radically. However, shrinking the sensor size is about to reach its physical limit. Against this background, we propose an area-efficient thermal sensing technique: a hybrid temperature sensor network. The proposed sensor architecture fully exploits the spatial low-pass filtering effect of thermal systems, which implies that most of the thermal information resides in the very low spatial frequency region. Our on-chip sensor network consists of a small number of accurate thermal sensors and a large number of tiny relative thermal sensors, responsible for low and high spatial frequency thermal information, respectively. By combining these sensor readouts, a thermal map upsampler synthesizes a higher spatial resolution thermal map with a proposed guided upsampling algorithm.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2013

PowerField: A Probabilistic Approach for Temperature-to-Power Conversion Based on Markov Random Field Theory

Seungwook Paek; Wongyu Shin; Jaehyeong Sim; Lee-Sup Kim

The temperature-to-power technique is useful for post-silicon power model validation. However, previous works were applicable only to steady-state analysis. In this paper, we propose a new temperature-to-power technique, named PowerField, that supports both transient and steady-state analysis based on a probabilistic approach. Unlike previous works, PowerField uses two consecutive thermal images to find the most feasible power distribution that causes the change between the two input images. To obtain the power map with the highest probability, we adopted the maximum a posteriori Markov random field (MAP-MRF) framework. For the MAP-MRF framework, we modeled the spatial thermal system as a set of thermal nodes and derived an approximated transient heat transfer equation that requires only the local information of each thermal node. Experimental results with a thermal simulator show that PowerField outperforms the previous method in transient analysis, reducing the error by half on average. We also show that our framework works well for steady-state analysis by using two identical steady-state thermal maps as inputs. Lastly, an application to determining the binary power patterns of an FPGA device is presented, achieving 90.7% average accuracy.

international symposium on computer architecture | 2016

Energy efficient data encoding in DRAM channels exploiting data value similarity

Hoseok Seol; Wongyu Shin; Jaemin Jang; Jungwhan Choi; Jinwoong Suh; Lee-Sup Kim

As DRAM data bandwidth increases, tremendous energy is dissipated in the DRAM data bus. To reduce the energy consumed in the data bus, DRAM interfaces with asymmetric termination, such as Pseudo Open Drain (POD) and Low Voltage Swing Terminated Logic (LVSTL), have been adopted in modern DRAMs. In interfaces using asymmetric termination, the amount of termination energy is proportional to the Hamming weight of the data words. In this work, we propose Bitwise Difference Encoding (BD-Encoding), which decreases the Hamming weight of data words, leading to a reduction in energy consumption in the modern DRAM data bus. Since a smaller Hamming weight of the data words also reduces switching activity, switching energy and power noise are both reduced as well. BD-Encoding exploits the similarity in data words in the DRAM data bus. We observed that similar data words (i.e., data words whose Hamming distance is small) are highly likely to be sent at similar times. Based on this observation, the BD-coder stores the recently sent data in both the memory controller and the DRAMs. Then, the BD-coder transfers the bitwise difference between the current data and the most similar stored data. In an evaluation using SPEC 2006, BD-Encoding using 64 recent data entries reduced termination energy by 58.3% and switching energy by 45.3%. In addition, BD-Encoding reduced LdI/dt noise by 55%.
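The idea of sending a bitwise difference against the most similar recent word can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the paper's BD-coder: the `BDCoder` class, the explicit history index sent alongside the difference, and the fallback to the raw word when no stored word helps are all hypothetical.

```python
from collections import deque


def hamming_weight(x: int) -> int:
    """Number of 1 bits in x."""
    return bin(x).count("1")


class BDCoder:
    """Toy bitwise-difference coder; sender and receiver keep equal histories."""

    def __init__(self, history_size: int = 64):
        self.history = deque(maxlen=history_size)

    def encode(self, word: int):
        # Start from the raw word (index None) and look for a stored word
        # whose XOR difference has a strictly smaller Hamming weight.
        best_index, best_diff = None, word
        for i, past in enumerate(self.history):
            diff = word ^ past
            if hamming_weight(diff) < hamming_weight(best_diff):
                best_index, best_diff = i, diff
        self.history.append(word)
        return best_index, best_diff

    def decode(self, index, diff: int) -> int:
        # Undo the XOR against the same history entry, then update the
        # history exactly as the sender did.
        word = diff if index is None else diff ^ self.history[index]
        self.history.append(word)
        return word


tx, rx = BDCoder(history_size=4), BDCoder(history_size=4)
words = [0xFF00, 0xFF01, 0x00FF, 0xFF03, 0x00FE, 0xFF00]
sent = [tx.encode(w) for w in words]
received = [rx.decode(i, d) for i, d in sent]
print(received == words)
```

Because both sides append every word to their history in the same order, the index chosen by the sender always refers to the same stored word on the receiver. The transmitted difference never has a larger Hamming weight than the original word, which is the property the abstract links to termination energy.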

IEEE Transactions on Computers | 2016

DRAM-Latency Optimization Inspired by Relationship between Row-Access Time and Refresh Timing

Wongyu Shin; Jungwhan Choi; Jaemin Jang; Jinwoong Suh; Youngsuk Moon; Yongkee Kwon; Lee-Sup Kim

It is widely known that relatively long DRAM latency forms a bottleneck in computing systems. However, DRAM vendors are strongly reluctant to decrease DRAM latency due to the additional manufacturing cost. Therefore, we set our goal to reduce DRAM latency without any modification in the existing DRAM structure. To accomplish our goal, we focus on an intrinsic phenomenon in DRAM: electric charge variation in DRAM cell capacitors. Then, we draw two key insights: i) DRAM row-access latency of a row is a function of the elapsed time from when the row was last refreshed, and ii) DRAM row-access latency of a row is also a function of the remaining time until the row is next refreshed. Based on these two insights, we propose two mechanisms to reduce DRAM latency: NUAT-1 and NUAT-2. NUAT-1 exploits the first key insight and NUAT-2 exploits the second key insight. For evaluation, circuit- and system-level simulations are performed, which show the performance improvement for various environments.
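The two quantities behind the key insights, the time elapsed since a row was last refreshed and the time remaining until its next refresh, can be computed with simple arithmetic. The sketch below assumes a hypothetical round-robin refresh schedule in which row `r` is refreshed at a fixed offset inside every 64 ms period; the function name, the row count, and the period are illustrative assumptions, not values from the paper.

```python
def refresh_times(row: int, now_ms: float,
                  num_rows: int = 8192, period_ms: float = 64.0):
    """Return (elapsed, remaining) in ms for a row under an assumed
    round-robin refresh schedule."""
    slot_ms = period_ms / num_rows       # time between consecutive row refreshes
    offset_ms = row * slot_ms            # when this row is refreshed in each period
    elapsed = (now_ms - offset_ms) % period_ms
    remaining = period_ms - elapsed      # lies in (0, period_ms]
    return elapsed, remaining


print(refresh_times(0, 10.0))
print(refresh_times(4096, 10.0))
```

Under these assumptions, row 0 was refreshed 10 ms ago, while row 4096 was refreshed 42 ms ago and is due again in 22 ms. A controller could use values like these to choose timing parameters per access, although how latency depends on them is a matter for the circuit-level analysis in the paper.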

IEEE Transactions on Very Large Scale Integration Systems | 2017

In-DRAM Data Initialization

Hoseok Seol; Wongyu Shin; Jaemin Jang; Jungwhan Choi; Jinwoong Suh; Lee-Sup Kim

Initializing memory with zero data is essential for safe memory management. However, initializing a large memory area slows down the system significantly. The most likely cause of this slowdown is the limited DRAM initialization method. At present, the only way to initialize a DRAM area is to execute multiple WRITE commands. However, the WRITE command slows the initialization because of its small granularity and data bus occupancy. In this brief, we propose an efficient in-DRAM initialization method inspired by the internal structure and operation of DRAM. The proposed method, called row reset, uses a DRAM row buffer to zero out a single DRAM row at a time. Row reset allows for parallel initialization on multiple DRAM banks without using off-chip data transfer, thus reducing initialization time by up to 63 times. Row reset is a practical approach because it can be implemented with existing circuitry in DRAM without additional area overhead.

IEEE Transactions on Computers | 2017

Rank-Level Parallelism in DRAM

Wongyu Shin; Jaemin Jang; Jungwhan Choi; Jinwoong Suh; Yongkee Kwon; Youngsuk Moon; Lee-Sup Kim

DRAM systems are hierarchically organized: Channel-Rank-Bank. A channel is connected to multiple ranks, and each rank has multiple banks. This hierarchical structure facilitates parallelism in DRAM. The current DRAM architecture supports bank-level parallelism; as many rows as banks can be moved simultaneously at the bank level. However, rank-level parallelism is not supported. For this reason, only one column can be accessed at a time, although each rank has its own data bus that can carry a column. In other words, current DRAM operations do not exploit the structural opportunity created by multiple ranks. We therefore propose a novel DRAM architecture supporting rank-level parallelism, in which as many columns as ranks can be moved concurrently at the rank level. In this paper, we illustrate rank-level parallelism and its benefit in DRAM operations.
