Kon-Woo Kwon
Purdue University
Publication
Featured research published by Kon-Woo Kwon.
IEEE Electron Device Letters | 2014
Kon-Woo Kwon; Sri Harsha Choday; Yusung Kim; Xuanyao Fong; Sang Phill Park; Kaushik Roy
A novel nonvolatile flip-flop (NVFF) using a magnetic tunnel junction (MTJ) is presented for power gating architecture. The proposed NVFF exploits spin Hall effect (SHE) for fast and low-power data backup into MTJs before the power is gated off. Owing to the high spin injection efficiency of SHE, the estimated write current for backup operation is lower than 40 μA. Due to the low write current requirement, we do not introduce a dedicated write driver circuit. Instead, we utilize the cross-coupled inverters in the slave latch to perform the backup operation, resulting in low area overhead. The simulation results show 10× improvement in backup energy when compared with previous works on spin transfer torque-based NVFFs.
IEEE Transactions on Very Large Scale Integration Systems | 2014
Kon-Woo Kwon; Sri Harsha Choday; Yusung Kim; Kaushik Roy
Spin-transfer torque magnetic RAM (STT-MRAM) is a promising memory technology for lower level caches because of its high density and nonvolatile nature. However, its high write latency is a bottleneck to widespread adoption as future on-chip memory. In this paper, we propose a new cache architecture, asymmetric write architecture with redundant blocks (AWARE), that improves the write latency by taking advantage of the asymmetric write characteristics of 1T-1MTJ STT-MRAM bit-cells. Due to the nature of the storage element in STT-MRAM, the times required for the two state transitions (1→0 and 0→1) are not identical; one transition is slower than the other. In a conventional cache architecture, the overall write latency is limited by the slower transition. The AWARE cache design instead introduces redundant blocks in each row, preset to the initial state that enables the faster transition. Hence, write operations performed in these redundant blocks are much faster than in the conventional write scheme. The write latency in AWARE is improved by 30% over the conventional cache architecture with no area penalty in the data array. However, the additional tag bits introduced in this technique result in a penalty on the total cache area. In addition, the write energy increases modestly, by 7%, in the proposed cache design; this increase can be mitigated by sacrificing cache capacity.
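The idea behind the redundant blocks can be sketched with a toy latency model. This is only an illustration, not the authors' implementation: the per-bit latencies are hypothetical, the abstract does not say which transition is the slow one (0→1 is assumed here), and the function names are made up for this sketch.

```python
# Hypothetical per-bit switching latencies in ns. The abstract gives no values
# and does not say which direction is slower; 0->1 is ASSUMED slow here.
LATENCY_NS = {(0, 1): 10.0, (1, 0): 5.0}


def write_latency(old_bits, new_bits):
    """Latency of writing a block: the slowest bit transition it contains."""
    transitions = [(o, n) for o, n in zip(old_bits, new_bits) if o != n]
    return max((LATENCY_NS[t] for t in transitions), default=0.0)


def conventional_write(old_bits, new_bits):
    # Old contents are arbitrary, so either transition may be required.
    return write_latency(old_bits, new_bits)


def aware_write(new_bits):
    # The redundant block is preset to all 1s (assumed fast starting state),
    # so only the fast 1->0 transition is ever needed.
    preset = [1] * len(new_bits)
    return write_latency(preset, new_bits)


if __name__ == "__main__":
    old, new = [0, 1, 0, 1], [1, 0, 1, 0]
    print("conventional:", conventional_write(old, new), "ns")
    print("preset block:", aware_write(new), "ns")
```

In this model a conventional write is bounded by the slow transition whenever one occurs, while a write into a preset block only ever pays for the fast one.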
IEEE Transactions on Electron Devices | 2015
Yusung Kim; Xuanyao Fong; Kon-Woo Kwon; Mei-Chin Chen; Kaushik Roy
In this paper, we present two multilevel spin-orbit torque magnetic random access memories (SOT-MRAMs). A single-level SOT-MRAM employs a three-terminal SOT device as a storage element with enhanced endurance, close-to-zero read disturbance, and low write energy. However, the three-terminal device requires the use of two access transistors per cell. To improve the integration density, we propose two multilevel cells (MLCs): 1) series SOT MLC (S-MLC) and 2) parallel SOT MLC (P-MLC), both of which store two bits per memory cell. A detailed analysis of the bit-cell suggests that the S-MLC is promising for applications requiring both high density and low write-error rate, and the P-MLC is particularly suitable for high-density and low-write-energy applications. We also performed an iso-bit-cell-area comparison of our MLC designs with previously proposed MLCs that are based on spin-transfer torque MRAM and show 3-16× improvement in write energy.
IEEE Transactions on Nanotechnology | 2015
Kon-Woo Kwon; Xuanyao Fong; Parami Wijesinghe; Priyadarshini Panda; Kaushik Roy
In spin-transfer torque magnetic random access memory (STT-MRAM), retention, write, and read failures negatively impact the memory yield and density. In this paper, we jointly consider the device, circuit, and architecture layers to implement a high-density STT-MRAM array while meeting the target yield requirement. Different types of magnetic tunnel junctions are considered at the device level, and error correcting codes (ECCs) in conjunction with invert-coding are employed as an architectural solution. Through cross-layer interactions, we present a design methodology to optimize bit-cell area while satisfying the target yield and energy consumption under process variation. Furthermore, we explore the use of invert-coding along with ECC in order to achieve higher memory density than that obtained using ECC alone. Our proposed technique can improve memory density further by proper selection of the thermal stability factor based upon two observations: 1) invert-coding can fix multiple write/read failures with small storage overhead and 2) as the thermal stability factor increases, the retention-failure probability decreases exponentially, and thus a simple ECC is sufficient for retention-failure correction.
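The dependence of retention failure on the thermal stability factor can be illustrated with the standard Néel-Arrhenius thermal activation model, in which the mean retention time scales as exp(Δ). This is a minimal sketch, not the paper's model: the 1 ns attempt time and the 10-year retention window are assumed typical values, and the function name is hypothetical.

```python
import math

TAU0_S = 1e-9                        # ASSUMED attempt time (1 ns)
TEN_YEARS_S = 10 * 365 * 24 * 3600   # ASSUMED retention window


def retention_failure_probability(delta, t_seconds=TEN_YEARS_S, tau0=TAU0_S):
    """Probability that a single bit flips thermally within t_seconds.

    Neel-Arrhenius model: mean retention time tau = tau0 * exp(delta),
    so P_fail = 1 - exp(-t / tau).
    """
    tau = tau0 * math.exp(delta)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-t_seconds / tau)


if __name__ == "__main__":
    for delta in (40, 50, 60, 70):
        print(delta, retention_failure_probability(delta))
```

Once the failure probability is small, each increase of Δ by 10 lowers it by a factor of about e^10 under this model, which is why a modest increase in Δ can leave only rare retention errors for a simple code to correct.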
International Symposium on Quality Electronic Design | 2013
Mrigank Sharad; Karthik Yogendra; Kon-Woo Kwon; Kaushik Roy
All Spin Logic (ASL) employs multiple nano-magnets interacting through spin-torque using metallic interconnect. ASL gates, being magneto-metallic, can operate at an ultra-low terminal voltage of a few millivolts, and hence can be exploited for low-power computation. Since nano-magnets can preserve their state upon withdrawal of the supply voltage, ASL can be pipelined for higher performance without insertion of extra latches. However, pipelining requires the use of clocked CMOS transistors, which significantly increase the required supply voltage. In this work we analyse the design of an 8-bit, pipelined ASL multiplier integrated with CMOS clocking circuitry. We propose a design scheme for 3-D ASL, which involves stacking multiple ASL layers that are clocked using the same CMOS transistors. Stacking N ASL layers using the proposed scheme can enhance the power saving as well as the area density by a factor of N. The proposed design scheme for magneto-metallic computational blocks can achieve more than two orders of magnitude higher density and 10× lower power consumption compared to a 15 nm CMOS design.
IEEE Magnetics Letters | 2015
Yeongkyo Seo; Xuanyao Fong; Kon-Woo Kwon; Kaushik Roy
This paper proposes two types of dual-ported (1-read/1-write: 1R/1W) spin-Hall magnetic random-access memory (SH-MRAM) suitable for on-chip cache applications. Separate write and read ports allow simultaneous write and read access, which can mitigate the impact of slow write latency on the system performance without any area overhead compared to the single-ported SH-MRAM. The efficient spin-Hall effect based spin-transfer torque (STT) leads to low-power write operation. In addition, separate read and write current paths of the devices can enhance the read operation without much impact on the write operation. The differential sensing scheme in 1R/1W differential SH-MRAM can further improve sensing power and sensing margin. Compared to the 1R/1W STT-MRAM bit cell, the 1R/1W SH-MRAM bit cell can achieve lower power consumption for write operation and higher sensing margin with low read power consumption under an iso-area condition.
IEEE Journal on Emerging and Selected Topics in Circuits and Systems | 2016
Yeongkyo Seo; Kon-Woo Kwon; Xuanyao Fong; Kaushik Roy
This paper proposes a dual-port (1R/1W) spin-orbit torque magnetic random access memory (1R/1W SOT-MRAM) for energy-efficient on-chip cache applications. Our proposed dual-port memory can alleviate the impact of write latency on system performance by supporting simultaneous read and write accesses. The spin-orbit device leverages the high spin current injection efficiency of spin Hall metal to achieve a low critical switching current for programming a magnetic tunnel junction. The low write current reduces both the write power consumption and the size of the access transistors, leading to higher integration density. Furthermore, the decoupled read and write current paths of the spin-orbit device improve oxide barrier reliability, because the write current does not flow through the oxide barrier. Device, circuit, and system level co-simulations show that a 1R/1W SOT-MRAM based L2 cache can improve the performance and energy efficiency of computing systems compared to SRAM and standard STT-MRAM based L2 caches.
IEEE Electron Device Letters | 2016
Yeongkyo Seo; Kon-Woo Kwon; Kaushik Roy
This letter presents a spin-orbit torque magnetic random access memory (SOT-MRAM) for high-density, reliable, and energy-efficient on-chip memory applications. Unlike the conventional SOT-MRAM, which requires two access transistors, the proposed MRAM uses only one access transistor along with a Schottky diode in order to achieve high integration density while maintaining the advantages of SOT-MRAM, such as low write energy and enhanced reliability of the magnetic tunnel junction. The Schottky diode is forward-biased during read, whereas it is reverse-biased during write to prevent sneak current paths. Our simulation results show that the proposed MRAM can achieve 30% and 50% reduction in bit-cell area in comparison to conventional spin-transfer torque MRAM (STT-MRAM) and SOT-MRAM, respectively, and ~2.5× improvement in write power compared with the STT-MRAM.
International Conference on Electronics, Circuits, and Systems | 2008
Kyusam Lim; Kon-Woo Kwon; Hee Ju Park; Jeong Hun Kim; Suki Kim; Jun-Jea Sung; Kwang-Hyun Baek
This paper describes an infrared wireless optical mouse system built around a dual-band infrared receiver. The proposed receiver is designed to receive two kinds of signals modulated at two different carrier frequencies: 37.9 kHz for conventional remote controls and 113.7 kHz for enhanced data-rate communications. The receiver chip was fabricated in a 0.5 μm CMOS process. The presented system uses the 113.7 kHz carrier to enhance the data rate and to provide immunity against ambient infrared noise on the optical link, and it was implemented on an FPGA device. The system was tested in a standard PC as an HID-compliant mouse and transfers data at an average rate of 2.7 kb/s via the optical link.
International Conference on Computer Aided Design | 2014
Sri Harsha Choday; Kon-Woo Kwon; Kaushik Roy
Recent advances in thin-film thermoelectric (TE) materials have created opportunities for on-chip cooling and energy harvesting with heat fluxes >100 W/cm². However, it remains unclear how effective these materials are in the context of a realistic microprocessor floorplan and workloads. Moreover, these TE materials suffer from contact parasitics that can significantly impact their performance. To evaluate the workload-dependent performance of on-chip TE devices, we developed a hierarchical simulation methodology that connects an architectural simulator and a power estimation tool with a thermal simulator capable of simulating TE devices. The well-known HotSpot thermal simulator is modified to incorporate TE equations along with contact parasitics in the TE module. SimpleScalar and McPAT were used to generate the runtime power of different functional units in an out-of-order processor across the SPEC2000 workloads. The power map generated by McPAT is used by our TE-enhanced HotSpot simulator to evaluate the cooling and harvesting capabilities of on-chip TE modules. Our results indicate that it is possible to obtain 11 °C peak cooling at the hot spots, or to harvest up to 85 mW of power from the hot spots. We also show that on-chip TE devices can aid in boosting the clock frequency of the processor from 1200 MHz to 1600 MHz under an iso-temperature comparison with the no-TE case. This framework also allows rapid design space exploration of the TE module's material and physical parameters and of the optimum placement options for the TE module on the chip floorplan.