Hoyoung Yoo
KAIST
Publication
Featured research published by Hoyoung Yoo.
International Solid-State Circuits Conference | 2012
Youngjoo Lee; Hoyoung Yoo; Injae Yoo; In-Cheol Park
Solid-state drives (SSDs), built with many flash memory channels, are usually connected to the host through an advanced high-speed serial interface such as SATA III, which has a transfer rate of 6 Gb/s [1-2]. However, the performance of an SSD is in general determined by the throughput of the ECC blocks necessary to overcome the high error rate [3]. The binary BCH code is widely used in SSDs due to its powerful error-correction capability. As it is hard to build strong BCH decoders with high throughput [4-5], multiple BCH decoders are typically placed on a high-performance SSD controller, leading to a significant increase in hardware complexity. This paper presents an efficient BCH encoder/decoder architecture achieving a decoding throughput of 6 Gb/s. The overall architecture shown in Fig. 25.3.1 includes a single BCH decoder and a multi-threaded BCH encoder. The single BCH decoder is responsible for all the channels and services one channel at a time in a round-robin manner.
IEEE Transactions on Circuits and Systems II: Express Briefs | 2015
Hoyoung Yoo; In-Cheol Park
Due to its capacity-achieving property, the polar code has become one of the most favorable error-correcting codes. As the polar code achieves this property only asymptotically, however, it must be long enough to have good error-correcting performance. Although the previous fully parallel encoder is intuitive and easy to implement, it is not suitable for long polar codes because of the huge hardware complexity required. In this brief, we analyze the encoding process from the viewpoint of very-large-scale integration (VLSI) implementation and propose a new efficient encoder architecture that is adequate for long polar codes and effective in alleviating the hardware complexity. As the proposed encoder allows high-throughput encoding with small hardware complexity, it can be systematically applied to the design of any polar code and to any level of parallelism.
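As a point of reference for the encoding process discussed above, the following is a minimal software sketch of the standard polar transform x = u·F^(⊗n) over GF(2), with F = [[1, 0], [1, 1]]. It is not the partially parallel architecture proposed in the brief; the function name `polar_encode` is hypothetical, and the bit-reversal permutation used in some formulations is assumed to be omitted.

```python
def polar_encode(u):
    """Apply the polar transform x = u * F^(kron n) over GF(2).

    `u` is a list of 0/1 values whose length is a power of two.
    Assumption: no bit-reversal permutation is applied.
    """
    n = len(u)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    x = list(u)
    step = 1
    # log2(n) butterfly stages; each stage XORs pairs that are `step` apart.
    while step < n:
        for start in range(0, n, 2 * step):
            for i in range(start, start + step):
                x[i] ^= x[i + step]
        step *= 2
    return x


if __name__ == "__main__":
    print(polar_encode([0, 1, 0, 0]))
```

Each stage of this loop corresponds to one column of XOR gates in a fully parallel encoder, which gives a sense of why the gate count grows quickly with the code length.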
IEEE Transactions on Circuits and Systems II: Express Briefs | 2011
Youngjoo Lee; Hoyoung Yoo; In-Cheol Park
To achieve a high-throughput decoder, massively parallel computation is normally applied to the Chien search, but the parallel realization increases the hardware complexity significantly. To reduce the hardware complexity of the parallel Chien search, this brief proposes a 2-D optimization method. In contrast to the previous 1-D optimizations, the proposed method maximizes the sharing of common subexpressions in both the row and column directions. All the partial products needed in the parallel structure are represented in a single matrix, and the finite-field adders are, in effect, completely eliminated. Simulation results show that the proposed 2-D optimization leads to a significant reduction of the hardware complexity. For the (8191, 7684, 39) BCH code, the count of XOR gates in the parallel Chien search is reduced by 92% and 22%, compared to the straightforward and strength-reduced structures, respectively.
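For readers unfamiliar with the Chien search, here is a small serial sketch over GF(2^4). It is only an illustration of the root search itself, not the 2-D optimized parallel structure from the brief. The field size, the primitive polynomial x^4 + x + 1, and the helper names are all assumptions chosen to keep the example short.

```python
def build_gf16():
    """Exponent and log tables for GF(2^4) with x^4 + x + 1 (assumed field)."""
    exp = [0] * 30
    log = [0] * 16
    x = 1
    for i in range(15):
        exp[i] = x
        log[x] = i
        x <<= 1
        if x & 0x10:
            x ^= 0x13  # reduce modulo x^4 + x + 1
    for i in range(15, 30):
        exp[i] = exp[i - 15]
    return exp, log


EXP, LOG = build_gf16()


def gf_mul(a, b):
    """Multiply two GF(2^4) elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def chien_search(locator, n=15):
    """Return every i in [0, n) with locator(alpha^i) == 0.

    locator[k] is the coefficient of x^k. Term k is multiplied by alpha^k
    at each step, so the XOR of all terms equals locator(alpha^i).
    """
    roots = []
    terms = list(locator)
    for i in range(n):
        acc = 0
        for t in terms:
            acc ^= t  # finite-field addition is XOR
        if acc == 0:
            roots.append(i)
        terms = [gf_mul(t, EXP[k % 15]) for k, t in enumerate(terms)]
    return roots


if __name__ == "__main__":
    # (x + alpha^3)(x + alpha^7) = x^2 + (alpha^3 + alpha^7) x + alpha^10
    poly = [gf_mul(EXP[3], EXP[7]), EXP[3] ^ EXP[7], 1]
    print(chien_search(poly))
```

A p-parallel hardware version evaluates p consecutive powers of alpha per cycle, which multiplies the number of constant multipliers and adders; that growth is the cost the abstract refers to.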
IEEE Journal of Solid-State Circuits | 2013
Youngjoo Lee; Hoyoung Yoo; Jaehwan Jung; Jihyuck Jo; In-Cheol Park
To improve the reliability of MLC NAND flash memory, this paper presents an energy-efficient high-throughput architecture for decoding concatenated-BCH (CBCH) codes. As the data read from the flash memory is hard-decided in practical applications, the proposed CBCH decoding method is a promising solution to achieve both high error-correction capability and energy efficiency. In the proposed CBCH decoding, the number of energy-consuming on-chip memory accesses is minimized by computing and updating syndromes two-dimensionally. To achieve an area-efficient hardware realization, row and column decoders are unified into one decoder and some syndromes are computed only when they are needed. In addition, the decoding throughput is enhanced remarkably by skipping redundant decoding processes. Based on the proposed CBCH decoding architecture, a prototype chip is implemented in a 65-nm CMOS process to decode the (70528, 65536) CBCH code. The proposed decoder provides a decoding throughput of 17.7 Gb/s and an energy efficiency of 2.74 pJ/bit, being vastly superior to the state-of-the-art architectures.
IEEE Transactions on Circuits and Systems II: Express Briefs | 2013
Hoyoung Yoo; Jaehwan Jung; Jihyuck Jo; In-Cheol Park
This brief presents a new area-efficient multimode encoder for long Bose-Chaudhuri-Hocquenghem (BCH) codes. In the proposed multimode encoding architecture, several short linear-feedback shift registers (LFSRs) are cascaded in series to achieve the same functionality that a long LFSR has, and the output of a short LFSR is fed back to the input side to support multimode encoding. Whereas previous multimode architectures necessitate huge overhead due to preprocessing and postprocessing, the proposed architecture completely eliminates the overhead by exploiting an efficient transformation. Without sacrificing the latency, the proposed architecture reduces hardware complexity by up to 97.2% and 49.1% compared with the previous Chinese-remainder-theorem-based and weighted-summation-based multimode architectures, respectively.
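To make the role of the LFSR concrete, the sketch below models a conventional single serial LFSR that computes the parity m(x)·x^r mod g(x) for a systematic cyclic code. It does not model the cascaded multimode structure proposed in the brief. The function names are hypothetical, and the generator used in the usage example, g(x) = x^8 + x^7 + x^6 + x^4 + 1 for the short (15, 7) BCH code, is chosen only for illustration.

```python
def lfsr_parity(message_bits, gen_taps):
    """Compute parity bits with a serial LFSR.

    message_bits: message, most significant bit first.
    gen_taps: coefficients of g(x) for x^0 .. x^(r-1); the leading x^r
              term is implicit.
    Returns the r parity bits, most significant bit first.
    """
    r = len(gen_taps)
    reg = [0] * r  # reg[j] holds the coefficient of x^j
    for bit in message_bits:
        feedback = bit ^ reg[r - 1]
        # Shift up by one position and add feedback * g(x) where taps are set.
        for j in range(r - 1, 0, -1):
            reg[j] = reg[j - 1] ^ (feedback & gen_taps[j])
        reg[0] = feedback & gen_taps[0]
    return reg[::-1]


def poly_remainder(bits, gen_taps):
    """Remainder of the MSB-first polynomial `bits` divided by g(x)."""
    r = len(gen_taps)
    g = [1] + gen_taps[::-1]  # MSB-first, including the x^r term
    work = list(bits)
    for i in range(len(work) - r):
        if work[i]:
            for j, c in enumerate(g):
                work[i + j] ^= c
    return work[-r:]


if __name__ == "__main__":
    taps = [1, 0, 0, 0, 1, 0, 1, 1]  # x^0 .. x^7 of the assumed g(x)
    msg = [1, 0, 1, 1, 0, 0, 1]
    codeword = msg + lfsr_parity(msg, taps)
    # A valid codeword leaves an all-zero remainder when divided by g(x).
    print(codeword, poly_remainder(codeword, taps))
```

The register length equals the degree of g(x), so a strong code over a long block needs a long register; that is the structure the brief replaces with several shorter cascaded LFSRs.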
Asia and South Pacific Design Automation Conference | 2014
Hoyoung Yoo; Youngjoo Lee; In-Cheol Park
This paper presents a universal BCH encoder and decoder that can support multiple error-correction capabilities. A novel encoding architecture and an on-demand syndrome calculation technique are proposed to reduce both hardware complexity and power consumption. Based on the proposed methods, a 32-parallel universal encoder and decoder are designed for BCH (8192+14t, 8192, t) codes, where the error-correction capability t is configurable to 8, 11, 16, 24, 32, and 64. The prototype chip achieves a throughput of 7.3 Gb/s and occupies 2.24 mm² in 0.13-μm CMOS technology.
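The code family (8192+14t, 8192, t) can be tabulated directly from the abstract. The short sketch below does only that arithmetic. The function name is hypothetical, and reading the factor 14 as the per-error parity cost of a code built over GF(2^14) is an assumption based on the usual BCH bound of at most m·t parity bits.

```python
def bch_parameters(t, k=8192, m=14):
    """Return (codeword length, parity bits, code rate) for a given t.

    Follows the (k + m*t, k, t) form quoted in the abstract, with
    k = 8192 and m = 14 as defaults.
    """
    parity = m * t
    n = k + parity
    return n, parity, k / n


if __name__ == "__main__":
    for t in (8, 11, 16, 24, 32, 64):
        n, parity, rate = bch_parameters(t)
        print(f"t={t:2d}  n={n}  parity={parity}  rate={rate:.4f}")
```

For t = 32 this gives a codeword length of 8640 bits with 448 parity bits.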
International SoC Design Conference | 2012
Hoyoung Yoo; Youngjoo Lee; In-Cheol Park
The shortened RS code is traditionally decoded with the standard decoding process by padding zero symbols. As additional cycles are redundantly spent on the zero symbols, the processing latency of the shortened code is almost the same as that of the mother RS code from which the shortened code is derived. A new architecture is proposed in this paper to decrease the processing latency to the codeword length of the shortened RS code, which can be implemented at the cost of small additional hardware resources. The additional hardware complexity is minimized by reusing the hardware resources resident in the adjacent block. Experimental results show that the proposed method leads to a significant reduction of the overall latency. For the RS (32, 24) code, the overall processing latency is reduced by 85.2% and 33.6% compared to the conventional architecture and the previous work, respectively. Moreover, the additional hardware complexity of the proposed method is smaller than that of the previous architectures.
IEEE Transactions on Very Large Scale Integration Systems | 2014
Youngjoo Lee; Hoyoung Yoo; Injae Yoo; In-Cheol Park
This paper presents a high-throughput and low-complexity BCH decoder for NAND flash memory applications, which is developed to achieve the high data rates demanded by recent serial interface standards. To reduce the decoding latency, a data sequence read from a flash memory channel is re-encoded by using the encoder that is idle at that time. In addition, several optimization methods are proposed to relax the hardware complexity of a massively parallel BCH decoder and increase the operating frequency. In a 130-nm CMOS process, an (8640, 8192, 32) BCH decoder designed as a prototype provides a decoding throughput of 6.4 Gb/s while occupying an area of 0.85 mm².
International Conference on Acoustics, Speech, and Signal Processing | 2012
Youngjoo Lee; Hoyoung Yoo; In-Cheol Park
This paper presents a new optimization method to reduce the hardware complexity of syndrome calculation in strong BCH decoding. All the operations required in the parallel syndrome calculation are reformulated as a single matrix computation to enlarge the search area for common sub-expressions. The computational complexity of syndrome calculation is significantly reduced by finding and sharing common terms in the single matrix computation. Implementation results show that the proposed architecture saves 55% of the area overhead compared to the conventional structure.
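As background for the syndrome calculation mentioned above, this is a plain reference sketch that evaluates S_j = r(α^j) for j = 1..2t with Horner's rule. It does not implement the matrix reformulation or common-subexpression sharing from the paper. The field GF(2^4) with x^4 + x + 1, the choice α = 2, and the function names are assumptions made to keep the example small.

```python
def gf16_mul(a, b):
    """Multiply in GF(2^4) modulo x^4 + x + 1 using shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x10:
            a ^= 0x13  # reduce modulo x^4 + x + 1
        b >>= 1
    return result


def gf16_pow(a, e):
    """Raise a GF(2^4) element to a non-negative integer power."""
    result = 1
    for _ in range(e):
        result = gf16_mul(result, a)
    return result


def syndromes(received, t, alpha=2):
    """Return [S_1, ..., S_2t] with S_j = r(alpha^j).

    received[i] is the binary coefficient of x^i in the received word.
    """
    out = []
    for j in range(1, 2 * t + 1):
        point = gf16_pow(alpha, j)
        acc = 0
        for bit in reversed(received):  # Horner's rule, highest degree first
            acc = gf16_mul(acc, point) ^ bit
        out.append(acc)
    return out


if __name__ == "__main__":
    # g(x) = 1 + x^4 + x^6 + x^7 + x^8, itself a codeword of the (15, 7) code
    word = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    print(syndromes(word, t=2))   # all zero for an error-free codeword
    word[5] ^= 1                  # inject a single-bit error at position 5
    print(syndromes(word, t=2))
```

With a single error at position p, each syndrome reduces to α^(j·p), which is why the second call returns powers of α^5.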
IEEE Transactions on Very Large Scale Integration Systems | 2016
Jihyuck Jo; Hoyoung Yoo; In-Cheol Park
This brief presents an energy-efficient architecture to extract mel-frequency cepstrum coefficients (MFCCs) for real-time speech recognition systems. Based on the algorithmic property of MFCC feature extraction, the architecture is designed with floating-point arithmetic units to cover a wide dynamic range with a small bit-width. Moreover, various operations required in the MFCC extraction are examined to optimize operational bit-width and lookup tables needed to compute nonlinear functions, such as trigonometric and logarithmic functions. In addition, the dataflow of MFCC extraction is tailored to minimize the computation time. As a result, the energy consumption is considerably reduced compared with previous MFCC extraction systems.
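The logarithmic and trigonometric functions mentioned above appear in the last stages of a typical MFCC pipeline: a logarithm of the mel filterbank energies followed by a discrete cosine transform. The sketch below shows those two steps plus a commonly used mel-scale mapping. It is a floating-point software illustration under assumed formulas, not the hardware architecture or the exact definitions used in the brief, and all function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Map frequency in Hz to mels (a commonly used formula; assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def cepstrum_from_filterbank(energies, num_coeffs):
    """Take the log of positive filterbank energies, then an unscaled DCT-II.

    Returns the first `num_coeffs` cepstral coefficients.
    """
    n = len(energies)
    logs = [math.log(e) for e in energies]
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(logs))
        for k in range(num_coeffs)
    ]


if __name__ == "__main__":
    print(round(hz_to_mel(1000.0), 2))
    print(cepstrum_from_filterbank([1.0, 2.0, 4.0, 8.0], 3))
```

In a hardware design, the `math.log` and `math.cos` calls are the kind of nonlinear functions that would be served by lookup tables, which is where the bit-width and table-size trade-offs described in the abstract arise.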