Yeong-Luh Ueng
National Tsing Hua University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Yeong-Luh Ueng.
IEEE Transactions on Circuits and Systems | 2012
Yeong-Luh Ueng; Chen-Yap Leong; Chung-Jay Yang; Chung-Chao Cheng; Shu-Wei Chen
Compared to binary low-density parity-check (LDPC) codes, nonbinary LDPC codes have better error performance when the code length is moderate. This paper presents an efficient layered decoder architecture for nonbinary quasi-cyclic (QC) LDPC codes using the proposed barrel-shifter-based permutation network and minimum value filter which is used to determine the first few smallest values from a given set. Through the permutation network, the decoding operations related to the multiplications over finite fields can be efficiently handled in the check-node operations, which simplifies the permutations in the variable-node operations and, hence, enables the layered decoder to be realized efficiently. In order to increase the throughput, we utilize the proposed permutation network and the minimum value filter to devise a selective-input min-max decoder architecture. Using a 90-nm CMOS process, we implemented three nonbinary decoders to demonstrate the proposed techniques.
IEEE Transactions on Circuits and Systems | 2013
Yeong-Luh Ueng; Bo-Jhang Yang; Chung-Jay Yang; Huang-Chang Lee; Jeng-Da Yang
This paper presents an efficient multi-standard low-density parity-check (LDPC) decoder architecture using a shuffled decoding algorithm, where variable nodes are divided into several groups. In order to provide sufficient memory bandwidth without the need for using registers, a FIFO-based check-mode memory, which dominates the decoder area, is used. Since two compensation factors, rather than a single factor, are dynamically used in the offset Min-Sum algorithm, the number of quantization bits, and, hence, the memory size, can be reduced without degradation in error performance. In order to further reduce the memory size, artificial minimum values, which do not need to be stored in memory, are used. We also propose an algorithm that can be used to partition variable nodes such that the hardware cost can be minimized. Using the proposed techniques, a multi-standard decoder that supports the LDPC codes specified in the ITU G.hn, IEEE 802.11n, and IEEE 802.16e standards was designed and implemented using a 90-nm CMOS process. This decoder supports 133 codes, occupies an area of 5.529 mm2 , and achieves an information throughput of 1.956 Gbps.
IEEE Transactions on Circuits and Systems | 2011
Yu-Luen Wang; Yeong-Luh Ueng; Chien-Lien Peng; Chung-Jay Yang
In this paper, we propose dividing the decoding operations of a variety of irregular quasi-cyclic (QC) low-density parity-check (LDPC) codes into several smaller tasks. An algorithm is devised in order to arrange these tasks in a similar form such that a highly reusable multimode architecture can be designed to process these tasks. For this task-based decoder, the associated memory access can be accomplished with the help of the proposed address generators. Using this approach, the difficulty of designing a low-complexity multimode decoder, which is capable of supporting a variety of irregular QC-LDPC codes, can be overcome. Layered encoding that enables the routing networks and memory for decoding to be reused for the encoding is also proposed. Using these techniques, a multimode codec which can support all 114 WiMAX LDPC codes is designed and implemented in a 90-nm process. The full-mode WiMAX codec achieves a moderate encoding (decoding) throughput of 800 Mb/s (200 Mb/s) and occupies an area of only 0.679 mm2.
IEEE Transactions on Circuits and Systems | 2010
Yeong-Luh Ueng; Chung-Jay Yang; Kuan-Chieh Wang; Chun-Jung Chen
For an efficient multimode low-density parity-check (LDPC) decoder, most hardware resources, such as permutators, should be shared among different modes. Although an LDPC code constructed based on a Reed-Solomon (RS) code with two information symbols is not quasi-cyclic, in this paper, we reveal that the structural properties inherent in its parity-check matrix can be adopted in the design of configurable permutators. A partially parallel architecture combined with the proposed permutators is used to mitigate the increase in implementation complexity for the multimode function. The high check-node degree of a high-rate RS-LDPC code leads to challenges in the efficient implementation of a high-throughput decoder. To overcome this difficulty, the variable nodes have been partitioned into several groups, and each group is processed sequentially in order to shorten the critical-path delay and hence increase the maximum operating frequency. In addition, shuffled message-passing decoding is adopted, since fewer iterations can be used to achieve the desired bit-error-rate performance. In order to demonstrate the usefulness of the proposed flexible-permutator-based architecture, one single-mode rate-0.84 decoder and two multimode decoders whose code rates range between 0.79 and 0.93 have been implemented. These decoders can achieve multigigabit-per-second throughput. Using the proposed architecture to support lower rate RS-LDPC codes, e.g., rate-0.568 code, is also investigated.
vehicular technology conference | 2007
Yeong-Luh Ueng; Chung-Chao Cheng
In this paper, we propose a modified iterative decoding algorithm to decode a special class of quasi-cyclic low- density parity-check (QC-LDPC) codes such as QC-LDPC codes used in the IEEE 802.16e standards. The proposed decoding is implemented by serially decoding block codes with identical parity-check matrix H1 derived from the parity-check matrix H of the QC-LDPC codes. The dimensions of H1 are much smaller than those of H. Extrinsic values can be passed among these block codes since the code bits of these block codes are overlapped. Hence, the proposed decoding can reduce the number of iterations required by up to forty percent without error performance loss as compared to the conventional message- passing decoding algorithm. A partially-parallel very large-scale integration (VLSI) architecture is proposed to implement such a decoding algorithm. The proposed VLSI decoder can fully take advantage of the proposed decoding to increase its throughput. In addition, the proposed decoder only needs to store check-to- variable messages and hence is memory efficient.
IEEE Transactions on Signal Processing | 2013
Yeong-Luh Ueng; Hsueh-Chih Chou; Chung-Jay Yang
This paper presents a high-throughput decoder architecture for non-binary low-density parity-check (LDPC) codes, where the
IEEE Transactions on Communications | 2013
Huang-Chang Lee; Yeong-Luh Ueng; Shan-Ming Yeh; Wen-Yen Weng
q
IEEE Journal on Selected Areas in Communications | 2009
Yeong-Luh Ueng; Chia-Jung Yeh; Mao-Chao Lin; Chung-Li Wang
-ary sum-product algorithm (QSPA) in the log domain is considered. We reformulate the check-node processing such that an efficient trellis-based implementation can be used, where forward and backward recursions are involved. In order to increase the decoding throughput, bidirectional forward-backward recursion is used. In addition, layered decoding is adopted to reduce the number of iterations based on a given performance. Finally, a message compression technique is used to reduce the storage requirements and hence the area. Using a 90-nm CMOS process, a 32-ary (837,726) LDPC decoder was implemented to demonstrate the proposed techniques and architecture. This decoder can achieve a throughput of 233.53 Mb/s at a clock frequency of 250 MHz based on the post-layout results. Compared to the decoders presented in previous literature, the proposed decoder can achieve the highest throughput based on a similar/better error-rate performance.
IEEE Transactions on Circuits and Systems | 2014
Chung-Chao Cheng; Jeng-Da Yang; Huang-Chang Lee; Chia-Hsiang Yang; Yeong-Luh Ueng
When residual belief-propagation (RBP), which is a kind of informed dynamic scheduling (IDS), is applied to low-density parity-check (LDPC) codes, the convergence speed in error-rate performance can be significantly improved. However, the RBP decoders presented in previous literature suffer from poor convergence error-rate performance due to the two phenomena explored in this paper. The first is the greedy-group phenomenon, which results in a small part of the decoding graph occupying most of the decoding resources. By limiting the number of updates for each edge message in the decoding graph, the proposed Quota-based RBP (Q-RBP) schedule can reduce the probability of greedy groups forming. The other phenomenon is the silent-variable-nodes issue, which is a condition where some variable nodes have no chance of contributing their intrinsic messages to the decoding process. As a result, we propose the Silent-Variable-Node-Free RBP (SVNF-RBP) schedule, which can force all variable nodes to contribute their intrinsic messages to the decoding process equally. Both the Q-RBP and the SVNF-RBP provide appealing convergence speed and convergence error-rate performance compared to previous IDS decoders for both dedicated and punctured LDPC codes.
IEEE Transactions on Communications | 2014
Huang-Chang Lee; Yeong-Luh Ueng
For a turbo coded BLAST (Bell LAbs Space-Time architecture) system with Nt transmit antennas and Nr receive antennas, there is a significant gap between its detection threshold and the capacity in case Nt > Nr. In this paper, we show that by introducing a convolutional interleaver with block delay between the BLAST mapper and the turbo encoder, the threshold can be improved. Near-capacity thresholds can be achieved for some cases. To take advantage of the low detector complexity in Alamouti STBC (space-time block code), we also investigate a STBC system, which is the concatenation of the Alamouti STBC with a turbo trellis coded modulation. By using a proper labelling and adding a convolutional interleaver with block delay to such a STBC system, we achieve both lower error floors and lower thresholds.