Yeong-Luh Ueng | Researchain

Publication

Featured research published by Yeong-Luh Ueng.

IEEE Transactions on Circuits and Systems | 2012

An Efficient Layered Decoding Architecture for Nonbinary QC-LDPC Codes

Yeong-Luh Ueng; Chen-Yap Leong; Chung-Jay Yang; Chung-Chao Cheng; Shu-Wei Chen

Compared to binary low-density parity-check (LDPC) codes, nonbinary LDPC codes have better error performance when the code length is moderate. This paper presents an efficient layered decoder architecture for nonbinary quasi-cyclic (QC) LDPC codes using the proposed barrel-shifter-based permutation network and minimum value filter which is used to determine the first few smallest values from a given set. Through the permutation network, the decoding operations related to the multiplications over finite fields can be efficiently handled in the check-node operations, which simplifies the permutations in the variable-node operations and, hence, enables the layered decoder to be realized efficiently. In order to increase the throughput, we utilize the proposed permutation network and the minimum value filter to devise a selective-input min-max decoder architecture. Using a 90-nm CMOS process, we implemented three nonbinary decoders to demonstrate the proposed techniques.
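The minimum value filter mentioned in this abstract selects the first few smallest values from a set. A minimal software sketch of that selection is shown below. It is not the paper's hardware design: the function name `smallest_values` and the bounded compare-and-shift buffer are assumptions chosen for illustration.

```python
def smallest_values(values, k):
    """Return the k smallest entries of `values` in ascending order.

    Hypothetical helper: a single pass over the input with a bounded,
    sorted buffer, loosely mirroring a compare-and-shift filter.
    """
    kept = []
    for v in values:
        # Walk back from the end to find where v belongs in the buffer.
        pos = len(kept)
        while pos > 0 and kept[pos - 1] > v:
            pos -= 1
        if pos < k:
            kept.insert(pos, v)
            del kept[k:]  # never hold more than k entries
    return kept


print(smallest_values([5, 3, 8, 1, 4], 2))
```

Only the k retained values are compared against each new input, which is why such a filter is cheaper than fully sorting the set.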

IEEE Transactions on Circuits and Systems | 2013

An Efficient Multi-Standard LDPC Decoder Design Using Hardware-Friendly Shuffled Decoding

Yeong-Luh Ueng; Bo-Jhang Yang; Chung-Jay Yang; Huang-Chang Lee; Jeng-Da Yang

This paper presents an efficient multi-standard low-density parity-check (LDPC) decoder architecture using a shuffled decoding algorithm, where variable nodes are divided into several groups. In order to provide sufficient memory bandwidth without the need for using registers, a FIFO-based check-mode memory, which dominates the decoder area, is used. Since two compensation factors, rather than a single factor, are dynamically used in the offset Min-Sum algorithm, the number of quantization bits, and, hence, the memory size, can be reduced without degradation in error performance. In order to further reduce the memory size, artificial minimum values, which do not need to be stored in memory, are used. We also propose an algorithm that can be used to partition variable nodes such that the hardware cost can be minimized. Using the proposed techniques, a multi-standard decoder that supports the LDPC codes specified in the ITU G.hn, IEEE 802.11n, and IEEE 802.16e standards was designed and implemented using a 90-nm CMOS process. This decoder supports 133 codes, occupies an area of 5.529 mm², and achieves an information throughput of 1.956 Gbps.
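For readers unfamiliar with the offset Min-Sum algorithm referred to here, the following is a minimal sketch of the textbook check-node update with a single offset. The abstract does not say how the paper chooses between its two compensation factors, so the sketch simply takes the offset as a caller-supplied parameter `beta`. The function name and interface are assumptions.

```python
def offset_min_sum_check_node(v2c, beta):
    """Textbook offset Min-Sum update for one check node.

    v2c  : variable-to-check LLRs on the node's edges (at least two).
    beta : offset subtracted from each magnitude (assumed parameter;
           the paper's dynamic two-factor rule is not modelled here).
    Returns the check-to-variable LLR for every edge.
    """
    signs = [-1 if x < 0 else 1 for x in v2c]
    mags = [abs(x) for x in v2c]

    total_sign = 1
    for s in signs:
        total_sign *= s

    # Only the two smallest magnitudes are needed for all outputs.
    idx_min1 = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[idx_min1]
    min2 = min(m for i, m in enumerate(mags) if i != idx_min1)

    out = []
    for i in range(len(v2c)):
        mag = min2 if i == idx_min1 else min1   # minimum over the other edges
        mag = max(mag - beta, 0.0)              # apply offset, clip at zero
        out.append(total_sign * signs[i] * mag) # sign product of the other edges
    return out


print(offset_min_sum_check_node([-1.0, 2.0, 3.0], 0.5))
```

Because every output depends only on the two smallest input magnitudes and the overall sign, a decoder can store those few values per check node instead of one message per edge.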

IEEE Transactions on Circuits and Systems | 2011

Processing-Task Arrangement for a Low-Complexity Full-Mode WiMAX LDPC Codec

Yu-Luen Wang; Yeong-Luh Ueng; Chien-Lien Peng; Chung-Jay Yang

In this paper, we propose dividing the decoding operations of a variety of irregular quasi-cyclic (QC) low-density parity-check (LDPC) codes into several smaller tasks. An algorithm is devised in order to arrange these tasks in a similar form such that a highly reusable multimode architecture can be designed to process these tasks. For this task-based decoder, the associated memory access can be accomplished with the help of the proposed address generators. Using this approach, the difficulty of designing a low-complexity multimode decoder, which is capable of supporting a variety of irregular QC-LDPC codes, can be overcome. Layered encoding that enables the routing networks and memory for decoding to be reused for the encoding is also proposed. Using these techniques, a multimode codec which can support all 114 WiMAX LDPC codes is designed and implemented in a 90-nm process. The full-mode WiMAX codec achieves a moderate encoding (decoding) throughput of 800 Mb/s (200 Mb/s) and occupies an area of only 0.679 mm².

IEEE Transactions on Circuits and Systems | 2010

A Multimode Shuffled Iterative Decoder Architecture for High-Rate RS-LDPC Codes

Yeong-Luh Ueng; Chung-Jay Yang; Kuan-Chieh Wang; Chun-Jung Chen

For an efficient multimode low-density parity-check (LDPC) decoder, most hardware resources, such as permutators, should be shared among different modes. Although an LDPC code constructed based on a Reed-Solomon (RS) code with two information symbols is not quasi-cyclic, in this paper, we reveal that the structural properties inherent in its parity-check matrix can be adopted in the design of configurable permutators. A partially parallel architecture combined with the proposed permutators is used to mitigate the increase in implementation complexity for the multimode function. The high check-node degree of a high-rate RS-LDPC code leads to challenges in the efficient implementation of a high-throughput decoder. To overcome this difficulty, the variable nodes have been partitioned into several groups, and each group is processed sequentially in order to shorten the critical-path delay and hence increase the maximum operating frequency. In addition, shuffled message-passing decoding is adopted, since fewer iterations can be used to achieve the desired bit-error-rate performance. In order to demonstrate the usefulness of the proposed flexible-permutator-based architecture, one single-mode rate-0.84 decoder and two multimode decoders whose code rates range between 0.79 and 0.93 have been implemented. These decoders can achieve multigigabit-per-second throughput. Using the proposed architecture to support lower rate RS-LDPC codes, e.g., rate-0.568 code, is also investigated.

Vehicular Technology Conference | 2007

A Fast-Convergence Decoding Method and Memory-Efficient VLSI Decoder Architecture for Irregular LDPC Codes in the IEEE 802.16e Standards

Yeong-Luh Ueng; Chung-Chao Cheng

In this paper, we propose a modified iterative decoding algorithm to decode a special class of quasi-cyclic low-density parity-check (QC-LDPC) codes such as QC-LDPC codes used in the IEEE 802.16e standards. The proposed decoding is implemented by serially decoding block codes with identical parity-check matrix H1 derived from the parity-check matrix H of the QC-LDPC codes. The dimensions of H1 are much smaller than those of H. Extrinsic values can be passed among these block codes since the code bits of these block codes are overlapped. Hence, the proposed decoding can reduce the number of iterations required by up to forty percent without error performance loss as compared to the conventional message-passing decoding algorithm. A partially-parallel very large-scale integration (VLSI) architecture is proposed to implement such a decoding algorithm. The proposed VLSI decoder can fully take advantage of the proposed decoding to increase its throughput. In addition, the proposed decoder only needs to store check-to-variable messages and hence is memory efficient.
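To illustrate how a serial schedule can get by with storing only check-to-variable messages, here is a minimal sketch of a generic layered min-sum decoder. It is a related textbook technique, not the algorithm proposed in the paper. The function name, the one-check-per-layer schedule, and the tiny (7,4) Hamming example are all assumptions made for illustration.

```python
def layered_min_sum(rows, channel_llr, max_iters=10):
    """Generic layered min-sum decoding (illustrative sketch).

    rows        : list of checks, each a list of variable indices (>= 2 each).
    channel_llr : channel LLR per variable (positive means bit 0).
    Returns the hard-decision bits.
    """
    post = list(channel_llr)                   # running posterior LLRs
    c2v = [[0.0] * len(r) for r in rows]       # the only stored messages
    hard = [1 if x < 0 else 0 for x in post]
    for _ in range(max_iters):
        for r, row in enumerate(rows):
            # Variable-to-check messages are recomputed, never stored.
            v2c = [post[v] - c2v[r][j] for j, v in enumerate(row)]
            for j, v in enumerate(row):
                others = v2c[:j] + v2c[j + 1:]
                sign = -1 if sum(x < 0 for x in others) % 2 else 1
                c2v[r][j] = sign * min(abs(x) for x in others)
                post[v] = v2c[j] + c2v[r][j]   # later layers see this update
        hard = [1 if x < 0 else 0 for x in post]
        if all(sum(hard[v] for v in row) % 2 == 0 for row in rows):
            break                              # all parity checks satisfied
    return hard


# Hypothetical toy code: the three checks of a (7,4) Hamming code.
hamming_rows = [[0, 1, 3, 4], [0, 2, 3, 5], [1, 2, 3, 6]]
print(layered_min_sum(hamming_rows, [-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]))
```

In this toy run the all-zero codeword is sent and bit 0 is received with a weak wrong-sign LLR. Each layer immediately uses the posteriors updated by the previous layer, which is the mechanism that lets serial schedules converge in fewer iterations than a flooding schedule.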

IEEE Transactions on Signal Processing | 2013

A High-Throughput Trellis-Based Layered Decoding Architecture for Non-Binary LDPC Codes Using Max-Log-QSPA

Yeong-Luh Ueng; Hsueh-Chih Chou; Chung-Jay Yang

This paper presents a high-throughput decoder architecture for non-binary low-density parity-check (LDPC) codes, where the q-ary sum-product algorithm (QSPA) in the log domain is considered. We reformulate the check-node processing such that an efficient trellis-based implementation can be used, where forward and backward recursions are involved. In order to increase the decoding throughput, bidirectional forward-backward recursion is used. In addition, layered decoding is adopted to reduce the number of iterations based on a given performance. Finally, a message compression technique is used to reduce the storage requirements and hence the area. Using a 90-nm CMOS process, a 32-ary (837,726) LDPC decoder was implemented to demonstrate the proposed techniques and architecture. This decoder can achieve a throughput of 233.53 Mb/s at a clock frequency of 250 MHz based on the post-layout results. Compared to the decoders presented in previous literature, the proposed decoder can achieve the highest throughput based on a similar/better error-rate performance.

IEEE Transactions on Communications | 2013

Two Informed Dynamic Scheduling Strategies for Iterative LDPC Decoders

Huang-Chang Lee; Yeong-Luh Ueng; Shan-Ming Yeh; Wen-Yen Weng

When residual belief-propagation (RBP), which is a kind of informed dynamic scheduling (IDS), is applied to low-density parity-check (LDPC) codes, the convergence speed in error-rate performance can be significantly improved. However, the RBP decoders presented in previous literature suffer from poor convergence error-rate performance due to the two phenomena explored in this paper. The first is the greedy-group phenomenon, which results in a small part of the decoding graph occupying most of the decoding resources. By limiting the number of updates for each edge message in the decoding graph, the proposed Quota-based RBP (Q-RBP) schedule can reduce the probability of greedy groups forming. The other phenomenon is the silent-variable-nodes issue, which is a condition where some variable nodes have no chance of contributing their intrinsic messages to the decoding process. As a result, we propose the Silent-Variable-Node-Free RBP (SVNF-RBP) schedule, which can force all variable nodes to contribute their intrinsic messages to the decoding process equally. Both the Q-RBP and the SVNF-RBP provide appealing convergence speed and convergence error-rate performance compared to previous IDS decoders for both dedicated and punctured LDPC codes.

IEEE Journal on Selected Areas in Communications | 2009

Turbo coded multiple-antenna systems for near-capacity performance

Yeong-Luh Ueng; Chia-Jung Yeh; Mao-Chao Lin; Chung-Li Wang

For a turbo coded BLAST (Bell LAbs Space-Time architecture) system with Nt transmit antennas and Nr receive antennas, there is a significant gap between its detection threshold and the capacity in case Nt > Nr. In this paper, we show that by introducing a convolutional interleaver with block delay between the BLAST mapper and the turbo encoder, the threshold can be improved. Near-capacity thresholds can be achieved for some cases. To take advantage of the low detector complexity in Alamouti STBC (space-time block code), we also investigate an STBC system, which is the concatenation of the Alamouti STBC with a turbo trellis coded modulation. By using a proper labelling and adding a convolutional interleaver with block delay to such an STBC system, we achieve both lower error floors and lower thresholds.

IEEE Transactions on Circuits and Systems | 2014

A Fully Parallel LDPC Decoder Architecture Using Probabilistic Min-Sum Algorithm for High-Throughput Applications

Chung-Chao Cheng; Jeng-Da Yang; Huang-Chang Lee; Chia-Hsiang Yang; Yeong-Luh Ueng

IEEE Transactions on Communications | 2014

LDPC Decoding Scheduling for Faster Convergence and Lower Error Floor

Huang-Chang Lee; Yeong-Luh Ueng
