Chun-Fu Liao
National Tsing Hua University
Publication
Featured research published by Chun-Fu Liao.
IEEE Transactions on Very Large Scale Integration Systems | 2014
Chun-Fu Liao; Jhong-Yu Wang; Yuan-Hao Huang
This paper presents the VLSI implementation of a lattice-reduction-aided (LRA) detection system. The proposed system includes a QR decomposition, lattice reduction (LR) processor, and sorting-reduced (SR) K-best detector for 8 × 8 multiple-input multiple-output (MIMO) systems. The bit error rate of the proposed MIMO detection system incurs only approximately 3 dB of implementation loss compared with optimal maximum likelihood detection with 64-quadrature-amplitude modulation. The proposed processor can also support different throughput requirements by adjusting the stage number of LR. The SR K-best detector can achieve 3.1 Gb/s throughput with 0.24-ns latency. The throughput of the system reaches 585 Mb/s if one channel preprocessing can support 72 symbol detections. The corresponding energy per bit is 63 pJ/bit, which is the smallest value achieved to date. This paper presents the first VLSI implementation of a complete LRA K-best detector with an 8 × 8 dimension.
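The K-best search mentioned in this abstract can be illustrated with a minimal sketch. It is a plain, real-valued K-best detector and not the sorting-reduced, lattice-reduction-domain design of the paper; the function names `k_best_detect` and `apply_upper`, the PAM alphabet, and the matrix layout (list of rows) are assumptions made for illustration only.

```python
def k_best_detect(r, y, alphabet, k):
    """Breadth-first tree search keeping the k lowest-metric paths per layer.

    r: upper-triangular matrix (list of rows), y: rotated receive vector,
    alphabet: candidate real symbols, k: number of survivors per layer.
    Returns (best symbol vector, its squared-distance metric).
    Hypothetical helper: a textbook K-best search, not the paper's SR variant.
    """
    n = len(y)
    survivors = [(0.0, [])]  # (partial metric, symbols of layers i+1..n-1)
    for i in range(n - 1, -1, -1):  # the last layer is detected first
        candidates = []
        for metric, tail in survivors:
            # Contribution of already-decided symbols to row i.
            interference = sum(r[i][j] * tail[j - i - 1] for j in range(i + 1, n))
            for s in alphabet:
                err = y[i] - interference - r[i][i] * s
                candidates.append((metric + err * err, [s] + tail))
        # Full sort for clarity; hardware designs avoid or reduce this step.
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:k]
    best_metric, best_symbols = survivors[0]
    return best_symbols, best_metric


def apply_upper(r, s):
    """Compute R s for an upper-triangular R (used to build test inputs)."""
    n = len(s)
    return [sum(r[i][j] * s[j] for j in range(i, n)) for i in range(n)]
```

With k equal to the alphabet size raised to the number of layers, the search becomes exhaustive; smaller k trades detection accuracy for a fixed, bounded workload per layer.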
IEEE Transactions on Circuits and Systems II: Express Briefs | 2014
Chun-Fu Liao; Fang-Chun Lan; Jin-Wei Jhang; Yuan-Hao Huang
This brief presents a lattice reduction (LR) aided precoding processor for 64-QAM 4 × 4 multiple-input multiple-output systems. The proposed processor is based on a modified Lenstra-Lenstra-Lovász LR algorithm and the Tomlinson-Harashima precoding (THP) algorithm. This study develops a configurable architecture for high-throughput THP processing or high-performance LR-THP processing. The proposed processor can also change the stage number of the LR algorithm to achieve a tradeoff between performance and throughput. This study designs and implements the precoding processor by using TSMC 90-nm 1P9M CMOS technology. The chip measurement results presented in this study show that the proposed processor achieves 576 Mbit/s in the THP mode or a 10⁻³ bit error rate in the LR-THP mode with 64-QAM modulation at 28.3 dB. The chip occupies a 0.5-mm² area and consumes 15.4 mW of power at its maximum clock speed of 120 MHz.
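As a rough illustration of the THP step named in this abstract, the sketch below shows the sequential interference pre-subtraction and modulo folding for a real-valued, noise-free case. It assumes the effective channel is already lower-triangular and uses plain THP without lattice reduction; the names `thp_modulo`, `thp_precode`, and `thp_receive` are hypothetical and do not come from the paper.

```python
import math


def thp_modulo(x, m):
    """Fold x into [-m, m) by subtracting an integer multiple of 2m."""
    return x - 2 * m * math.floor((x + m) / (2 * m))


def thp_precode(symbols, lower, m):
    """Pre-subtract the interference described by lower-triangular `lower`.

    Stream k sees interference only from streams 0..k-1, so the transmit
    values are computed one after another, each followed by a modulo.
    """
    tx = []
    for k, s in enumerate(symbols):
        interference = sum(lower[k][j] / lower[k][k] * tx[j] for j in range(k))
        tx.append(thp_modulo(s - interference, m))
    return tx


def thp_receive(tx, lower, m):
    """Noise-free receiver: apply the channel, scale each stream, fold back."""
    rx = []
    for k in range(len(tx)):
        y = sum(lower[k][j] * tx[j] for j in range(k + 1))
        rx.append(thp_modulo(y / lower[k][k], m))
    return rx
```

For one real dimension of 64-QAM (8-PAM with symbols in {-7, -5, ..., 7}), a modulo bound of m = 8 keeps every transmit value inside [-8, 8) while the receiver still recovers the original symbols after its own modulo.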
International Conference on Acoustics, Speech, and Signal Processing | 2011
Chun-Fu Liao; Fang-Chun Lan; Yuan-Hao Huang; Po-Lin Chiu
Recent studies have investigated the lattice-reduction (LR) preprocessing technique for multiple-input multiple-output (MIMO) detection. However, if LR is applied to an orthogonal-frequency-division-multiplexing (OFDM) system, its complexity and latency increase greatly because of the large number of sub-carriers. This paper proposes a new processing architecture for LR-aided MIMO-OFDM systems. This LR processing architecture reduces the number of iteration loops by using the preprocessing matrix of the adjacent sub-carrier. Besides, grouping the sub-carriers breaks the long critical computational path, offering a compromise between computational complexity and latency. We simulate the proposed LR-aided MIMO-OFDM processing in the 3GPP-LTE system. The proposed method not only reduces the computational complexity but also shortens the latency of the lattice reduction.
Asian Solid-State Circuits Conference | 2011
Po-Lin Chiu; Lin-Zheng Huang; Li-Wei Chai; Chun-Fu Liao; Yuan-Hao Huang
This paper presents a configurable MIMO detector with QR decomposition and channel interpolation for 4 × 4 MIMO-OFDM systems. The QR decomposition (QRD) processor and the MIMO detector are usually investigated independently in the literature; however, this methodology usually limits the operating condition to slow-fading channels. Therefore, this study jointly designs a hardware architecture for QR decomposition and MIMO detection without an interface memory buffer, and the channel interpolation is performed at the same time. This QR-based MIMO detector is implemented in a 90-nm CMOS process with a 2.02-mm² core area. The chip consumes 56.8 mW at a 114-MHz operating frequency with a 1.0-V supply and achieves 684 Mbps throughput. The proposed chip, which performs one QRD and one MIMO detection for one OFDM symbol, is the first MIMO detector in the literature that supports real-time QR decomposition and MIMO detection for fast-fading MIMO-OFDM systems.
Asilomar Conference on Signals, Systems and Computers | 2009
Chun-Fu Liao; Yuan-Hao Huang
In this paper, we propose a low-complexity constant-throughput LLL algorithm for lattice-reduction-aided (LRA) multi-input multi-output (MIMO) detection. The traditional LLL algorithm for the lattice reduction has a drawback of varying throughput due to the variable iteration loops for the size-reduction and LLL-reduction checks. To address this problem, we propose a constant-throughput LLL (CT-LLL) algorithm that is well suited for real-time implementation. We further propose some techniques to reduce the redundant operations in the CT-LLL algorithm so that the computational complexity can be reduced. Simulation and analysis results show that the proposed low-complexity CT-LLL algorithm reduces the complexity of the CT-LLL algorithm for 4×4 and 8×8 MIMO systems to 80% and 72.94%, respectively, with negligible performance degradation.
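The variable loop count that this abstract identifies as a drawback can be seen in a textbook LLL sketch. The code below is the standard real-valued LLL algorithm with exact rational arithmetic, not the CT-LLL algorithm of the paper; it treats the rows of the input as basis vectors, assumes they are linearly independent, and the helper names (`dot`, `gram_schmidt`, `lll`, `is_lll_reduced`) are chosen here for illustration.

```python
from fractions import Fraction


def dot(u, v):
    """Exact inner product of two vectors."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))


def gram_schmidt(basis):
    """Return the orthogonalized vectors and the projection coefficients mu."""
    n = len(basis)
    ortho = []
    mu = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = [Fraction(x) for x in basis[i]]
        for j in range(i):
            mu[i][j] = dot(basis[i], ortho[j]) / dot(ortho[j], ortho[j])
            v = [a - mu[i][j] * b for a, b in zip(v, ortho[j])]
        ortho.append(v)
    return ortho, mu


def lll(basis, delta=Fraction(3, 4)):
    """Textbook LLL; returns (reduced basis, number of loop iterations)."""
    basis = [[Fraction(x) for x in row] for row in basis]
    n = len(basis)
    k, loops = 1, 0
    while k < n:
        loops += 1
        # Size-reduction check and process for vector k.
        for j in range(k - 1, -1, -1):
            _, mu = gram_schmidt(basis)  # recomputed for clarity, not speed
            q = round(mu[k][j])
            if q != 0:
                basis[k] = [a - q * b for a, b in zip(basis[k], basis[j])]
        # LLL-reduction (Lovasz) check: advance, or swap and step back.
        ortho, mu = gram_schmidt(basis)
        lhs = dot(ortho[k], ortho[k])
        rhs = (delta - mu[k][k - 1] ** 2) * dot(ortho[k - 1], ortho[k - 1])
        if lhs >= rhs:
            k += 1
        else:
            basis[k], basis[k - 1] = basis[k - 1], basis[k]
            k = max(k - 1, 1)
    return basis, loops


def is_lll_reduced(basis, delta=Fraction(3, 4)):
    """Check the size-reduction and Lovasz conditions on a basis."""
    ortho, mu = gram_schmidt(basis)
    n = len(basis)
    size_ok = all(abs(mu[i][j]) <= Fraction(1, 2)
                  for i in range(n) for j in range(i))
    lovasz_ok = all(
        dot(ortho[k], ortho[k])
        >= (delta - mu[k][k - 1] ** 2) * dot(ortho[k - 1], ortho[k - 1])
        for k in range(1, n))
    return size_ok and lovasz_ok
```

An already-reduced input such as the 3 × 3 identity finishes in two loop iterations, while a skewed basis needs more because each swap steps the index back. This input-dependent loop count is what a fixed-stage or constant-throughput design has to bound.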
Asian Solid-State Circuits Conference | 2013
Chun-Fu Liao; Jhong-Yu Wang; Yuan-Hao Huang
This study presents a joint QR decomposition and lattice reduction processor for 8 × 8 multiple-input multiple-output (MIMO) systems. The proposed algorithm enhances the BER performance by lattice reduction and reduces the hardware cost by sharing computation units and removing redundant operations. This processor can be reconfigured into three different modes: joint QR decomposition and lattice reduction, lattice reduction, and QR decomposition. The proposed processor was implemented in TSMC 90-nm 1P9M CMOS technology. The maximum throughput is 1.1 M matrix/s for QR decomposition, 0.5 M matrix/s for the lattice reduction, and 0.33 M matrix/s for the joint QR decomposition and lattice reduction at a power consumption of 31.2 mW. The energy efficiency achieves 0.18 nJ/matrix for the 8 × 8 MIMO preprocessing including both QR decomposition and lattice reduction.
Journal of Electrical and Computer Engineering | 2012
Chun-Fu Liao; Li-Wei Chai; Yuan-Hao Huang
We propose a loop-reduction LLL (LR-LLL) algorithm for lattice-reduction-aided (LRA) multi-input multi-output (MIMO) detection. The LLL algorithm is an iterative algorithm that contains many check and process operations; however, the traditional LLL algorithm itself performs many redundant check operations. To solve this problem, we propose a look-ahead check technique that not only reduces the complexity of the LLL algorithm but also produces a lattice-reduced matrix that obeys the original LLL criterion. Simulation results show that the proposed LR-LLL algorithm reduces the average number of loops and thus the computational complexity. Besides, it also shortens the latency in clock cycles by about 19.4%, 29.1%, and 46.1% for 4 × 4, 8 × 8, and 12 × 12 MIMO systems, respectively.
Asia Pacific Conference on Circuits and Systems | 2010
Chun-Fu Liao; Li-Wei Chai; Po-Lin Chiu; Yuan-Hao Huang
In recent years, lattice-reduction-aided (LRA) MIMO detection has attracted much research attention because it can effectively improve MIMO detection performance. However, it causes performance degradation in low- and mid-SNR channels, especially in high-dimensional MIMO systems. Therefore, we propose a multi-stage lattice-reduction-aided (MS-LRA) MIMO detection scheme in this letter. The proposed MS-LRA MIMO detector employs a reverse-order LLL algorithm and obtains detection diversity by using temporarily lattice-reduced QR matrices and multiple low-complexity MIMO detectors. Two stages are chosen as a trade-off between complexity and performance. The detector thereby achieves nearly optimal ML performance in the low- and mid-SNR region with lower computational complexity and lower latency than those of the traditional LRA MIMO detector.
Signal Processing Systems | 2017
Wen-Yu Chen; Chun-Fu Liao; Yuan-Hao Huang
This paper presents the design and implementation of a joint interpolation-based QR decomposition and lattice reduction processor for the MIMO detection in 4 × 4 multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems. The proposed algorithm considers the coherence bandwidth in the OFDM spectrum to reduce the computational complexity of the QR decomposition and lattice reduction. This study also proposes a MIMO preprocessing architecture and a time scheduling algorithm for allocating the tasks of the processing elements. The hardware analysis results show that the proposed design method yields the smallest area and processing time (AT) product compared to the baseline architectures under most channel environments. The proposed processor was designed and implemented in TSMC 90nm 1P9M CMOS technology. The proposed processor achieves at most 6.592 M matrix/s with 135.14 MHz clock speed and 220.68 K gates.
International Symposium on VLSI Design, Automation and Test | 2017
Jing-You Lin; Jung-Chun Chi; Chun-Fu Liao; Yuan-Hao Huang
Multiple-input multiple-output (MIMO) communication is an important technique to increase the transmission capacity, but an increased antenna number does not necessarily increase the throughput because of greater antenna interference and decoding complexity. An iterative detector-and-decoder (IDD) can effectively improve transmission performance by exchanging reliability information such as the log likelihood ratio between a MIMO detector and an error-correction-code decoder, but the IDD reduces the throughput because of the iteration loop. Therefore, this paper proposes a lattice-reduction-aided (LRA) soft-output K-best detector to eliminate the iteration loop and devises an effective method to calculate the reliability information. The proposed 8 × 8 LRA soft-output K-best detector achieves a better performance than the 8 × 8 MIMO detectors in the literature. The proposed detector was designed and implemented using a TSMC 90-nm 1P9M CMOS process. The post-layout results show that the detector chip achieves a throughput of 6.4G LLRs/sec at its maximum frequency of 133.3 MHz with 64-QAM modulation.
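One common way to derive soft outputs from a K-best candidate list is the max-log approximation, sketched below. This is a generic technique swapped in for illustration and is not the reliability calculation devised in the paper; the function name `max_log_llrs`, the sign convention (positive means bit 0 is more likely), the omission of noise-variance scaling, and the clipping value are all assumptions.

```python
def max_log_llrs(candidates, num_bits, clip=8.0):
    """Max-log LLR per bit from a list of (metric, bits) candidates.

    metric: squared-distance path metric (smaller is more likely).
    bits:   tuple of 0/1 labels for the candidate symbol vector.
    Assumed convention: LLR = min metric over bit=1 minus min metric over bit=0.
    """
    llrs = []
    for b in range(num_bits):
        best0 = min((m for m, bits in candidates if bits[b] == 0), default=None)
        best1 = min((m for m, bits in candidates if bits[b] == 1), default=None)
        if best0 is None:
            llr = -clip  # no bit=0 hypothesis survived the search
        elif best1 is None:
            llr = clip   # no bit=1 hypothesis survived the search
        else:
            llr = max(-clip, min(clip, best1 - best0))
        llrs.append(llr)
    return llrs
```

Because a K-best list is short, some bit positions have no counter-hypothesis among the survivors; clipping to a fixed magnitude is one simple way to handle that case, at the cost of a less accurate reliability value.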