Publication


Featured research published by Luechao Yuan.


IEEE Communications Letters | 2016

High Precision Low Complexity Matrix Inversion Based on Newton Iteration for Data Detection in the Massive MIMO

Chuan Tang; Cang Liu; Luechao Yuan; Zuocheng Xing

Currently, massive multiple-input multiple-output (MIMO) is one of the most promising wireless transmission technologies for 5G. Massive MIMO requires large-scale matrix computation, especially matrix inversion. In this letter, we find that matrix inversion based on Newton iteration (NI) is suitable for data detection in massive MIMO systems. In contrast with the recently proposed polynomial expansion (PE) method for matrix inversion, we analyze both the algorithm complexity and precision in detail, and propose a diagonal band Newton iteration (DBNI) method, which is an approximation of NI. Compared with the PE method, DBNI obtains higher precision at approximately equal complexity, and we explain how to select the bandwidth of DBNI.
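
As a rough illustration of the Newton iteration mentioned in this abstract, the sketch below applies the standard update X_{k+1} = X_k (2I - A X_k) to a small diagonally dominant matrix. It shows plain NI only, not the letter's DBNI approximation. The function names, the test matrix, and the choice of the inverted diagonal as the starting point are assumptions made for this example.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def newton_inverse(a, iterations=6):
    """Approximate the inverse of a square matrix by Newton iteration.

    Assumes `a` is diagonally dominant, so that starting from the
    inverse of its diagonal makes the iteration converge.
    """
    n = len(a)
    # Initial guess: inverse of the diagonal part of `a` (an assumption).
    x = [[1.0 / a[i][i] if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    for _ in range(iterations):
        ax = matmul(a, x)
        # correction = 2I - A X_k
        correction = [[(2.0 if i == j else 0.0) - ax[i][j] for j in range(n)]
                      for i in range(n)]
        x = matmul(x, correction)
    return x


# Hypothetical diagonally dominant matrix, standing in for a Gram matrix.
A = [[4.0, 1.0, 0.0],
     [1.0, 5.0, 1.0],
     [0.0, 1.0, 6.0]]
X = newton_inverse(A)
```

The residual I - A X_k is squared at every step, so only a few iterations are needed once the starting point is close enough.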


IEEE Transactions on Circuits and Systems II: Express Briefs | 2017

A Novel Architecture to Eliminate Bottlenecks in a Parallel Tiled QRD Algorithm for Future MIMO Systems

Cang Liu; Zuocheng Xing; Luechao Yuan; Chuan Tang; Yang Zhang

QR decomposition (QRD) is one of the performance bottlenecks in many high-performance wireless communication algorithms and should be flexible enough for future multiple-input multiple-output systems. However, existing QRD architectures only focus on a few fixed matrix dimensions. The parallel tiled QRD algorithm is well suited to implementing QRD because of its flexibility and modularity. In this brief, the tile size is set to 2 × 2 instead of the traditional 200 × 200 or more to support flexible antenna configurations. Using a look-ahead technique and the properties of unitary matrices, a novel algorithm based on the modified Gram-Schmidt (MGS) algorithm is proposed for the bottleneck operations (GEQRT and TTQRT) of the parallel tiled QRD algorithm. A corresponding hardware architecture is also designed with the proposed algorithm. The implementation results show that the hardware architecture based on the proposed algorithm achieves a 2.7× reduction in normalized processing latency compared with the one based on the traditional MGS algorithm.


IET Circuits, Devices & Systems | 2016

QR decomposition architecture using the iteration look-ahead modified Gram–Schmidt algorithm

Cang Liu; Chuan Tang; Luechao Yuan; Zuocheng Xing; Yang Zhang

QR decomposition is extensively adopted in multiple-input–multiple-output orthogonal frequency-division multiplexing wireless communication systems, and is one of the performance bottlenecks in many high-performance wireless communication algorithms. To implement low-latency QR decomposition in hardware, the authors propose a novel iterative look-ahead modified Gram–Schmidt (ILMGS) algorithm based on the traditional modified Gram–Schmidt (MGS) algorithm. They also design the corresponding triangular systolic array (TSA) architecture with the proposed ILMGS algorithm, which only needs n time slots for an n × n real matrix. To reduce the hardware overhead, they modify the TSA architecture into an iterative architecture. They also design a modified iterative architecture to further reduce the hardware overhead. The implementation results show that the normalised processing latency of the modified iterative architecture based on the proposed ILMGS algorithm is 1.36 times lower than the one based on the MGS algorithm. To the best of the authors’ knowledge, the designed architecture achieves latency performance superior to that of existing works.
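
For readers unfamiliar with the baseline, here is a minimal sketch of the traditional modified Gram–Schmidt (MGS) QR decomposition that this abstract builds on. It is not the ILMGS algorithm or the systolic array architecture from the paper. The function name and the sample matrix are assumptions for illustration, and the input is assumed to have full column rank.

```python
import math


def mgs_qr(a):
    """QR decomposition by the traditional modified Gram-Schmidt algorithm.

    `a` is an m x n matrix (list of rows) assumed to have full column rank.
    Returns (q, r) with q of size m x n and r upper triangular, n x n.
    """
    m, n = len(a), len(a[0])
    # Work on a copy of the columns of `a`.
    v = [[a[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Normalise the current column to get the next orthonormal vector.
        r[j][j] = math.sqrt(sum(x * x for x in v[j]))
        qj = [x / r[j][j] for x in v[j]]
        q_cols.append(qj)
        # Remove the qj component from every remaining column right away;
        # this immediate update is what distinguishes MGS from classical GS.
        for k in range(j + 1, n):
            r[j][k] = sum(qj[i] * v[k][i] for i in range(m))
            v[k] = [v[k][i] - r[j][k] * qj[i] for i in range(m)]
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r


# Hypothetical full-rank input matrix.
A = [[1.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
Q, R = mgs_qr(A)
```

Each outer step depends on the column updates of the previous step, which is the serial dependency that look-ahead variants aim to shorten.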


International Symposium on Communications and Information Technologies | 2014

The acceleration of turbo decoder on the newest GPGPU of Kepler architecture

Yang Zhang; Zuocheng Xing; Luechao Yuan; Cang Liu; Qinglin Wang

In this paper, a new GPGPU-based implementation of a 3GPP LTE standards-compliant turbo decoder is proposed. It uses the newest GPU, the Tesla K20c, which is based on the Kepler GK110 architecture. The new architecture has more powerful parallel computing capability, and we use it to fully exploit the parallelism in the turbo decoding algorithm in novel ways. Meanwhile, we use different levels of the memory hierarchy to meet the varying speed and capacity demands of the data. Simulation shows that our implementation is practical and achieves a 76% throughput improvement over the latest GPU implementation. The result demonstrates that the newest Kepler architecture is suitable for turbo decoding and can be a promising reconfigurable platform for communication systems.


IEEE Transactions on Very Large Scale Integration Systems | 2017

Hardware Architecture Based on Parallel Tiled QRD Algorithm for Future MIMO Systems

Cang Liu; Chuan Tang; Zuocheng Xing; Luechao Yuan; Yang Zhang

QR decomposition (QRD) has been a vital component in the transceiver processor of future multiple-input multiple-output (MIMO) systems, in which antenna configuration will be more and more flexible. Therefore, the QRD hardware architecture in future MIMO systems should be more flexible to meet various antenna configurations. Unfortunately, the existing QRD hardware architectures mainly focus on matrices of one or several fixed sizes. This paper presents a new triangular systolic array QRD hardware architecture based on the parallel tiled QRD algorithm to decompose an 8 × 8 real matrix. The designed hardware architecture is flexible and can be used in various MIMO systems, in which the number of antennas is smaller than 4. This paper also proposes a modified algorithm for the bottleneck operations of the parallel tiled QRD algorithm to reduce the hardware overhead. To further reduce the hardware overhead, the Newton–Raphson algorithm is adopted in the proposed algorithm. The implementation results show that the normalized processing latency performance and the normalized processing efficiency performance of the designed QRD hardware architecture are both better than most of the existing QRD hardware architectures. To the best of our knowledge, the hardware architecture presented in this paper achieves normalized QRD rate performance superior to that of the existing QRD hardware architectures.


CCF National Conference on Computer Engineering and Technology | 2015

Channel Estimation in Massive MIMO: Algorithm and Hardware

Chuan Tang; Cang Liu; Luechao Yuan; Zuocheng Xing

Currently, 5G is a research hotspot in the communication field, and one of the most promising wireless transmission technologies for 5G is massive multiple-input multiple-output (MIMO), which provides high data rate and energy efficiency. The main challenge of massive MIMO is channel estimation, due to its complexity and pilot contamination. Some improvements of traditional channel estimation methods that address this problem in massive MIMO are introduced in this paper. Besides, hardware acceleration is useful for massive MIMO channel estimation algorithms. We discuss the related work on hardware accelerators for matrix inversion and singular value decomposition, which are the main complex operations of channel estimation. We find that the memory system, the network of processing elements, and the precision will be the main research directions for hardware design at large data sizes.


Trust, Security and Privacy in Computing and Communications | 2013

An Optimizing Strategy Research of LDPC Decoding Based on GPGPU

Luechao Yuan; Zuocheng Xing; Yang Zhang; Xiaobao Chen

As powerful error correcting codes, Low-Density Parity-Check (LDPC) codes have been adopted as a fundamental building block by dirty paper coding (DPC), which indicates that lossless precoding is theoretically possible at any signal-to-noise ratio (SNR), and is a promising strategy in future communication systems. However, achieving this performance gain demands huge computational complexity. For its lower cost and better flexibility, the GPU-based LDPC decoder is an emerging research subject. Based on the perspective of GPU hardware architecture, a multi-stage optimizing mapping strategy (MSOMS) is proposed and implemented to accelerate LDPC decoding. The performance is boosted significantly by balancing the memory access and computation load, optimizing the execution configuration and the memory access pattern, and fully utilizing the on-chip high-speed resources. The proposed decoders achieve 383× and 442× speedups compared to a CPU-based decoder for LDPC and RA code (another ensemble of LDPC code), and the achieved throughput is comparable to existing GPU-based decoders, which confirms the efficiency of the MSOMS strategy.


IET Communications | 2017

Approximate iteration detection with iterative refinement in massive MIMO systems

Chuan Tang; Cang Liu; Luechao Yuan; Zuocheng Xing

To improve energy efficiency and spectral efficiency, massive multiple-input–multiple-output (MIMO) has been proposed and has become a promising technology for next-generation mobile communication. However, massive MIMO systems are equipped with scores or hundreds of antennas, which induce large-scale matrix computations with tremendous complexity, especially for matrix inversion in data detection. Thus, many detection methods have been proposed using approximate matrix inversion algorithms, which satisfy the demand for precision with low complexity. In this study, the authors focus on the approximate detection method based on Newton iteration (NI), and propose upgraded methods named NI method with iterative refinement (NIIR) and diagonal band NIIR (DBNIIR), which combine the NI method and the DBNI method with iterative refinement (IR). The results show that their proposals provide about 2 dB improvement in bit error rate (BER) for 16-quadrature amplitude modulation (QAM), and could even break the error floor existing in the NI and DBNI methods for 64-QAM modulation. Furthermore, the BER of their proposals is almost the same as that of the exact method. Moreover, in contrast with the NI and DBNI methods, the NIIR and DBNIIR methods require very little extra complexity and no extra hardware resources, which makes them quite suitable for data detection in massive MIMO.


IEEE Transactions on Very Large Scale Integration Systems | 2017

A Flexible Divide-and-Conquer MPSoC Architecture for MIMO Interference Cancellation

Luechao Yuan; Cang Liu; Chuan Tang; Shan Huang; Anupam Chattopadhyay; Gerd Ascheid; Zuocheng Xing


International Wireless Internet Conference | 2016

QRD Architecture Using the Modified ILMGS Algorithm for MIMO Systems

Cang Liu; Chuan Tang; Zuocheng Xing; Luechao Yuan; Yu Wang; Lirui Chen; Yang Zhang; Suncheng Xiang; Wangfeng Zhao; Xing Hu; Jinsong Xu

Collaboration


Dive into Luechao Yuan's collaborations.

Top Co-Authors

Cang Liu
National University of Defense Technology

Zuocheng Xing
National University of Defense Technology

Chuan Tang
National University of Defense Technology

Yang Zhang
National University of Defense Technology

Lirui Chen
National University of Defense Technology

Qinglin Wang
National University of Defense Technology

Shan Huang
Chinese Academy of Sciences

Suncheng Xiang
National University of Defense Technology

Wangfeng Zhao
National University of Defense Technology

Xiantuo Tang
National University of Defense Technology