Bei Yin | Researchain

Publication

Featured researches published by Bei Yin.

IEEE Journal of Selected Topics in Signal Processing | 2014

Large-Scale MIMO Detection for 3GPP LTE: Algorithms and FPGA Implementations

Michael Wu; Bei Yin; Guohui Wang; Chris Dick; Joseph R. Cavallaro; Christoph Studer

Large-scale (or massive) multiple-input multiple-output (MIMO) is expected to be one of the key technologies in next-generation multi-user cellular systems based on the upcoming 3GPP LTE Release 12 standard, for example. In this work, we propose, to the best of our knowledge, the first VLSI design enabling high-throughput data detection in single-carrier frequency-division multiple access (SC-FDMA)-based large-scale MIMO systems. We propose a new approximate matrix inversion algorithm relying on a Neumann series expansion, which substantially reduces the complexity of linear data detection. We analyze the associated error, and we compare its performance and complexity to those of an exact linear detector. We present corresponding VLSI architectures, which perform exact and approximate soft-output detection for large-scale MIMO systems with various antenna/user configurations. Reference implementation results for a Xilinx Virtex-7 XC7VX980T FPGA show that our designs are able to achieve more than 600 Mb/s for a 128 antenna, 8 user 3GPP LTE-based large-scale MIMO system. We finally provide a performance/complexity trade-off comparison using the presented FPGA designs, which reveals that the detector circuit of choice is determined by the ratio between BS antennas and users, as well as the desired error-rate performance.

international symposium on circuits and systems | 2013

Approximate matrix inversion for high-throughput data detection in the large-scale MIMO uplink

Michael Wu; Bei Yin; Aida Vosoughi; Christoph Studer; Joseph R. Cavallaro; Chris Dick

The high processing complexity of data detection in the large-scale multiple-input multiple-output (MIMO) uplink necessitates high-throughput VLSI implementations. In this paper, we propose, to the best of our knowledge, the first matrix inversion implementation suitable for data detection in systems having hundreds of antennas at the base station (BS). The underlying idea is to carry out an approximate matrix inversion using a small number of Neumann-series terms, which allows one to achieve near-optimal performance at low complexity. We propose a novel VLSI architecture to efficiently compute the approximate inverse using a systolic array and show reference FPGA implementation results for various system configurations. For a system where 128 BS antennas receive data from 8 single-antenna users, a single instance of our design processes 1.9M matrices/s on a Xilinx Virtex-7 FPGA, while using only 3.9% of the available slices and 3.6% of the available DSP48 units.
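
As a rough illustration of the idea in this abstract, the sketch below approximates a matrix inverse with a truncated Neumann series. It assumes the common splitting A = D + E, with D the diagonal of A, so that A^{-1} is approximated by the first few terms of sum_k (-D^{-1}E)^k D^{-1}. The function names `matmul` and `neumann_inverse` and the parameter `num_terms` are hypothetical, the matrices are real-valued for brevity, and this is not the authors' hardware algorithm. The series only converges when A is sufficiently diagonally dominant, which tends to hold when the BS has many more antennas than there are users.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def neumann_inverse(A, num_terms):
    """Approximate the inverse of A with num_terms Neumann-series terms.

    Assumes the splitting A = D + E with D = diag(A); this choice is an
    illustration, not taken from the abstract.
    """
    n = len(A)
    d_inv = [1.0 / A[i][i] for i in range(n)]
    # M = -D^{-1} E, where E is A with its diagonal zeroed out.
    M = [[0.0 if i == j else -d_inv[i] * A[i][j] for j in range(n)]
         for i in range(n)]
    # The k = 0 term of the series is D^{-1} itself.
    term = [[d_inv[i] if i == j else 0.0 for j in range(n)]
            for i in range(n)]
    approx = [row[:] for row in term]
    for _ in range(1, num_terms):
        term = matmul(M, term)  # next term: M^k D^{-1}
        approx = [[approx[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return approx
```

With `num_terms=1` the result is just the inverse of the diagonal; each additional term costs one more matrix product, which is the complexity/accuracy trade-off the abstract refers to.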

ieee computer society annual symposium on vlsi | 2006

Connection-oriented multicasting in wormhole-switched networks on chip

Zhonghai Lu; Bei Yin; Axel Jantsch

Network-on-chip (NoC) proposes networks to replace buses as a scalable global communication interconnect for future SoC designs. However, a bus is very efficient in broadcasting. As the system size scales up to explore the chip capacity, broadcasting in NoCs must be efficiently supported. This paper presents a novel multicast scheme in wormhole-switched NoCs. In this scheme, a multicast procedure consists of establishment, communication, and release phases. A multicast group can request to reserve virtual channels during establishment and has priority on arbitration of link bandwidth. This multicasting method has been effectively implemented in a mesh network with deadlock freedom. Our experiments show that the multicast technique improves throughput and does not exhibit significant impact on unicast performance in a network with mixed unicast and multicast traffic if the network is not saturated.

global communications conference | 2014

Conjugate gradient-based soft-output detection and precoding in massive MIMO systems

Bei Yin; Michael Wu; Joseph R. Cavallaro; Christoph Studer

Massive multiple-input multiple-output (MIMO) promises improved spectral efficiency, coverage, and range, compared to conventional (small-scale) MIMO wireless systems. Unfortunately, these benefits come at the cost of significantly increased computational complexity, especially for systems with realistic antenna configurations. To reduce the complexity of data detection (in the uplink) and precoding (in the downlink) in massive MIMO systems, we propose to use conjugate gradient (CG) methods. While precoding using CG is rather straightforward, soft-output minimum mean-square error (MMSE) detection requires the computation of the post-equalization signal-to-interference-and-noise-ratio (SINR). To enable CG for soft-output detection, we propose a novel way of computing the SINR directly within the CG algorithm at low complexity. We investigate the performance/complexity trade-offs associated with CG-based soft-output detection and precoding, and we compare it to existing exact and approximate methods. Our results reveal that the proposed algorithm is able to outperform existing methods for massive MIMO systems with realistic antenna configurations.
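
For readers unfamiliar with conjugate gradient methods, the following is a minimal sketch of textbook CG for solving A x = b with a symmetric positive-definite A, which is the kind of system that arises in linear MMSE detection. It does not include the SINR computation proposed in the paper, it uses real-valued data for brevity, and the name `conjugate_gradient` and parameter `num_iters` are hypothetical.

```python
def conjugate_gradient(A, b, num_iters):
    """Run num_iters steps of textbook CG on A x = b, starting from x = 0.

    A must be symmetric positive-definite (list of rows); b is a list.
    """
    n = len(b)
    x = [0.0] * n
    r = b[:]            # residual b - A x for the initial guess x = 0
    p = r[:]            # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(num_iters):
        if rs_old == 0.0:
            break       # residual is exactly zero, nothing left to do
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        # New direction: residual plus a multiple of the old direction.
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x
```

Each iteration needs only one matrix-vector product, so stopping after a few iterations avoids an explicit matrix inverse entirely.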

ieee global conference on signal and information processing | 2013

High throughput low latency LDPC decoding on GPU for SDR systems

Guohui Wang; Michael Wu; Bei Yin; Joseph R. Cavallaro

In this paper, we present a high throughput and low latency LDPC (low-density parity-check) decoder implementation on GPUs (graphics processing units). The existing GPU-based LDPC decoder implementations suffer from low throughput and long latency, which prevent them from being used in practical SDR (software-defined radio) systems. To overcome this problem, we present optimization techniques for a parallel LDPC decoder including algorithm optimization, fully coalesced memory access, asynchronous data transfer and multi-stream concurrent kernel execution for modern GPU architectures. Experimental results demonstrate that the proposed LDPC decoder achieves 316 Mbps (at 10 iterations) peak throughput on a single GPU. The decoding latency, which is much lower than that of the state of the art, varies from 0.207 ms to 1.266 ms for different throughput requirements from 62.5 Mbps to 304.16 Mbps. When using four GPUs concurrently, we achieve an aggregate peak throughput of 1.25 Gbps (at 10 iterations).

asilomar conference on signals, systems and computers | 2013

Full-duplex in large-scale wireless systems

Bei Yin; Michael Wu; Christoph Studer; Joseph R. Cavallaro; Jorma Lilleberg

In this paper, we investigate the combination of full-duplex wireless communication with large-scale multiple-input multiple-output (MIMO) technology, which has the potential for bidirectional wireless communication at high spectral efficiency and low power consumption. In addition, we study its application to cellular (multi-user) systems that could be extended with large antenna arrays, such as 3GPP LTE. In order to solve the fundamental issue of self-interference cancellation in full-duplex cellular communication systems, we propose two schemes that exploit the excess of antennas present at the base-station (BS) of large-scale MIMO systems. We investigate the associated sum-rate and show that by carefully selecting the ratio between number of transmit and receive antennas at the BS, one is able to maximize the system capacity. We furthermore investigate the inter-user interference issue that occurs in multi-user scenarios, as well as the impact of residual transmit-side (TX) radio-frequency (RF) impairments. Our preliminary results show that large-scale MIMO is able to render full-duplex communication more resilient against inter-user interference and helps to mitigate the effects of residual TX-RF impairments.

international conference on acoustics, speech, and signal processing | 2013

Implementation trade-offs for linear detection in large-scale MIMO systems

Bei Yin; Michael Wu; Christoph Studer; Joseph R. Cavallaro; Chris Dick

In this paper, we analyze the VLSI implementation trade-offs for linear data detection in the uplink of large-scale multiple-input multiple-output (MIMO) wireless systems. Specifically, we analyze the error incurred by using the sub-optimal, low-complexity matrix inverse proposed in Wu et al., 2013, ISCAS, and compare its performance and complexity to an exact matrix inversion algorithm. We propose a Cholesky-based reference architecture for exact matrix inversion and show corresponding implementation results on a Virtex-7 FPGA. Using this reference design, we perform a performance/complexity trade-off comparison with an FPGA implementation for the proposed approximate matrix inversion, which reveals that the inversion circuit of choice is determined by the antenna configuration (base-station antennas vs. number of users) of large-scale MIMO systems.
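
To make the "exact" baseline concrete, here is a minimal software sketch of a Cholesky-based solve for a symmetric positive-definite system. It factors A = L L^T and then applies forward and backward substitution; an explicit inverse could be obtained by solving against each unit vector. The function names `cholesky` and `cholesky_solve` are hypothetical, the data is real-valued for brevity, and nothing here reflects the structure of the authors' reference architecture.

```python
import math


def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive-definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def cholesky_solve(A, b):
    """Solve A x = b exactly (up to rounding) via the Cholesky factor of A."""
    L = cholesky(A)
    n = len(b)
    # Forward substitution: L z = b.
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    # Backward substitution: L^T x = z.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x
```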

international conference on acoustics, speech, and signal processing | 2014

A 3.8Gb/s large-scale MIMO detector for 3GPP LTE-Advanced

Bei Yin; Michael Wu; Guohui Wang; Chris Dick; Joseph R. Cavallaro; Christoph Studer

This paper proposes, to the best of our knowledge, the first ASIC design for high-throughput data detection in single-carrier frequency-division multiple access (SC-FDMA)-based large-scale MIMO systems, such as systems building on future 3GPP LTE-Advanced standards. In order to substantially reduce the complexity of linear soft-output data detection in systems having hundreds of antennas at the base station (BS), the proposed detector builds upon a truncated Neumann series expansion to compute the necessary matrix inverse at low complexity. To achieve high throughput in the 3GPP LTE-A uplink, we develop a systolic VLSI architecture including all necessary processing blocks. We present a corresponding ASIC design that achieves 3.8 Gb/s for a 128 antenna, 8 user 3GPP LTE-A based large-scale MIMO system, while occupying 11.1 mm² in a TSMC 45nm CMOS technology.

international symposium on circuits and systems | 2015

VLSI design of large-scale soft-output MIMO detection using conjugate gradients

Bei Yin; Michael Wu; Joseph R. Cavallaro; Christoph Studer

We propose an FPGA design for soft-output data detection in orthogonal frequency-division multiplexing (OFDM)-based large-scale (multi-user) MIMO systems. To reduce the high computational complexity of data detection, our design uses a modified version of the conjugate gradient least squares (CGLS) algorithm. In contrast to existing linear detection algorithms for massive MIMO systems, our method avoids two of the most complex tasks, namely Gram-matrix computation and matrix inversion, while still being able to compute soft-outputs. Our architecture uses an array of reconfigurable processing elements to compute the CGLS algorithm in a hardware-efficient manner. Implementation results on a Xilinx Virtex-7 FPGA for a 128 antenna, 8 user large-scale MIMO system show that our design only uses 70% of the area-delay product of the competitive method, while exhibiting superior error-rate performance.
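
The point about avoiding the Gram matrix can be seen in a minimal sketch of the standard (unmodified) CGLS iteration, which minimizes ||H x - y||^2 using only products with H and its transpose. The paper's modifications and its soft-output computation are not reproduced here; the name `cgls`, the parameter `num_iters`, and the real-valued data are assumptions made for illustration.

```python
def cgls(H, y, num_iters):
    """Run num_iters steps of standard CGLS for min ||H x - y||^2, from x = 0.

    H is a list of rows (m x n); H^T H is never formed explicitly.
    """
    m, n = len(H), len(H[0])

    def h_times(v):       # H v
        return [sum(H[i][j] * v[j] for j in range(n)) for i in range(m)]

    def ht_times(v):      # H^T v
        return [sum(H[i][j] * v[i] for i in range(m)) for j in range(n)]

    x = [0.0] * n
    r = y[:]              # residual y - H x for x = 0
    s = ht_times(r)       # gradient direction H^T r
    p = s[:]
    gamma = sum(si * si for si in s)
    for _ in range(num_iters):
        if gamma == 0.0:
            break         # normal equations are satisfied exactly
        q = h_times(p)
        alpha = gamma / sum(qi * qi for qi in q)
        x = [x[j] + alpha * p[j] for j in range(n)]
        r = [r[i] - alpha * q[i] for i in range(m)]
        s = ht_times(r)
        gamma_new = sum(si * si for si in s)
        p = [s[j] + (gamma_new / gamma) * p[j] for j in range(n)]
        gamma = gamma_new
    return x
```

Compared with running CG on the normal equations, each step here uses two matrix-vector products with the tall matrix H instead of one product with a precomputed Gram matrix.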

global communications conference | 2013

High-throughput beamforming receiver for millimeter wave mobile communication

Bei Yin; Shadi Abu-Surra; Gary Xu; Thomas Henige; Eran Pisek; Zhouyue Pi; Joseph R. Cavallaro

In this paper, we present a novel FPGA-based high-throughput beamforming MIMO receiver for millimeter wave mobile communication. With vast spectrum and small antenna element size, millimeter wave communication becomes very attractive and promising to support next generation mobile communication (5G). However, the high data rate requirement challenges both algorithm and architecture. In order to support the high data rate and to reduce the overhead of selecting the best beam pair, we propose a novel beamforming synchronization scheme more suitable for mobile communication. By further optimizing the algorithm and the architecture, we present a complete FPGA-based mobile receiver, which includes RF frontend, ADC, beamforming control, synchronization, channel estimator, soft MAP detector, and channel decoder. The design operates at 28 GHz carrier frequency with 500 MHz bandwidth. The throughput can reach 1.52 Gbps. We also performed indoor and outdoor over-the-air transmission field tests. This work provides a platform for future millimeter wave mobile communication research.