Kaipeng Li | Researchain

Publication

Featured research published by Kaipeng Li.

2015 IEEE Dallas Circuits and Systems Conference (DCAS) | 2015

Accelerating massive MIMO uplink detection on GPU for SDR systems

Kaipeng Li; Bei Yin; Michael Wu; Joseph R. Cavallaro; Christoph Studer

We present a reconfigurable GPU-based uplink detector for massive MIMO software-defined radio (SDR) systems. To enable high throughput, we implement a configurable linear minimum mean square error (MMSE) soft-output detector and reduce its complexity without sacrificing error-rate performance. To take full advantage of the GPU computing resources, we exploit the algorithm's inherent parallelism and make use of efficient CUDA libraries and the GPU's hierarchical memory resources. We furthermore use multi-stream scheduling and multi-GPU workload deployment strategies to pipeline streaming-detection tasks with little host-device memory copy overhead. Our flexible design can switch between a high-accuracy Cholesky-based detection mode and a high-throughput conjugate gradient (CG)-based detection mode, and it supports various antenna configurations. Our GPU implementation exceeds 250 Mb/s detection throughput for a 128×16 antenna system.
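For readers less familiar with the two detection modes, the sketch below shows linear MMSE equalization solved either exactly with a Cholesky factorization or approximately with a fixed number of conjugate gradient iterations. It is a minimal pure-Python illustration, not the authors' CUDA implementation: it assumes a flat-fading model y = Hx + n with unit-energy symbols and noise variance N0, and the function names (mmse_system, mmse_cholesky, mmse_cg) and the 8 × 4 example dimensions are hypothetical. Soft-output (LLR) computation is omitted.

```python
import random

def hermitian(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[M[r][c].conjugate() for r in range(len(M))] for c in range(len(M[0]))]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def mmse_system(H, y, n0):
    """A = H^H H + N0*I and b = H^H y, assuming unit-energy transmit symbols."""
    Hh = hermitian(H)
    A = [[sum(a * b for a, b in zip(row, col)) for col in zip(*H)] for row in Hh]
    for k in range(len(A)):
        A[k][k] += n0
    return A, matvec(Hh, y)

def mmse_cholesky(A, b):
    """Exact solve: factor A = L L^H, then forward and backward substitution."""
    n = len(A)
    L = [[0j] * n for _ in range(n)]
    for j in range(n):
        s = A[j][j] - sum(abs(L[j][k]) ** 2 for k in range(j))
        L[j][j] = complex(s.real ** 0.5)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k].conjugate() for k in range(j))) / L[j][j]
    z = [0j] * n
    for i in range(n):  # solve L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0j] * n
    for i in reversed(range(n)):  # solve L^H x = z
        x[i] = (z[i] - sum(L[k][i].conjugate() * x[k] for k in range(i + 1, n))) / L[i][i].conjugate()
    return x

def mmse_cg(A, b, iters):
    """Approximate solve of A x = b with a fixed number of conjugate gradient iterations."""
    x = [0j] * len(b)
    r = list(b)  # residual b - A x for the all-zero starting point
    p = list(r)  # search direction
    rr = sum(abs(v) ** 2 for v in r)
    for _ in range(iters):
        if rr < 1e-30:  # already converged
            break
        Ap = matvec(A, p)
        alpha = rr / sum(pi.conjugate() * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = sum(abs(v) ** 2 for v in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

# Hypothetical 8 x 4 system (8 BS antennas, 4 users), unit-energy QPSK, noiseless receive vector.
random.seed(1)
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)] for _ in range(8)]
x_true = [complex(random.choice([-1, 1]), random.choice([-1, 1])) / 2 ** 0.5 for _ in range(4)]
y = matvec(H, x_true)
A, b = mmse_system(H, y, n0=0.01)
x_chol = mmse_cholesky(A, b)
x_cg = mmse_cg(A, b, iters=4)  # U iterations suffice in exact arithmetic
```

The trade-off matches the two modes described in the abstract: Cholesky gives the exact MMSE solution at a fixed cost, while CG can be stopped after fewer iterations than the number of users to trade accuracy for throughput.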

Asilomar Conference on Signals, Systems and Computers | 2015

Sub-band digital predistortion for noncontiguous transmissions: Algorithm development and real-time prototype implementation

Mahmoud Abdelaziz; Chance Tarver; Kaipeng Li; Lauri Anttila; Raul Martinez; Mikko Valkama; Joseph R. Cavallaro

This article proposes a novel, reduced-complexity, block-adaptive digital predistortion (DPD) technique for mitigating the spurious emissions that occur when spectrally noncontiguous signals are amplified by a nonlinear power amplifier (PA). The DPD solution is designed for real-time scenarios in which a loop delay exists in the DPD system. With a proper choice of the DPD parameters, the technique is shown to be robust against arbitrarily long loop delays without sacrificing linearization performance or convergence speed. Moreover, the proposed solution has lower complexity than previously published solutions while providing excellent mitigation of the spurious emissions. Real-time implementations of the algorithm are developed on the WARP platform, including an analysis of several key hardware design trade-offs among robustness, performance, and complexity. Simulations and real-time FPGA experiments demonstrate excellent and robust performance in realistic conditions with highly nonlinear PAs and arbitrary loop delays.

Asilomar Conference on Signals, Systems and Computers | 2014

A high performance GPU-based software-defined basestation

Kaipeng Li; Michael Wu; Guohui Wang; Joseph R. Cavallaro

We present a high-performance GPU-based software-defined basestation. The key idea is to explore the feasibility of using a GPU as the baseband processor of a software-defined basestation, exploiting its abundant computing resources and flexible programming interface to achieve both real-time performance and high reconfigurability. Building on an existing WARPLab SDR framework, we exploit the data-level and algorithm-level parallelism of the baseband kernels for GPU acceleration, as well as task-level parallelism in the system, pipelining packet data transfer with data processing. As a case study, we implement an OFDM system to demonstrate our system architecture and optimization strategies. In this case study, our GPU-based basestation achieves less than 3 ms latency and more than 50 Mb/s throughput when processing streaming frames in real time, while also offering the software-defined flexibility and scalability needed to support future wireless standards.
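To make the OFDM case study concrete, the sketch below shows the per-symbol baseband kernels: an inverse DFT plus cyclic prefix at the transmitter, and prefix removal plus DFT at the receiver. It uses a direct O(N²) DFT in pure Python rather than the batched FFT kernels a GPU implementation would use, and the FFT size, prefix length, and 3-tap channel are arbitrary assumptions, not parameters from the paper. Each OFDM symbol (and each subcarrier within it) is computed independently, which is one source of the data-level parallelism mentioned in the abstract.

```python
import cmath

def dft(x, inverse=False):
    """Direct O(N^2) DFT; the inverse transform is scaled by 1/N."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out

def ofdm_modulate(symbols, cp_len):
    """Subcarrier symbols -> time-domain OFDM symbol with a cyclic prefix prepended."""
    t = dft(symbols, inverse=True)
    return t[len(t) - cp_len:] + t

def ofdm_demodulate(samples, cp_len):
    """Drop the cyclic prefix and transform back to the subcarrier domain."""
    return dft(samples[cp_len:])

def multipath(x, h):
    """Linear convolution with a short channel impulse response, truncated to len(x)."""
    return [sum(h[d] * x[n - d] for d in range(len(h)) if n - d >= 0) for n in range(len(x))]

# Hypothetical parameters: 16 subcarriers, 4-sample cyclic prefix, QPSK on every subcarrier.
N_FFT, CP = 16, 4
qpsk = [complex((-1) ** (k % 2), (-1) ** (k // 2 % 2)) for k in range(N_FFT)]
tx = ofdm_modulate(qpsk, CP)
rx_symbols = ofdm_demodulate(tx, CP)  # ideal channel: recovers qpsk

# A hypothetical 3-tap channel, shorter than the prefix, becomes one complex gain per subcarrier.
h = [1.0, 0.5, 0.25j]
H_k = dft(h + [0] * (N_FFT - len(h)))
rx_eq = [y / g for y, g in zip(ofdm_demodulate(multipath(tx, h), CP), H_k)]
```

The last step shows why the cyclic prefix matters: as long as the channel is shorter than the prefix, equalization reduces to one complex division per subcarrier.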

IEEE Journal on Emerging and Selected Topics in Circuits and Systems | 2017

Decentralized Baseband Processing for Massive MU-MIMO Systems

Kaipeng Li; Rishi Sharan; Yujun Chen; Tom Goldstein; Joseph R. Cavallaro; Christoph Studer

Achieving high spectral efficiency in realistic massive multi-user (MU) multiple-input multiple-output (MIMO) wireless systems requires computationally complex algorithms for data detection in the uplink (users transmit to base-station) and beamforming in the downlink (base-station transmits to users). Most existing algorithms are designed to be executed on centralized computing hardware at the base-station (BS), which results in prohibitive complexity for systems with hundreds or thousands of antennas and generates raw baseband data rates that exceed the limits of current interconnect technology and chip I/O interfaces. This paper proposes a novel decentralized baseband processing architecture that alleviates these bottlenecks by partitioning the BS antenna array into clusters, each associated with independent radio-frequency chains, analog and digital modulation circuitry, and computing hardware. For this architecture, we develop novel decentralized data detection and beamforming algorithms that only access local channel-state information and require low communication bandwidth among the clusters. We study the associated trade-offs between error-rate performance, computational complexity, and interconnect bandwidth, and we demonstrate the scalability of our solutions for massive MU-MIMO systems with thousands of BS antennas using reference implementations on a graphics processing unit (GPU) cluster.
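A simple way to see why antenna clustering relieves the interconnect is the fusion identity H^H H = Σ_c H_c^H H_c and H^H y = Σ_c H_c^H y_c: each cluster touches only its own rows of H and y and forwards a U × U matrix and a U-vector whose size does not grow with the number of BS antennas. The pure-Python sketch below demonstrates this identity for linear MMSE detection. It is an illustration under assumed dimensions, with hypothetical function names, and not the decentralized detection or beamforming algorithms developed in the paper.

```python
import random

def local_statistics(H_c, y_c):
    """Work inside one cluster: partial Gram matrix H_c^H H_c and matched filter H_c^H y_c."""
    rows, U = len(H_c), len(H_c[0])
    G = [[sum(H_c[b][i].conjugate() * H_c[b][j] for b in range(rows)) for j in range(U)]
         for i in range(U)]
    m = [sum(H_c[b][i].conjugate() * y_c[b] for b in range(rows)) for i in range(U)]
    return G, m

def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small complex linear system."""
    n = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    x = [0j] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def fused_mmse(clusters, n0):
    """Fusion step: add per-cluster statistics, then solve (sum_c G_c + N0 I) x = sum_c m_c."""
    stats = [local_statistics(H_c, y_c) for H_c, y_c in clusters]
    U = len(stats[0][1])
    A = [[sum(G[i][j] for G, _ in stats) + (n0 if i == j else 0) for j in range(U)]
         for i in range(U)]
    b = [sum(m[i] for _, m in stats) for i in range(U)]
    return solve(A, b)

# Hypothetical setup: 32 BS antennas in 4 clusters of 8, serving 4 single-antenna users (noiseless).
random.seed(7)
B, U, C = 32, 4, 4
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(U)] for _ in range(B)]
x_true = [complex(random.choice([-1, 1]), random.choice([-1, 1])) for _ in range(U)]
y = [sum(h * x for h, x in zip(row, x_true)) for row in H]
size = B // C
clusters = [(H[c * size:(c + 1) * size], y[c * size:(c + 1) * size]) for c in range(C)]
x_dec = fused_mmse(clusters, n0=0.01)
x_cen = fused_mmse([(H, y)], n0=0.01)  # one cluster holding every antenna = centralized baseline
```

This fully fused variant still needs a central node that gathers U × U statistics from every cluster; reducing that exchange further is the kind of interconnect trade-off the abstract refers to.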

IEEE Transactions on Microwave Theory and Techniques | 2016

Low-Complexity Subband Digital Predistortion for Spurious Emission Suppression in Noncontiguous Spectrum Access

Mahmoud Abdelaziz; Lauri Anttila; Chance Tarver; Kaipeng Li; Joseph R. Cavallaro; Mikko Valkama

Noncontiguous transmission schemes combined with high power-efficiency requirements pose major challenges for radio transmitter and power amplifier (PA) design and implementation. Due to the nonlinear nature of the PA, severe unwanted emissions can occur, which can interfere with neighboring channel signals or even desensitize the device's own receiver in frequency-division duplexing transceivers. To suppress such unwanted emissions, this paper proposes a low-complexity subband digital predistortion solution specifically tailored for spectrally noncontiguous transmission schemes in low-cost devices. The proposed technique mitigates only selected spurious intermodulation distortion components at the PA output, allowing substantially reduced processing complexity compared with classical linearization solutions. Furthermore, novel decorrelation-based parameter learning solutions are proposed and formulated, which reduce the computing complexity of parameter estimation and can adaptively track time-varying features. Comprehensive simulation and RF measurement results, obtained with a commercial LTE-Advanced mobile PA, are provided to evaluate and validate the effectiveness of the proposed solution in real-world scenarios. The results demonstrate that highly efficient spurious component suppression can be obtained with the proposed solutions.
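The decorrelation idea can be illustrated with a deliberately simplified model. The sketch below assumes that the PA output in the selected IM3 sub-band equals an unknown coefficient beta times a known third-order basis signal u, plus the injected predistortion term alpha·u and measurement noise; here u is the product x2²·x1* of two baseband component carriers, the term that falls at 2f2 − f1. A block-adaptive rule then moves alpha against the normalized sample correlation between the observed sub-band signal and u until the two are decorrelated (alpha ≈ −beta). The toy PA model, step size mu, and all function names are assumptions for illustration, not the exact DPD structure or update equations of the paper.

```python
import random

def im3_basis(x1, x2):
    """Third-order product x2^2 * conj(x1): the baseband term located at 2*f2 - f1."""
    return [b * b * a.conjugate() for a, b in zip(x1, x2)]

def observe_im3(u, alpha, beta, noise_std, rng):
    """Toy PA model (assumption): IM3 sub-band = (beta + alpha) * u plus complex Gaussian noise."""
    return [(beta + alpha) * s + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
            for s in u]

def learn_alpha(beta, blocks=30, block_len=256, mu=0.5, noise_std=0.01, seed=0):
    """Block-adaptive decorrelation: drive the correlation between e and u toward zero."""
    rng = random.Random(seed)
    alpha = 0j
    history = []
    for _ in range(blocks):
        # Fresh QPSK component carriers for every block.
        x1 = [complex(rng.choice([-1, 1]), rng.choice([-1, 1])) for _ in range(block_len)]
        x2 = [complex(rng.choice([-1, 1]), rng.choice([-1, 1])) for _ in range(block_len)]
        u = im3_basis(x1, x2)
        e = observe_im3(u, alpha, beta, noise_std, rng)
        corr = sum(ei * ui.conjugate() for ei, ui in zip(e, u))
        power = sum(abs(ui) ** 2 for ui in u)
        alpha -= mu * corr / power  # normalized decorrelation step
        history.append(abs(beta + alpha))  # residual IM3 coefficient after the update
    return alpha, history

beta_true = 0.05 - 0.02j  # hypothetical PA IM3 coefficient
alpha_hat, residual = learn_alpha(beta_true)
```

Because the measurement noise is uncorrelated with u, it averages out over each block, and the residual coefficient shrinks by roughly a factor of (1 − mu) per block until it reaches the noise floor.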

IEEE Global Conference on Signal and Information Processing | 2016

Decentralized beamforming for massive MU-MIMO on a GPU cluster

Kaipeng Li; Rishi Sharan; Yujun Chen; Joseph R. Cavallaro; Tom Goldstein; Christoph Studer

In the massive multi-user multiple-input multiple-output (MU-MIMO) downlink, traditional centralized beamforming (or precoding), such as zero-forcing (ZF), entails excessive complexity for the computing hardware and generates raw baseband data rates that cannot be supported by current interconnect technology and chip I/O interfaces. In this paper, we present a novel decentralized beamforming approach that partitions the base-station (BS) antenna array into separate clusters, each associated with independent computing hardware. We develop a decentralized beamforming algorithm that requires only local channel state information and a minimal exchange of consensus information among the clusters. We demonstrate the efficacy and scalability of decentralized ZF beamforming for systems with hundreds of BS antennas using a reference implementation on a GPU cluster.
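One way to see how zero-forcing can be split across clusters: if the downlink channel H (users × antennas) is partitioned column-wise into cluster blocks H_c, then H H^H = Σ_c H_c H_c^H. The sketch below lets each cluster compute its U × U partial Gram matrix, a fusion step solves (H H^H) z = s once and broadcasts the U-vector z, and each cluster forms its own antenna signals x_c = H_c^H z. It is a pure-Python illustration of one possible decentralized ZF scheme under assumed dimensions, with hypothetical function names and no transmit-power normalization; it is not the consensus-based algorithm developed in the paper.

```python
import random

def partial_gram(H_c):
    """Local work in cluster c: U x U matrix H_c H_c^H built from its own antenna columns."""
    U, Bc = len(H_c), len(H_c[0])
    return [[sum(H_c[i][k] * H_c[j][k].conjugate() for k in range(Bc)) for j in range(U)]
            for i in range(U)]

def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small complex linear system."""
    n = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    x = [0j] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def decentralized_zf(H_clusters, s):
    """Fuse partial Gram matrices, solve (H H^H) z = s once, then precode locally: x_c = H_c^H z."""
    U = len(s)
    grams = [partial_gram(H_c) for H_c in H_clusters]
    G = [[sum(g[i][j] for g in grams) for j in range(U)] for i in range(U)]
    z = solve(G, s)  # only this U-vector is broadcast back to the clusters
    return [[sum(H_c[u][k].conjugate() * z[u] for u in range(U)) for k in range(len(H_c[0]))]
            for H_c in H_clusters]

# Hypothetical setup: 4 users, 32 BS antennas split into 4 clusters of 8 columns each.
random.seed(3)
U, B, C = 4, 32, 4
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(B)] for _ in range(U)]
s = [complex(random.choice([-1, 1]), random.choice([-1, 1])) for _ in range(U)]
size = B // C
H_clusters = [[row[c * size:(c + 1) * size] for row in H] for c in range(C)]
x = [v for x_c in decentralized_zf(H_clusters, s) for v in x_c]  # full transmit vector
received = [sum(H[u][k] * x[k] for k in range(B)) for u in range(U)]  # noiseless downlink
```

Because H x = Σ_c H_c H_c^H z = (H H^H) z = s, every user receives its own symbol free of inter-user interference, which is exactly the zero-forcing condition.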

Asilomar Conference on Signals, Systems and Computers | 2016

Decentralized data detection for massive MU-MIMO on a Xeon Phi cluster

Kaipeng Li; Yujun Chen; Rishi Sharan; Tom Goldstein; Joseph R. Cavallaro; Christoph Studer

Conventional centralized data detection algorithms for massive multi-user multiple-input multiple-output (MU-MIMO) systems, such as minimum mean square error (MMSE) equalization, result in excessively high raw baseband data rates and computing complexity at the centralized processing unit. Hence, practical base-station (BS) designs for massive MU-MIMO that rely on state-of-the-art hardware processors and I/O interconnect standards must find new means to avoid these bottlenecks. In this paper, we propose a novel decentralized data detection method, which partitions the BS antenna array into separate clusters. Each cluster is associated with independent computing hardware to perform decentralized data detection, which requires only local channel state information and receive data, and a minimum amount of information exchange between clusters. To demonstrate the benefits of our approach, we map our algorithm to a Xeon Phi cluster, which shows that BS designs with hundreds or thousands of BS antennas can be supported.

International Symposium on Information Theory | 2017

On the achievable rates of decentralized equalization in massive MU-MIMO systems

Charles Jeon; Kaipeng Li; Joseph R. Cavallaro; Christoph Studer

Massive multi-user (MU) multiple-input multiple-output (MIMO) promises significant gains in spectral efficiency compared to traditional, small-scale MIMO technology. Linear equalization algorithms, such as zero forcing (ZF) or minimum mean-square error (MMSE)-based methods, typically rely on centralized processing at the base station (BS), which results in (i) excessively high interconnect and chip input/output data rates, and (ii) high computational complexity. In this paper, we investigate the achievable rates of decentralized equalization, which mitigates both of these issues. We consider two distinct BS architectures that partition the antenna array into clusters, each associated with independent radio-frequency chains and signal processing hardware, in which the results of each cluster are fused in a feedforward network. For both architectures, we consider ZF, MMSE, and a novel nonlinear equalization algorithm that builds upon approximate message passing (AMP), and we theoretically analyze the achievable rates of these methods. Our results demonstrate that decentralized equalization with our AMP-based methods incurs no or only a negligible loss in achievable rates compared to centralized solutions.

IEEE Global Conference on Signal and Information Processing | 2015

Mobile GPU accelerated digital predistortion on a software-defined mobile transmitter

Kaipeng Li; Amanullah Ghazi; Jani Boutellier; Mahmoud Abdelaziz; Lauri Anttila; Markku J. Juntti; Mikko Valkama; Joseph R. Cavallaro

We present the design exploration and performance evaluation of a mobile transmitter digital predistortion (DPD) module on a mobile GPU. Digital predistortion is a widely used technique for suppressing the spurious spectral emissions caused by imperfections of the power amplifier and radio frequency (RF) circuits in a real wireless transmitter. Given the GPU's parallel architecture, numerous computing cores, and programmability, we implement a DPD design based on the augmented parallel Hammerstein structure on the mobile GPU integrated in an Nvidia Jetson TK1 mobile development board, targeting a mobile transmitter. Algorithm-level and data-level parallelism are carefully explored for efficient mapping of the DPD algorithm and full utilization of the mobile GPU resources. We analyze the throughput and timing performance of our implementation and verify the DPD functionality experimentally on a novel software-defined mobile terminal. The results show that the proposed mobile-GPU-driven DPD design not only achieves real-time performance but also offers programmability and reconfigurability for design upgrades and extensions.
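For orientation, a basic (non-augmented) parallel Hammerstein predistorter passes the input through odd-order static nonlinearities x·|x|^(p−1), filters each branch with its own short FIR filter, and sums the branches. Every output sample depends only on a few past inputs, so samples can be computed independently, which is the kind of data-level parallelism a GPU can exploit. The pure-Python sketch below assumes nonlinearity orders 1, 3, and 5, a memory depth of two taps, and hypothetical coefficients; the additional branches of the augmented structure used in the paper and the coefficient estimation are not shown.

```python
def ph_basis(x, p):
    """Static odd-order nonlinearity of branch p: x * |x|^(p-1)."""
    return [s * abs(s) ** (p - 1) for s in x]

def fir(signal, taps):
    """Causal FIR filter: out[n] = sum_m taps[m] * signal[n - m], with zero initial state."""
    return [sum(t * signal[n - m] for m, t in enumerate(taps) if n - m >= 0)
            for n in range(len(signal))]

def parallel_hammerstein(x, branches):
    """Sum of FIR-filtered nonlinear branches; `branches` maps order p -> FIR taps."""
    out = [0j] * len(x)
    for p, taps in branches.items():
        filtered = fir(ph_basis(x, p), taps)
        out = [o + f for o, f in zip(out, filtered)]
    return out

# Hypothetical coefficients: orders 1, 3, 5 with two taps each (memory depth 2).
coeffs = {1: [1.0, 0.05j], 3: [-0.1 + 0.02j, 0.01], 5: [0.01, 0.0]}
x = [0.5, 0.3j, -0.2 + 0.1j, 0.4]
y = parallel_hammerstein(x, coeffs)
```

With only the first-order branch and a single unit tap, the structure reduces to the identity, which is a convenient sanity check before the nonlinear coefficients are trained.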

Journal of Applied Physics | 2013

Width dependent edge distribution of graphene nanoribbons unzipped from multiwall carbon nanotubes

Z. F. Zhong; H. L. Shen; R. X. Cao; L. Sun; Kaipeng Li; X. R. Wang; Haifeng Ding

We present a width-dependent study of the edge distribution of graphene nanoribbons unzipped from multiwall carbon nanotubes. Partial unzipping of the carbon nanotubes yields a mixture of carbon nanotubes and nanoribbons. By comparing atomic-resolution scanning tunneling microscopy images with the graphene lattice, the edge structures of the nanoribbons are identified. For widths below 10 nm, the edges are closer to the armchair type. Above 20 nm, the ribbons prefer edges close to the zigzag type. In between, a more random edge distribution is found. These findings are potentially useful for edge control in graphene-nanoribbon-based applications.
