
Kai Zhang

Worcester Polytechnic Institute

Publication

Featured research published by Kai Zhang.

IEEE Journal on Selected Areas in Communications | 2009

High-throughput layered decoder implementation for quasi-cyclic LDPC codes

Kai Zhang; Xinming Huang; Zhongfeng Wang

This paper presents a high-throughput decoder design for quasi-cyclic (QC) low-density parity-check (LDPC) codes. Two new techniques are proposed: a parallel layered decoding architecture (PLDA) and critical path splitting. PLDA enables parallel processing of all layers by establishing dedicated message-passing paths among them, so the decoder avoids a large crossbar-based interconnect network. Critical path splitting carefully adjusts the starting point of each layer to maximize the time intervals between adjacent layers, so that the critical path delay can be split into pipeline stages. Furthermore, min-sum and loosely coupled algorithms are employed for area efficiency. As a case study, a rate-1/2, 2304-bit irregular LDPC decoder is implemented as an ASIC in a 90 nm CMOS process. The decoder achieves a maximum decoding throughput of 2.2 Gbps at 10 iterations. The operating frequency is 950 MHz after synthesis and the chip area is 2.9 mm².
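The min-sum rule mentioned in the abstract can be illustrated with a minimal sketch of a single check-node update. This is the generic textbook formulation in Python, not the paper's hardware architecture; the function name `min_sum_check_node` and the example values are assumptions made for illustration.

```python
def min_sum_check_node(incoming):
    """Return the extrinsic message for each edge of one check node.

    For edge i, the output sign is the product of the signs of all
    other incoming messages, and the magnitude is the minimum
    magnitude among all other incoming messages.
    """
    if len(incoming) < 2:
        raise ValueError("a check node needs at least two incoming messages")

    mags = [abs(m) for m in incoming]
    # Only the two smallest magnitudes are ever needed.
    min_idx = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[min_idx]
    min2 = min(m for i, m in enumerate(mags) if i != min_idx)

    # Product of all signs (zero is treated as positive).
    total_sign = 1
    for m in incoming:
        if m < 0:
            total_sign = -total_sign

    outgoing = []
    for i, m in enumerate(incoming):
        own_sign = -1 if m < 0 else 1
        # Multiplying by own_sign removes this edge's sign from the product.
        magnitude = min2 if i == min_idx else min1
        outgoing.append(total_sign * own_sign * magnitude)
    return outgoing


# Hypothetical log-likelihood ratios arriving at one check node.
print(min_sum_check_node([2.0, -0.5, 3.0, -1.5]))
```

Tracking only the two smallest magnitudes is what makes the rule attractive for area-efficient hardware: the node stores two values and an index instead of one value per edge.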

IEEE Transactions on Very Large Scale Integration Systems | 2012

High-Speed Low-Power Viterbi Decoder Design for TCM Decoders

Jinjin He; Huaping Liu; Zhongfeng Wang; Xinming Huang; Kai Zhang

This paper presents a high-speed, low-power design of Viterbi decoders for trellis coded modulation (TCM) systems. It is well known that the Viterbi decoder (VD) is the dominant module determining the overall power consumption of TCM decoders. We propose a pre-computation architecture combined with the T-algorithm for the VD, which effectively reduces power consumption with little degradation in decoding speed. A general solution for deriving the optimal number of pre-computation steps is also given. Implementation results for a VD for a rate-3/4 convolutional code used in a TCM system show that, compared with the full-trellis VD, the pre-computation architecture reduces power consumption by as much as 70% without performance loss, while the degradation in clock speed is negligible.
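As a rough illustration of the T-algorithm named in the abstract, the sketch below prunes survivor states whose path metric is more than a threshold T worse than the best one. It assumes that a lower path metric is better, and it does not model the pre-computation architecture itself; the function name, state labels and metric values are hypothetical.

```python
def t_algorithm_prune(path_metrics, threshold):
    """Keep only states whose path metric is within `threshold` of the best.

    path_metrics maps a trellis state to its accumulated path metric,
    with lower values meaning more likely paths (an assumption here).
    """
    best = min(path_metrics.values())
    return {state: pm for state, pm in path_metrics.items()
            if pm <= best + threshold}


# Hypothetical metrics for a four-state trellis at one time step.
metrics = {0: 1.0, 1: 4.5, 2: 2.0, 3: 9.0}
print(t_algorithm_prune(metrics, threshold=3.0))
```

Fewer surviving states means fewer add-compare-select operations at the next step, which is where the power saving comes from; finding the best metric first is the extra comparison that tends to lengthen the critical path.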

IEEE Transactions on Circuits and Systems | 2011

A High-Throughput LDPC Decoder Architecture With Rate Compatibility

Kai Zhang; Xinming Huang; Zhongfeng Wang

This paper presents a high-throughput decoder architecture for rate-compatible (RC) low-density parity-check (LDPC) codes that supports arbitrary code rates between the rate of the mother code and 1. Puncturing techniques are applied to produce different rates for quasi-cyclic (QC) LDPC codes with a dual-diagonal parity structure. Simulation results show that our selected puncturing scheme introduces a BER performance degradation of less than 0.2 dB compared with the dedicated codes for different rates specified in the IEEE 802.16e (WiMax) standard. Subsequently, a parallel layered decoding architecture (PLDA) is employed for high-throughput decoder design. While the original PLDA lacks rate flexibility, the problem is solved gracefully by incorporating the puncturing scheme. As a case study, an RC-LDPC decoder based on the rate-1/2 WiMax LDPC code is implemented in a 65-nm CMOS process. The clock frequency is 1.1 GHz, and the synthesis core area is 1.96 mm². The decoder can achieve an input throughput of 1.28 Gb/s at ten iterations and supports any rate between 1/2 and 1.
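The relationship between puncturing and code rate can be made concrete with a small calculation: removing p of the n coded bits leaves k information bits carried by n - p transmitted bits. The sketch below assumes a block length of 2304 bits for the rate-1/2 mother code, which the abstract does not state; the function names are hypothetical.

```python
from fractions import Fraction


def punctured_rate(info_bits, block_length, punctured_bits):
    """Effective rate after removing `punctured_bits` coded bits."""
    if not 0 <= punctured_bits <= block_length - info_bits:
        raise ValueError("puncturing must leave a rate no greater than 1")
    return Fraction(info_bits, block_length - punctured_bits)


def bits_to_puncture(info_bits, block_length, target_rate):
    """Number of coded bits to remove to reach `target_rate` exactly."""
    transmitted = Fraction(info_bits) / target_rate
    if transmitted.denominator != 1:
        raise ValueError("target rate is not reachable with whole bits")
    return block_length - int(transmitted)


# Assumed example: rate-1/2 mother code with n = 2304, k = 1152.
n, k = 2304, 1152
print(punctured_rate(k, n, 0))                 # mother code rate
print(bits_to_puncture(k, n, Fraction(3, 4)))  # bits removed for rate 3/4
```

Because any integer p between 0 and n - k is allowed, the achievable rates fill the range from the mother code rate up to 1 in fine steps.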

IEEE Transactions on Consumer Electronics | 2010

A dual-rate LDPC decoder for China multimedia mobile broadcasting systems

Kai Zhang; Xinming Huang; Zhongfeng Wang

This paper presents an efficient VLSI architecture and implementation for the LDPC decoder used in China Multimedia Mobile Broadcasting (CMMB) systems. An area-efficient layered decoding architecture based on the min-sum algorithm is incorporated in the design. A novel split-memory architecture is developed to efficiently handle the weight-2 submatrices that are rarely seen in conventional LDPC decoders. In addition, the check-node processing unit is highly optimized to minimize complexity and computing latency while facilitating a reconfigurable decoding core. The proposed design is implemented in 90 nm CMOS technology with a core area of approximately 4.4 mm² and a standard supply voltage of 1.0 V. The decoder achieves a maximum throughput of 228 Mb/s for rate 1/2 and 342 Mb/s for rate 3/4 at 15 iterations of layered decoding. Therefore, it can be deployed on the CMMB mobile platform.

Application-Specific Systems, Architectures and Processors | 2009

An Area-Efficient LDPC Decoder Architecture and Implementation for CMMB Systems

Kai Zhang; Xinming Huang; Zhongfeng Wang

This paper presents an area-efficient LDPC decoder architecture for the China Multimedia Mobile Broadcasting (CMMB) standard. Several techniques are adopted to reduce memory size, including the min-sum algorithm (MSA), optimal bit-width quantization of the iterative messages, and reduced complexity of the interconnect network. The decoder for the rate-1/2, 9216-bit code is implemented in 90 nm, 1.0 V CMOS technology. It achieves a decoding throughput of 48 Mbps at 5 iterations when operating at 60 MHz, and the power dissipation is only 34 mW.

Signal Processing Systems | 2008

Soft decoder architecture of LT codes

Kai Zhang; Xinming Huang; Chen Shen

Luby transform (LT) codes, the first class of efficient rateless codes, have attracted a lot of attention in the coding theory field. However, the VLSI implementation of LT codes is challenging because of their random code construction and flexible output length. In this paper, we present a practical architecture for a soft-decision LT decoder with a block length of 1024 bits and 100 iterations. Partly parallel input-node processing and output-node processing techniques are both adopted to accelerate decoding. An efficient router and reverse router are designed to represent the graph connectivity between input nodes and output nodes. The parallel architecture is prototyped on the target FPGA device.
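The random code construction that makes LT hardware difficult can be seen in a minimal encoder sketch: each output symbol picks a random degree, then XORs that many randomly chosen input bits. The example below uses the ideal soliton degree distribution for simplicity, which is an assumption (practical LT codes typically use the robust soliton distribution, and the abstract does not say which one the decoder targets). The function names are hypothetical.

```python
import random


def ideal_soliton_weights(k):
    """Probabilities of degrees 1..k under the ideal soliton distribution."""
    return [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]


def lt_encode_symbol(data_bits, rng):
    """Generate one LT output symbol as (neighbor indices, XOR value)."""
    k = len(data_bits)
    # Draw the degree, then pick that many distinct input nodes.
    degree = rng.choices(range(1, k + 1), weights=ideal_soliton_weights(k))[0]
    neighbors = sorted(rng.sample(range(k), degree))
    value = 0
    for i in neighbors:
        value ^= data_bits[i]
    return neighbors, value


rng = random.Random(0)  # fixed seed so the sketch is repeatable
data = [1, 0, 1, 1, 0, 0, 1, 0]
for _ in range(3):
    print(lt_encode_symbol(data, rng))
```

The neighbor lists differ for every output symbol, which is why a decoder needs routing logic between input and output nodes instead of a fixed wiring pattern.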

Application-Specific Systems, Architectures and Processors | 2011

Design and implementation of a belief propagation detector for sparse channels

Yanjie Peng; Kai Zhang; Andrew G. Klein; Xinming Huang

In this paper, we address the design and implementation of a symbol detector for sparse channels, which have long spanning durations but a sparse multipath structure. The traditional maximum-likelihood (ML) algorithm provides optimal performance in eliminating the multipath effect; however, its complexity scales exponentially with the channel length. As a more efficient symbol detection algorithm for sparse channels, the iterative belief propagation (BP) algorithm has a complexity that depends only on the number of nonzero channel coefficients, while achieving near-optimal error performance. We present the architecture design for a reconfigurable, low-complexity, high-throughput BP detector. As an example, we implement a BP detector for quadrature phase-shift keying (QPSK) modulation on a Xilinx Virtex-5 FPGA with a maximum frequency of 252 MHz, which corresponds to a throughput of 100.8 Mb/s at 5 iterations.
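The reported throughput follows from the clock frequency if the detector is assumed to spend one clock cycle per symbol per iteration. That cycle count is an inference from the published figures, not something the abstract states, and the function name below is hypothetical.

```python
def detector_throughput_mbps(clock_mhz, bits_per_symbol, iterations,
                             cycles_per_iteration=1):
    """Throughput in Mb/s for an iterative detector.

    cycles_per_iteration is the assumed number of clock cycles each
    symbol needs per iteration; the default of 1 is an assumption.
    """
    cycles_per_symbol = iterations * cycles_per_iteration
    return clock_mhz * bits_per_symbol / cycles_per_symbol


# QPSK carries 2 bits per symbol; 252 MHz and 5 iterations come from the abstract.
print(detector_throughput_mbps(252, 2, 5))  # 100.8
```

Under this model, throughput falls in proportion to the iteration count, so halving the iterations would double the rate at the same clock frequency.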

Military Communications Conference | 2010

Complexity and performance tradeoffs of near-optimal detectors for cooperative ISI channels

Yanjie Peng; Kai Zhang; Andrew G. Klein; Xinming Huang

Cooperative communication has attracted a lot of attention for its ability to exploit the increased spatial diversity available at distributed antennas on other nodes in the system. For a cooperative system employing non-orthogonal amplify-and-forward half-duplex relays, maximum likelihood sequence estimation (MLSE) is the optimal detector in intersymbol interference (ISI) channels. However, the implementation complexity of the optimal detector scales exponentially with the length of the effective channel impulse response (CIR), which becomes very long and sparse as the relay period increases. In this paper, we focus on suboptimal detector design, complexity, and performance for cooperative relays in ISI channels. We first explore the use of a decision feedback sequence estimator (DFSE). Next, to exploit the structured sparsity in the effective CIR, we consider a detector based on an iterative belief propagation (BP) algorithm. Using simulation results, we explore the tradeoff between complexity and performance.

Asilomar Conference on Signals, Systems and Computers | 2009

Joint optimization of antenna orientation and spectrum allocation for cognitive radio networks

Wenxuan Guo; Xinming Huang; Kai Zhang

This paper presents a study on the joint optimization of antenna orientation and spectrum allocation, with the objective of maximizing the total throughput of a cognitive radio network. The mathematical model is presented and subsequently formulated as a mixed-integer linear programming problem. We obtain the optimal solution using the branch-and-bound algorithm. Through simulation results, we investigate the impact of different parameters, including the transmit power and antenna directionality, on the capacity of the cognitive radio network.

IEEE Transactions on Very Large Scale Integration Systems | 2013

Design and Implementation of a Low-Complexity Symbol Detector for Sparse Channels

Yanjie Peng; Xinming Huang; Andrew G. Klein; Kai Zhang

In this paper, we present a low-complexity symbol detector for communication channels that have long spanning durations but a sparse multipath structure. Traditional maximum-likelihood sequence estimation using the Viterbi algorithm can provide optimal error performance in eliminating the multipath effect, but its hardware complexity grows exponentially with channel length, so it is not practical for long sparse channels. We implement a near-optimal algorithm and its architecture by cascading an adaptive partial response equalizer (PRE) with an iterative belief propagation (BP) detector. A sparse channel is first equalized by the PRE to a target impulse response (TIR) with only a few nonzero coefficients remaining. The residual intersymbol interference is then canceled by a BP detector whose complexity depends solely on the number of nonzero coefficients in the TIR. Moreover, we present a pipelined high-throughput implementation of the detector for a channel length of 30 with quadrature phase-shift keying modulation. The detector can achieve a maximum throughput of 206 Mb/s with an estimated core area of 3.162 mm² at the 90-nm technology node. At a target frequency of 515 MHz, the dynamic power is about 1.096 W.
