Chenxin Zhang | Researchain

Publication

Featured research published by Chenxin Zhang.

IEEE Transactions on Circuits and Systems | 2015

Energy Efficient Group-Sort QRD Processor With On-Line Update for MIMO Channel Pre-Processing

Chenxin Zhang; Hemanth Prabhu; Yangxurui Liu; Liang Liu; Ove Edfors; Viktor Öwall

This paper presents a sorted QR decomposition (SQRD) processor for 3GPP LTE-A systems. It achieves energy efficiency by co-optimizing techniques such as heterogeneous processing, a reconfigurable architecture, and dual-supply-voltage operation. At the algorithm level, a low-complexity hybrid decomposition scheme is adopted. Depending on the energy distribution of the spatial channels, it switches between the traditional brute-force SQRD and a proposed group-sort QR-update strategy. A reconfigurable vector processor is developed accordingly to support this adaptive processing with high hardware efficiency. Furthermore, an on-chip power management technique is integrated to obtain real-time power savings by adapting the supply voltage to the instantaneous workload. As a proof of concept, we implemented the processor in a 65 nm CMOS technology and conducted post-layout simulation. The proposed SQRD processor occupies a core area of 0.71 mm² and achieves a throughput of up to 69 MQRD/s. Compared to the brute-force approach, it reduces energy by 10% to 61.8%.
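The following sketch makes the brute-force baseline concrete. It computes a sorted QR decomposition with modified Gram-Schmidt and assumes the common ordering rule, which moves the remaining column of smallest norm forward at each step. It is a plain Python reference only. It does not implement the paper's group-sort QR-update strategy or its hardware mapping. The function name `sqrd` and the list-of-rows matrix layout are illustrative choices.

```python
import math


def sqrd(H):
    """Sorted QR decomposition: H[:, perm] = Q @ R, via modified Gram-Schmidt.

    H is an m x n matrix given as a list of rows (real or complex entries).
    Returns Q (m x n, orthonormal columns, list of rows), R (n x n upper
    triangular) and perm, the original column index placed at each position.
    """
    m, n = len(H), len(H[0])
    cols = [[H[r][c] for r in range(m)] for c in range(n)]  # work column-wise
    R = [[0j] * n for _ in range(n)]
    perm = list(range(n))
    norms = [sum(abs(v) ** 2 for v in col) for col in cols]
    for i in range(n):
        # Sorting step: bring the remaining column with the smallest norm to i.
        k = min(range(i, n), key=lambda j: norms[j])
        for seq in (cols, norms, perm):
            seq[i], seq[k] = seq[k], seq[i]
        for row in R:  # keep already-computed R entries aligned with columns
            row[i], row[k] = row[k], row[i]
        # Normalize column i (this is where the square root is needed).
        R[i][i] = math.sqrt(sum(abs(v) ** 2 for v in cols[i]))
        cols[i] = [v / R[i][i] for v in cols[i]]
        for j in range(i + 1, n):
            # Remove the q_i component from every remaining column.
            R[i][j] = sum(q.conjugate() * v for q, v in zip(cols[i], cols[j]))
            cols[j] = [v - R[i][j] * q for q, v in zip(cols[i], cols[j])]
            norms[j] -= abs(R[i][j]) ** 2  # norm downdate, used only for sorting
    Q = [[cols[c][r] for c in range(n)] for r in range(m)]
    return Q, R, perm
```

With `Q, R, perm = sqrd(H)`, the product `Q R` reproduces `H` with its columns reordered by `perm`. A successive-interference-cancellation detector would then process the layers starting from the last row of `R`.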

International Symposium on Circuits and Systems | 2012

Mapping channel estimation and MIMO detection in LTE-advanced on a reconfigurable cell array

Chenxin Zhang; Liang Liu; Viktor Öwall

This paper presents a flexible architecture suitable for performing both channel estimation and signal detection in a MIMO-OFDM downlink. Extensive hardware sharing between the two tasks is achieved by algorithm and architecture co-design, adopting robust MMSE sliding-window channel estimation and MMSE-based signal detection with a symbol perturbation scheme. The proposed architecture is based on a coarse-grained reconfigurable cell array with fast context-switching capabilities. Its high flexibility allows task-level resource sharing and dynamic adoption of different algorithms on the same platform. Simulation and analysis results confirm the efficiency of the proposed design, with more than 75% of the hardware resources reused between the adopted algorithms.
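For reference, MMSE-based detection usually starts from the linear MMSE estimate below. The conventions are assumed here: received vector y = Hx + n, transmitted symbols of unit average energy, and white noise of variance sigma^2 per receive antenna. A symbol perturbation scheme would then examine candidate symbols around this estimate. The abstract does not specify the exact perturbation rule used in the paper.

```latex
\hat{\mathbf{x}}_{\mathrm{MMSE}}
  = \left(\mathbf{H}^{H}\mathbf{H} + \sigma^{2}\mathbf{I}\right)^{-1}\mathbf{H}^{H}\mathbf{y}
```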

Reconfigurable Computing and FPGAs | 2009

Design of Coarse-Grained Dynamically Reconfigurable Architecture for DSP Applications

Chenxin Zhang; Thomas Lenart; Henrik Svensson; Viktor Öwall

This paper presents the design and implementation of a coarse-grained reconfigurable architecture targeting digital signal processing applications. The proposed architecture is constructed from a mesh of resource cells. Each cell contains separate processing and memory elements that communicate via a hybrid interconnect network. The parameterizable design of the resource cells enables flexible mapping of arbitrary applications at system compile time. Dynamic reconfigurability also allows mappings to change at run time to adapt to current operational and processing conditions. The functionality and flexibility of the proposed architecture are demonstrated by mapping a radix-2² FFT processor reconfigurable between 32 and 1024 points. Performance evaluation shows high reconfigurability and reduced execution time compared to traditional DSP and ARM solutions.
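The following sketch is a software reference for the radix-2² decomposition. It splits the input index as n = (N/2)n1 + (N/4)n2 + n3 and the output index as k = k1 + 2k2 + 4k3. Each stage then needs two radix-2 butterflies and a trivial multiplication by (-j)^(k1+2k2), followed by the twiddle factor W_N^(n3(k1+2k2)). This recursive Python model is intended for checking results and is assumed for illustration only. It does not model the hardware pipeline that the paper maps onto the cell array. The names `fft_r22` and `dft` are hypothetical.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def fft_r22(x):
    """Radix-2^2 decimation-in-frequency FFT for power-of-two lengths."""
    N = len(x)
    assert N > 0 and N & (N - 1) == 0, "length must be a power of two"
    if N == 1:
        return [complex(x[0])]
    if N == 2:  # leftover radix-2 step when N is not a power of four
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]
    q = N // 4
    X = [0j] * N
    for k1 in (0, 1):
        for k2 in (0, 1):
            s = k1 + 2 * k2
            sub = []
            for n3 in range(q):
                # First butterfly: B(m) = x(m) + (-1)^k1 * x(m + N/2).
                b0 = x[n3] + (-1) ** k1 * x[n3 + N // 2]
                b1 = x[n3 + q] + (-1) ** k1 * x[n3 + q + N // 2]
                # Second butterfly with the trivial factor (-j)^(k1 + 2*k2).
                h = b0 + (-1j) ** s * b1
                # Non-trivial twiddle factor W_N^(n3 * (k1 + 2*k2)).
                sub.append(h * cmath.exp(-2j * cmath.pi * n3 * s / N))
            # The remaining N/4-point DFT over n3 yields outputs k = s + 4*k3.
            for k3, value in enumerate(fft_r22(sub)):
                X[s + 4 * k3] = value
    return X
```

A 1024-point input recurses through 256-, 64-, 16-, 4- and 1-point stages. A 32-point input ends in a 2-point stage. Finishing with a single radix-2 step is one common way to cover lengths that are not powers of four, and is an assumption of this sketch.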

IEEE Embedded Systems Letters | 2014

A Square-Root-Free Matrix Decomposition Method for Energy-Efficient Least Square Computation on Embedded Systems

Fengbo Ren; Chenxin Zhang; Liang Liu; Wenyao Xu; Viktor Öwall; Dejan Markovic

QR decomposition (QRD) is used to solve least-squares (LS) problems in a wide range of applications. However, traditional QR decomposition methods such as Gram-Schmidt (GS) require high computational complexity and nonlinear operations to achieve high throughput, which limits their use on resource-limited platforms. To enable efficient LS computation on embedded systems for real-time applications, this paper presents an alternative decomposition method, called QDRD, which relaxes system requirements while maintaining the same level of performance. Specifically, QDRD eliminates both the square-root operations in the normalization step and the divisions in the subsequent backward substitution. Simulation results show that QDRD can significantly improve the accuracy and reliability of the factorization matrices, especially when executed on precision-limited platforms. Furthermore, benchmarking results on an embedded platform show that QDRD consistently provides better energy efficiency and higher throughput than GS-QRD in solving LS problems. For small problem sizes, improvements of up to 4 times in energy efficiency and 6.5 times in throughput are achieved.
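The following sketch shows one square-root-free way to solve a least-squares problem. It is assumed here for illustration and is not necessarily the exact QDRD formulation from the paper. The method factors A = Q̃R̃, where Q̃ has orthogonal but unnormalized columns and R̃ is unit upper triangular. The squared column norms are d_i = ‖q̃_i‖². The normal equations then reduce to R̃x = D⁻¹Q̃ᵀb. As a result, no square root is taken, and back substitution needs no division because every diagonal entry of R̃ is 1. Divisions by d_i still occur when forming R̃ and the scaled right-hand side. The function names are hypothetical, and the sketch handles real-valued data only.

```python
def sqrt_free_decompose(A):
    """Factor A (m x n, real, full column rank) as A = Qt @ Rt without square roots.

    Qt has mutually orthogonal, unnormalized columns (list of rows); Rt is unit
    upper triangular; d[i] is the squared norm of column i of Qt.
    """
    m, n = len(A), len(A[0])
    cols = [[A[r][c] for r in range(m)] for c in range(n)]
    Rt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for i in range(n):
        d[i] = sum(v * v for v in cols[i])  # squared norm, no sqrt taken
        for j in range(i + 1, n):
            # Projection coefficient onto the unnormalized direction q_i.
            Rt[i][j] = sum(q * v for q, v in zip(cols[i], cols[j])) / d[i]
            cols[j] = [v - Rt[i][j] * q for q, v in zip(cols[i], cols[j])]
    Qt = [[cols[c][r] for c in range(n)] for r in range(m)]
    return Qt, d, Rt


def ls_solve(A, b):
    """Least-squares solution of A x ~= b using the square-root-free factors."""
    Qt, d, Rt = sqrt_free_decompose(A)
    m, n = len(A), len(d)
    # Scaled projections z = D^-1 Qt^T b (one division per column).
    z = [sum(Qt[r][i] * b[r] for r in range(m)) / d[i] for i in range(n)]
    x = [0.0] * n
    for i in reversed(range(n)):
        # Rt has a unit diagonal, so back substitution needs no division.
        x[i] = z[i] - sum(Rt[i][j] * x[j] for j in range(i + 1, n))
    return x
```

For example, fitting a line to the points (1, 1), (2, 2), (3, 2) with `ls_solve([[1, 1], [1, 2], [1, 3]], [1, 2, 2])` gives an intercept of 2/3 and a slope of 0.5.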

NORCHIP | 2012

Energy efficient MIMO channel pre-processor using a low complexity on-line update scheme

Chenxin Zhang; Hemanth Prabhu; Liang Liu; Ove Edfors; Viktor Öwall

This paper presents a low-complexity, energy-efficient channel pre-processing update scheme targeting the emerging 3GPP Long Term Evolution Advanced (LTE-A) downlink. When the channel matrix is renewed, an on-line update decision mechanism updates only the upper triangular matrices R and R⁻¹. This reduces the number of explicit QR decompositions (QRD) and channel matrix inversions. The proposed channel pre-processing updater has been designed as a dedicated unit in a 65 nm CMOS technology, resulting in a core area of 0.242 mm² (equivalent gate count of 116K). Running at a 330 MHz clock, each QRD or R⁻¹ update consumes 4 or 2 times less energy, respectively, than one exact QRD in state-of-the-art designs from the open literature.

International Symposium on Circuits and Systems | 2011

Reconfigurable cell array for concurrent support of multiple radio standards by flexible mapping

Chenxin Zhang; Isael Diaz; Per Andersson; Joachim Neves Rodrigues; Viktor Öwall

This paper presents a flexible architecture suitable for concurrent processing of multiple radio standards. The proposed architecture is based on a coarse-grained reconfigurable cell array consisting of distinct processing and memory cells. The flexibility of the architecture is demonstrated by performing coarse time synchronization and fractional frequency offset estimation for multiple OFDM standards. The radio standards under analysis are IEEE 802.11n, LTE, and DVB-H. The reconfigurable cell array, containing 2-by-2 cells, can process two concurrent data streams from these standards. Dynamic reconfigurability of the architecture enables run-time switching between the standards. The implemented 2-by-2 cell array is synthesized using a 65 nm low-leakage standard-cell CMOS library, resulting in an area of 0.479 mm² and a maximum clock frequency of 534 MHz. The high flexibility offered by the reconfigurable cell array allows different algorithms to be adopted on the same platform.
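Coarse timing and fractional frequency offset estimation are commonly performed by correlating the cyclic prefix (CP) with the end of the OFDM symbol that it copies. The following sketch assumes this method in its simplest, correlation-only form. The abstract does not specify which estimators the paper actually mapped. For a frequency offset of ε subcarrier spacings, each CP sample multiplied by the conjugate of the sample N positions later carries the phase -2πε. The correlation peak therefore gives the symbol timing, and its angle gives ε for |ε| < 0.5. Only the parameters change between standards. For example, a 20 MHz 802.11n symbol with the 0.8 µs guard interval uses N = 64 and L = 16. The function name `cp_sync` is hypothetical.

```python
import cmath


def cp_sync(r, N, L):
    """CP-correlation timing and fractional CFO estimate (simplified sketch).

    r: received complex samples; N: FFT size; L: cyclic-prefix length.
    Returns (timing, cfo): the index where the CP is estimated to start, and
    the fractional frequency offset in units of subcarrier spacing (|cfo| < 0.5).
    """
    best_t, best_g = 0, 0j
    for t in range(len(r) - N - L + 1):
        # Correlate the candidate CP window with the samples one FFT length later.
        g = sum(r[k] * r[k + N].conjugate() for k in range(t, t + L))
        if abs(g) > abs(best_g):
            best_t, best_g = t, g
    # Each CP term carries phase -2*pi*cfo, so negate and scale the angle.
    cfo = -cmath.phase(best_g) / (2 * cmath.pi)
    return best_t, cfo
```

The full maximum-likelihood form of this estimator also subtracts an energy term from the correlation. That term is omitted here for brevity.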

Microprocessors and Microsystems | 2015

A low-latency high-throughput soft-output signal detector for spatial multiplexing MIMO systems

Stefan Granlund; Liang Liu; Chenxin Zhang; Viktor Öwall

This paper presents a low-latency, high-throughput soft-output signal detector for a 4×4 64-QAM spatial-multiplexing MIMO system. To achieve high data-level parallelism and accurate soft information, the detector adopts a channel-adaptive node perturbation technique to generate a list of candidate vectors around an initial linear estimate. Adjusting the node perturbation parameter provides a wide-ranging and convenient performance-complexity trade-off. A partially parallel pipelined VLSI architecture is developed to implement the algorithm with high throughput and low processing latency, while offering the flexibility to support run-time performance tuning. Moreover, a fast, hardware-friendly node enumeration scheme is developed to further reduce the processing delay by exploiting the geometric properties of the quadrature amplitude modulation (QAM) constellation. The detector was synthesized using Synopsys Design Compiler with a 65 nm CMOS standard-cell library. The core area is 0.58 mm² with 290K gates. The peak throughput is 3 Gb/s at a 500 MHz clock frequency with a latency of 20 ns, which is a 71% latency reduction compared to other reported soft-output MIMO detectors. The corresponding energy consumption is 33 pJ per detected bit.
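The node enumeration idea can be illustrated for a single 64-QAM symbol. The squared Euclidean distance splits into independent real and imaginary parts. Candidates around a linear estimate can therefore be listed nearest-first by sorting the eight PAM levels on each axis and merging the two sorted lists with a small priority queue, without computing all 64 distances up front. This is a generic best-first enumeration, assumed for illustration. It is not the paper's channel-adaptive node perturbation scheme. The constellation uses unnormalized levels ±1, ±3, ±5, ±7, and `nearest_qam` is a hypothetical name.

```python
import heapq

# Unnormalized 64-QAM: each axis takes one of eight PAM levels.
LEVELS = (-7, -5, -3, -1, 1, 3, 5, 7)


def axis_order(v):
    """PAM levels sorted by squared distance to v, as (dist2, level) pairs."""
    return sorted(((lvl - v) ** 2, lvl) for lvl in LEVELS)


def nearest_qam(z, k):
    """Return the k 64-QAM points closest to the complex estimate z, nearest first."""
    re, im = axis_order(z.real), axis_order(z.imag)
    heap = [(re[0][0] + im[0][0], 0, 0)]  # (distance^2, real rank, imag rank)
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        _, i, j = heapq.heappop(heap)
        out.append(complex(re[i][1], im[j][1]))
        # Successors: the next-best level on either axis (distance never decreases).
        for a, b in ((i + 1, j), (i, j + 1)):
            if a < len(LEVELS) and b < len(LEVELS) and (a, b) not in seen:
                seen.add((a, b))
                heapq.heappush(heap, (re[a][0] + im[b][0], a, b))
    return out
```

For example, `nearest_qam(0.8 + 2.6j, 4)` returns `[(1+3j), (1+1j), (-1+3j), (3+3j)]`.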

Wireless Communications and Networking Conference | 2013

A highly parallelized MIMO detector for vector-based reconfigurable architectures

Chenxin Zhang; Liang Liu; Yian Wang; Meifang Zhu; Ove Edfors; Viktor Öwall

This paper presents a highly parallelized MIMO signal detection algorithm targeting vector-based reconfigurable architectures. The detector achieves high data-level parallelism and near-ML performance by adopting parallel node perturbation, a technique well suited to vector architectures. To further reduce the computational complexity, imbalanced node and successive partial node expansion schemes are applied in conjunction with sorted QR decomposition. The effectiveness of the proposed algorithm is evaluated by simulations on a simplified 4×4 MIMO LTE-A testbed and by operation analysis. Compared to the K-Best detector and the fixed-complexity sphere decoder (FSD), the proposed algorithm reduces the number of visited nodes by factors of 15 and 1.9, respectively, with less than 1 dB performance degradation. Benefiting from a fully deterministic, non-iterative dataflow structure, its reconfiguration rate is 95% lower than that of the K-Best detector and 17% lower than that of the FSD.

Archive | 2016

Heterogeneous Reconfigurable Processors for Real-Time Baseband Processing: From Algorithm to Architecture

Chenxin Zhang; Liang Liu; Viktor Öwall

This book focuses on domain-specific heterogeneous reconfigurable architectures and presents a computing platform flexible enough to support multiple standards, multiple modes, and multiple algorithms. The content is multidisciplinary, covering wireless communication, computing architecture, and circuit design. The described platform provides real-time processing capability at a reasonable implementation cost, achieving balanced trade-offs among flexibility, performance, and hardware cost. The authors discuss efficient design methods for wireless communication processing platforms from both algorithm and architecture design perspectives. Coverage also includes computing platforms for different wireless technologies and standards, including MIMO, OFDM, Massive MIMO, DVB, WLAN, LTE/LTE-A, and 5G.

• Discusses reconfigurable architectures, including hardware building blocks such as processing elements, memory sub-systems, Network-on-Chip (NoC), and dynamic hardware reconfiguration;
• Describes a unique design and optimization methodology, applied to different areas and levels, including communication theory, hardware implementation, and software support;
• Demonstrates design trade-offs during different development phases and enables readers to apply similar techniques to various applications.

International Symposium on Circuits and Systems | 2014

Energy efficient SQRD processor for LTE-A using a group-sort update scheme

Chenxin Zhang; Hemanth Prabhu; Liang Liu; Ove Edfors; Viktor Öwall

This paper presents an energy-efficient sorted QR decomposition (SQRD) processor for 3GPP LTE-Advanced (LTE-A) systems. The processor adopts a hybrid decomposition scheme to reduce computational complexity and provides a wide range of performance-complexity trade-offs. Based on the energy distribution of the spatial channels, it switches between brute-force SQRD and a low-complexity group-sort QR-update strategy. This strategy is proposed in this work to effectively exploit the LTE-A pilot pattern. As a proof of concept, a run-time reconfigurable vector processor is developed to implement this adaptive-switching QR decomposition algorithm efficiently. In a 65 nm CMOS technology, the proposed SQRD processor occupies a core area of 0.71 mm² and achieves a throughput of up to 100 MQRD/s. Compared to the brute-force approach, it reduces energy by 5% to 33%.
