Publication


Featured research published by Hemanth Prabhu.


Wireless Communications and Networking Conference | 2013

Approximative matrix inverse computations for very-large MIMO and applications to linear pre-coding systems

Hemanth Prabhu; Joachim Neves Rodrigues; Ove Edfors; Fredrik Rusek

In very-large multiple-input multiple-output (MIMO) systems, the base station (BS) is equipped with a very large number of antennas compared to previously considered systems. Increasing the number of antennas has various advantages, but some schemes require handling large matrices for joint processing (pre-coding) at the BS. Dirty paper coding (DPC) is an optimal pre-coding scheme, but it has very high complexity. However, as the number of BS antennas increases, the performance of linear pre-coding approaches that of optimal DPC. Although linear pre-coding is less complex than DPC, it still requires computing pseudo-inverses of large matrices. In this paper we present a low-complexity approximation of down-link Zero-Forcing (ZF) linear pre-coding for very-large multi-user MIMO systems. Instead of traditional exact matrix inversion, a Neumann series expansion is used to approximate the inverse, exploiting special properties of the matrices and thereby reducing hardware cost. With this approximation, the computational complexity can be reduced significantly for large enough systems, i.e., systems with enough BS antenna elements. For the investigated case of 8 users, we obtain 90% of the full ZF sum rate, with lower computational complexity, when the number of BS antennas per user is about 20 or more.
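The Neumann-series idea can be illustrated in plain Python. This is a minimal sketch, not the paper's implementation: it assumes an i.i.d. Rayleigh downlink channel H with K = 4 users and M = 128 antennas, splits the Gram matrix Z = HH^H into its diagonal D and the rest, and approximates Z^-1 by the truncated series sum over n = 0..L-1 of (I - D^-1 Z)^n D^-1. All helper names (`gram`, `neumann_inverse`, `zf_precoder`, ...) are hypothetical. Because M >> K tends to make Z diagonally dominant, the residual ||I - AZ|| should shrink quickly as terms are added.

```python
import math
import random

def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def gram(H):
    """Z = H H^H for a K x M downlink channel (one row per user)."""
    return [[sum(a * b.conjugate() for a, b in zip(hi, hj)) for hj in H] for hi in H]

def neumann_inverse(Z, terms):
    """Approximate Z^-1 by sum_{n=0}^{terms-1} (I - D^-1 Z)^n D^-1, with D = diag(Z)."""
    K = len(Z)
    d_inv = [1.0 / Z[i][i].real for i in range(K)]
    T = [[(1.0 if i == j else 0.0) - d_inv[i] * Z[i][j] for j in range(K)] for i in range(K)]
    term = [[d_inv[i] if i == j else 0.0 for j in range(K)] for i in range(K)]  # n = 0 term
    approx = [row[:] for row in term]
    for _ in range(terms - 1):
        term = matmul(T, term)  # next term: T^n D^-1
        approx = [[a + t for a, t in zip(ra, rt)] for ra, rt in zip(approx, term)]
    return approx

def inversion_residual(Z, A):
    """Frobenius norm of I - A Z (zero for an exact inverse)."""
    AZ = matmul(A, Z)
    K = len(Z)
    return math.sqrt(sum(abs((1.0 if i == j else 0.0) - AZ[i][j]) ** 2
                         for i in range(K) for j in range(K)))

def zf_precoder(H, A):
    """P = H^H A (M x K), where A approximates (H H^H)^-1."""
    HH = [[H[k][m].conjugate() for k in range(len(H))] for m in range(len(H[0]))]
    return matmul(HH, A)

random.seed(1)
K, M = 4, 128  # users and BS antennas (illustrative sizes)
s = math.sqrt(0.5)  # i.i.d. CN(0, 1) channel entries
H = [[complex(random.gauss(0, s), random.gauss(0, s)) for _ in range(M)] for _ in range(K)]
Z = gram(H)
residuals = [inversion_residual(Z, neumann_inverse(Z, L)) for L in (1, 2, 3, 4)]
for L, r in zip((1, 2, 3, 4), residuals):
    print(f"{L} term(s): ||I - A Z||_F = {r:.2e}")
P = zf_precoder(H, neumann_inverse(Z, 3))
```

The number of series terms is the knob that trades accuracy against complexity: each extra term costs one K x K matrix product, whereas an exact inversion would need a full decomposition.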


International Symposium on Circuits and Systems | 2014

Hardware efficient approximative matrix inversion for linear pre-coding in massive MIMO

Hemanth Prabhu; Ove Edfors; Joachim Neves Rodrigues; Liang Liu; Fredrik Rusek

This paper describes a hardware-efficient linear precoder for Massive MIMO Base Stations (BSs) with a very large number of antennas, e.g., on the order of hundreds, serving multiple users simultaneously. To avoid the hardware-demanding direct matrix inversions required by the Zero-Forcing (ZF) precoder, we use low-complexity Neumann series based approximations. Furthermore, we propose a method to speed up the convergence of the Neumann series by using tri-diagonal preconditioning matrices, which lowers the complexity even further. As a proof of concept, a flexible VLSI architecture is presented with an implementation supporting matrix inversion of sizes up to 16×16. In 65 nm CMOS, a throughput of 0.5M matrix inversions per second is achieved at a clock frequency of 420 MHz with a gate count of 104K.
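To see what a tri-diagonal preconditioner changes, the sketch below uses the identity that an L-term Neumann approximation A_L = sum over n of (I - P^-1 Z)^n P^-1 leaves the residual I - A_L Z = (I - P^-1 Z)^L, and compares that residual when P is the diagonal versus the tri-diagonal part of Z. Assumptions: an i.i.d. Rayleigh channel, K = 8 users, M = 512 antennas, and P^-1 applied with the Thomas algorithm. It illustrates the principle only, not the paper's VLSI datapath, and the helper names are hypothetical.

```python
import math
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def thomas_solve(P, b):
    """Solve P x = b for tri-diagonal P with the Thomas algorithm (no pivoting)."""
    n = len(b)
    c, d = [0j] * n, [0j] * n
    for i in range(n):
        lower = P[i][i - 1] if i > 0 else 0j
        denom = P[i][i] - (lower * c[i - 1] if i > 0 else 0j)
        c[i] = P[i][i + 1] / denom if i < n - 1 else 0j
        d[i] = (b[i] - (lower * d[i - 1] if i > 0 else 0j)) / denom
    x = [0j] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = d[i] - (c[i] * x[i + 1] if i < n - 1 else 0j)
    return x

def iteration_matrix(Z, tridiagonal):
    """T = I - P^-1 Z, where P keeps the diagonal (or tri-diagonal) part of Z."""
    K = len(Z)
    width = 1 if tridiagonal else 0
    P = [[Z[i][j] if abs(i - j) <= width else 0j for j in range(K)] for i in range(K)]
    cols = [thomas_solve(P, [Z[i][j] for i in range(K)]) for j in range(K)]  # columns of P^-1 Z
    return [[(1.0 if i == j else 0.0) - cols[j][i] for j in range(K)] for i in range(K)]

def residual_after(T, terms):
    """||T^terms||_F, i.e. ||I - A Z||_F for the terms-long Neumann approximation A."""
    R = T
    for _ in range(terms - 1):
        R = matmul(R, T)
    return math.sqrt(sum(abs(v) ** 2 for row in R for v in row))

random.seed(2)
K, M = 8, 512
s = math.sqrt(0.5)
H = [[complex(random.gauss(0, s), random.gauss(0, s)) for _ in range(M)] for _ in range(K)]
Z = [[sum(a * b.conjugate() for a, b in zip(hi, hj)) for hj in H] for hi in H]
T_diag = iteration_matrix(Z, tridiagonal=False)
T_tri = iteration_matrix(Z, tridiagonal=True)
for L in (1, 2, 3):
    print(f"L={L}: diagonal {residual_after(T_diag, L):.2e}, "
          f"tri-diagonal {residual_after(T_tri, L):.2e}")
```

The tri-diagonal P absorbs the first off-diagonals of Z, so only the remaining entries drive the error; how large the gain is for a given number of terms depends on the channel realization.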


International Symposium on Communications, Control and Signal Processing | 2014

A low-complex peak-to-average power reduction scheme for OFDM based massive MIMO systems

Hemanth Prabhu; Ove Edfors; Joachim Neves Rodrigues; Liang Liu; Fredrik Rusek

An Orthogonal Frequency-Division Multiplexing (OFDM) based multi-user massive Multiple-Input Multiple-Output (MIMO) system is considered. The problem of high Peak-to-Average Ratio (PAR) in OFDM based systems is well known, and the large number of antennas (RF chains) at the Base Station (BS) in massive MIMO systems aggravates it further, since a correspondingly large number of Power Amplifiers (PAs) is used. High PAR necessitates linear PAs, which have a high hardware cost and are typically power-inefficient. In this paper we propose a low-complexity approach to tackle the issue. The idea is to deliberately clip the signals sent to one set of antennas, while compensating for this by transmitting correction signals on a set of reserved antennas (antenna reservation). A PAR reduction of 4 dB is achieved by reserving 25% of the antennas, with only a 15% complexity overhead.
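The PAR metric and the deliberate clipping step can be illustrated on a single OFDM symbol. This sketch assumes 256 QPSK-modulated subcarriers, a direct O(N²) IDFT, and an arbitrary 4 dB clipping threshold above the RMS amplitude; it models only the clipping on one antenna, not the correction signals sent on the reserved antennas, which depend on the channel and precoder. Function names are hypothetical.

```python
import cmath
import math
import random

def ofdm_symbol(n_sub, rng):
    """Time-domain OFDM symbol: IDFT of random QPSK symbols on n_sub subcarriers."""
    qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2) for _ in range(n_sub)]
    return [sum(X * cmath.exp(2j * math.pi * k * n / n_sub) for k, X in enumerate(qpsk))
            / math.sqrt(n_sub) for n in range(n_sub)]

def par_db(x):
    """Peak-to-average power ratio in dB."""
    powers = [abs(v) ** 2 for v in x]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))

def clip(x, threshold_db):
    """Limit |x[n]| to threshold_db above the RMS amplitude while keeping the phase."""
    rms = math.sqrt(sum(abs(v) ** 2 for v in x) / len(x))
    a_max = rms * 10 ** (threshold_db / 20)
    return [v if abs(v) <= a_max else a_max * v / abs(v) for v in x]

rng = random.Random(3)
x = ofdm_symbol(256, rng)
x_clipped = clip(x, threshold_db=4.0)
print(f"PAR before: {par_db(x):.1f} dB, after clipping: {par_db(x_clipped):.1f} dB")
```

Clipping alone lowers the PAR but distorts the signal seen by the users; compensating for that distortion is what the reserved antennas are for in the scheme described above.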


IEEE Transactions on Circuits and Systems | 2015

Energy Efficient Group-Sort QRD Processor With On-Line Update for MIMO Channel Pre-Processing

Chenxin Zhang; Hemanth Prabhu; Yangxurui Liu; Liang Liu; Ove Edfors; Viktor Öwall

This paper presents a Sorted QR-Decomposition (SQRD) processor for the 3GPP LTE-A system. It achieves energy efficiency by co-optimizing techniques such as heterogeneous processing, reconfigurable architecture, and dual-supply voltage operation. At the algorithm level, a low-complexity hybrid decomposition scheme is adopted, which switches, depending on the energy distribution of the spatial channels, between the traditional brute-force SQRD and a proposed group-sort QR-update strategy. A reconfigurable vector processor is developed accordingly to support the adaptive processing with high hardware efficiency. Furthermore, an on-chip power management technique is integrated to obtain real-time power savings by adapting the supply voltage to the instantaneous workload. As a proof of concept, we implemented the processor in a 65 nm CMOS technology and conducted post-layout simulation. The proposed SQRD processor occupies a 0.71 mm² core area and has a throughput of up to 69 MQRD/s. Compared to the brute-force approach, an energy reduction of between 10% and 61.8% is achieved.
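For reference, the brute-force baseline mentioned above, a textbook sorted QR decomposition, fits in a few lines: modified Gram-Schmidt that always processes the remaining column with the smallest norm next, as in the classic SQRD used for successive interference cancellation. This is only that baseline, on an assumed random 4×4 channel; the paper's group-sort QR-update strategy and its hybrid switching rule are not modeled.

```python
import math
import random

def sorted_qr(H):
    """Sorted QR decomposition via modified Gram-Schmidt, so that H[:, perm] = Q R.

    At each step the remaining column with the smallest norm is processed next.
    """
    m, n = len(H), len(H[0])
    Q = [[H[r][c] for r in range(m)] for c in range(n)]  # stored column by column
    R = [[0j] * n for _ in range(n)]
    perm = list(range(n))
    norms = [sum(abs(v) ** 2 for v in q) for q in Q]
    for i in range(n):
        k = min(range(i, n), key=lambda j: norms[j])  # weakest remaining column
        Q[i], Q[k] = Q[k], Q[i]
        norms[i], norms[k] = norms[k], norms[i]
        perm[i], perm[k] = perm[k], perm[i]
        for row in R[:i]:  # keep already computed rows of R consistent with the swap
            row[i], row[k] = row[k], row[i]
        R[i][i] = math.sqrt(sum(abs(v) ** 2 for v in Q[i]))
        Q[i] = [v / R[i][i] for v in Q[i]]
        for l in range(i + 1, n):
            R[i][l] = sum(a.conjugate() * b for a, b in zip(Q[i], Q[l]))
            Q[l] = [b - R[i][l] * a for a, b in zip(Q[i], Q[l])]
            norms[l] -= abs(R[i][l]) ** 2  # downdate the norm used for sorting
    return Q, R, perm

random.seed(4)
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)] for _ in range(4)]
Q, R, perm = sorted_qr(H)
print("processing order:", perm)
print("diag(R):", [round(R[i][i].real, 3) for i in range(4)])
```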


International Solid-State Circuits Conference | 2017

3.6 A 60pJ/b 300Mb/s 128×8 Massive MIMO precoder-detector in 28nm FD-SOI

Hemanth Prabhu; Joachim Neves Rodrigues; Liang Liu; Ove Edfors

Further exploitation of the spatial domain, as in Massive MIMO (MaMi) systems, is imperative to meet future communication requirements [1]. Up-scaling conventional 4×4 small-scale MIMO implementations to MaMi is prohibitive in terms of flexibility, as well as area and power cost. This work presents a 1.1 mm² 128×8 MaMi baseband chip, achieving up to 12 dB array gain and 2× spatial multiplexing gain. Compared to previous state-of-the-art MIMO implementations [2–3], the area cost is reduced by 53% and 17% for up- and down-link, respectively. Algorithm optimizations and a highly flexible framework were evaluated on real measured channels. Extensive hardware time multiplexing lowered the area cost, and leveraging flexible FD-SOI body bias and clock gating resulted in an energy efficiency of 6.56 nJ/QRD and 60 pJ/b at a 300 Mb/s detection rate.
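As a quick consistency check on the reported figures, multiplying the energy per bit by the detection rate gives the implied detection power. This is simple arithmetic on the stated numbers, not a power figure reported in the abstract:

```latex
P_{\mathrm{det}} = E_b \cdot R_b
  = 60\,\mathrm{pJ/b} \times 300\,\mathrm{Mb/s}
  = \left(60 \times 10^{-12}\,\mathrm{J/b}\right) \times \left(3 \times 10^{8}\,\mathrm{b/s}\right)
  = 1.8 \times 10^{-2}\,\mathrm{W} = 18\,\mathrm{mW}
```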


NORCHIP | 2012

Energy efficient MIMO channel pre-processor using a low complexity on-line update scheme

Chenxin Zhang; Hemanth Prabhu; Liang Liu; Ove Edfors; Viktor Öwall

This paper presents a low-complexity, energy-efficient channel pre-processing update scheme targeting the emerging 3GPP Long Term Evolution Advanced (LTE-A) downlink. Upon channel matrix renewals, the number of explicit QR decompositions (QRD) and channel matrix inversions is reduced, since only the upper triangular matrices R and R⁻¹ are updated, based on an on-line update decision mechanism. The proposed channel pre-processing updater has been designed as a dedicated unit in a 65 nm CMOS technology, resulting in a core area of 0.242 mm² (equivalent gate count of 116K). Running at a 330 MHz clock, each QRD update and each R⁻¹ update consumes, respectively, 4 times and 2 times less energy than one exact state-of-the-art QRD in the open literature.
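Keeping R⁻¹ alongside R is attractive because a triangular matrix can be inverted by back substitution alone, and the inverse is again upper triangular. The sketch below shows only that building block, on an assumed random 4×4 upper-triangular R with a real positive diagonal (as a QRD produces); the paper's update decision mechanism is not specified in the abstract and is not modeled. The function name is hypothetical.

```python
import random

def upper_triangular_inverse(R):
    """Invert an upper-triangular R column by column with back substitution.

    The inverse is upper-triangular too, so only the n(n+1)/2 upper entries are computed.
    """
    n = len(R)
    X = [[0j] * n for _ in range(n)]
    for j in range(n):
        X[j][j] = 1 / R[j][j]
        for i in range(j - 1, -1, -1):
            X[i][j] = -sum(R[i][k] * X[k][j] for k in range(i + 1, j + 1)) / R[i][i]
    return X

random.seed(5)
n = 4
R = [[complex(random.gauss(0, 1), random.gauss(0, 1)) if j > i else 0j for j in range(n)]
     for i in range(n)]
for i in range(n):
    R[i][i] = complex(random.uniform(1.0, 2.0), 0.0)  # real positive diagonal, as from a QRD
R_inv = upper_triangular_inverse(R)
```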


Asilomar Conference on Signals, Systems and Computers | 2015

Algorithm and hardware aspects of pre-coding in massive MIMO systems

Hemanth Prabhu; Joachim Neves Rodrigues; Liang Liu; Ove Edfors

Massive Multiple-Input Multiple-Output (MIMO) systems have been shown to improve both spectral and energy efficiency by one or more orders of magnitude by efficiently exploiting the spatial domain. Low-cost RF chains can be employed to reduce the Base Station (BS) cost; however, this may require additional baseband processing to handle distortions induced by hardware impairments. In this article, reduction of the Peak-to-Average power Ratio (PAR) of the transmitted signals and IQ imbalance in the mixer are analyzed for the down-link. We analyze various pre-coding schemes and estimate the required processing energy per transmitted information bit. Gate-level simulations show that the energy cost of performing pre-coding and of tackling hardware impairments is low, on the order of a few pJ per bit.
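As an example of the kind of baseband processing meant here, the sketch below applies a common linear model of mixer IQ imbalance, y = k1·x + k2·x*, and pre-distorts each symbol so that the impairment cancels. The gain and phase mismatch values, the parameterization of k1 and k2, and the pre-compensation approach are all assumptions for illustration; the abstract does not state which compensation method the article analyzes.

```python
import cmath
import math

def iq_imbalance(x, k1, k2):
    """Linear baseband IQ-imbalance model: y = k1 * x + k2 * conj(x)."""
    return k1 * x + k2 * x.conjugate()

def iq_precompensate(x, k1, k2):
    """Pre-distort x so that iq_imbalance(result, k1, k2) == x (requires |k1| != |k2|)."""
    delta = abs(k1) ** 2 - abs(k2) ** 2
    return (k1.conjugate() * x - k2 * x.conjugate()) / delta

# Hypothetical mixer impairment: 5% gain mismatch and 3 degrees of phase mismatch
g, phi = 1.05, math.radians(3.0)
k1 = (1 + g * cmath.exp(1j * phi)) / 2
k2 = (1 - g * cmath.exp(1j * phi)) / 2

symbols = [complex(a, b) / math.sqrt(2) for a in (-1, 1) for b in (-1, 1)]  # QPSK
for s in symbols:
    distorted = iq_imbalance(s, k1, k2)
    corrected = iq_imbalance(iq_precompensate(s, k1, k2), k1, k2)
    print(f"{s:.3f}: uncompensated error {abs(distorted - s):.3f}, "
          f"compensated error {abs(corrected - s):.1e}")
```

The pre-distortion costs only a few multiplications per sample, which is consistent in spirit with the low per-bit energy reported above, though the sketch makes no claim about the article's actual figures.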


NORCHIP | 2011

A GALS ASIC implementation from a CAL dataflow description

Hemanth Prabhu; Sherine Thomas; Joachim Neves Rodrigues; Thomas Olsson; Anders Carlsson

This paper presents low-power hardware generation based on a CAL actor language dataflow implementation. The CAL language provides a higher level of abstraction and generates both hardware and software descriptions. The original CAL flow targets hardware-software co-design of complex systems on FPGAs; it is modified here to facilitate low-power ASIC implementations. Hardware-software co-design and Globally Asynchronous Locally Synchronous (GALS) design at a higher level of abstraction provide more freedom for design-space exploration and reduce design time. Performance is evaluated on a reference design: an Orthogonal Frequency-Division Multiplexing (OFDM) multi-standard channel estimator based on the robust Minimum Mean-Square Error (MMSE) algorithm. The inherent parallelism of CAL dataflow yields higher throughput, and design time for the GALS implementation is reduced.
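To make the dataflow model concrete, here is a minimal Python analogue of actors connected by FIFOs, where an actor fires whenever its inputs hold enough tokens, so no global schedule is needed. It does not use CAL syntax or the actual tool flow; the `Actor` class, `run` function and the toy pipeline are hypothetical stand-ins for real processing stages such as those in a channel estimator.

```python
from collections import deque

class Actor:
    """Hypothetical dataflow actor: fires when every input FIFO holds enough tokens.

    Actors here need at least one input; a source with no inputs would fire forever.
    """

    def __init__(self, name, inputs, outputs, consume, action):
        self.name = name
        self.inputs = inputs    # deque FIFOs read by this actor
        self.outputs = outputs  # deque FIFOs written by this actor
        self.consume = consume  # tokens consumed from each input per firing
        self.action = action    # token lists per input -> token lists per output

    def can_fire(self):
        return all(len(q) >= self.consume for q in self.inputs)

    def fire(self):
        tokens = [[q.popleft() for _ in range(self.consume)] for q in self.inputs]
        for q, produced in zip(self.outputs, self.action(tokens)):
            q.extend(produced)

def run(actors):
    """Keep firing enabled actors until none can fire."""
    fired = True
    while fired:
        fired = False
        for actor in actors:
            while actor.can_fire():
                actor.fire()
                fired = True

# Toy pipeline: pairwise averaging followed by scaling
src, mid, out = deque(range(8)), deque(), deque()
average = Actor("average", [src], [mid], 2, lambda t: [[(t[0][0] + t[0][1]) / 2]])
scale = Actor("scale", [mid], [out], 1, lambda t: [[10 * t[0][0]]])
run([scale, average])  # list order does not matter: firing is driven by token availability
print(list(out))
```

Because each actor depends only on its own FIFOs, independent actors could run concurrently or in separate clock domains, which is the property a GALS implementation can exploit.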


International Symposium on Circuits and Systems | 2015

High throughput constant envelope pre-coder for massive MIMO systems

Hemanth Prabhu; Fredrik Rusek; Joachim Neves Rodrigues; Ove Edfors

This study describes a high-throughput constant envelope (CE) pre-coder for Massive MIMO systems, in which a large number of antennas (M), on the order of hundreds, simultaneously serves a relatively small number of users (K). The stringent amplitude constraint of the CE scheme (only the phase may change) is motivated by the use of highly power-efficient non-linear RF power amplifiers. We propose a scheme that computes the CE signals to be transmitted based on box-constrained regression (coordinate descent), with O(2MK) complexity per iteration per user symbol. A highly scalable systolic architecture is implemented, in which M Processing Elements (PEs) perform the pre-coding for a system with up to K = 16 users. This systolic architecture achieves a very high throughput of 500 Msamples/s (at a 500 MHz clock rate) with a gate count of 14K per PE in 65 nm technology.
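The coordinate-descent idea can be made concrete with a phase-only sketch: it minimises ||u - Hx||² subject to |x_m| being constant by cycling over the M antennas and choosing each phase in closed form. Each antenna update costs about 2K complex multiply-accumulates, so one sweep costs about 2MK, comparable to the figure quoted above. This is a simplified stand-in that assumes an i.i.d. Rayleigh channel, QPSK user symbols, unit total power and five sweeps; the paper's box-constrained regression formulation and its systolic mapping are not reproduced, and `ce_precode` is a hypothetical name.

```python
import cmath
import math
import random

def ce_precode(H, u, power=1.0, sweeps=5):
    """Constant-envelope precoding by cyclic coordinate descent (illustrative sketch).

    Minimises ||u - H x||^2 subject to |x_m| = sqrt(power / M) for every antenna m.
    H is K x M (one row per user); u holds the K desired user symbols.
    """
    K, M = len(H), len(H[0])
    amp = math.sqrt(power / M)
    col_energy = [sum(abs(H[k][m]) ** 2 for k in range(K)) for m in range(M)]  # ||h_m||^2
    x = [complex(amp, 0.0)] * M
    r = [u[k] - sum(H[k][m] * x[m] for m in range(M)) for k in range(K)]  # residual u - H x
    history = [sum(abs(v) ** 2 for v in r)]
    for _ in range(sweeps):
        for m in range(M):
            # h_m^H (r + h_m x_m): K multiply-accumulates (||h_m||^2 is precomputed)
            corr = sum(H[k][m].conjugate() * r[k] for k in range(K)) + col_energy[m] * x[m]
            new = amp * cmath.exp(1j * cmath.phase(corr))  # optimal phase for antenna m
            delta = x[m] - new
            for k in range(K):  # another K multiply-accumulates to keep r = u - H x
                r[k] += H[k][m] * delta
            x[m] = new
        history.append(sum(abs(v) ** 2 for v in r))
    return x, history

rng = random.Random(6)
K, M = 4, 64
s = math.sqrt(0.5)
H = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(M)] for _ in range(K)]
u = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2) for _ in range(K)]  # QPSK
x, history = ce_precode(H, u)
print("||u - Hx||^2 after each sweep:", [round(e, 4) for e in history])
```

Since every coordinate update is an exact minimisation, the error can never increase from one sweep to the next, and every transmitted sample keeps the same amplitude, which is what allows non-linear PAs.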


International Symposium on Circuits and Systems | 2017

A Cholesky decomposition based massive MIMO uplink detector with adaptive interpolation

Hemanth Prabhu; Ove Edfors; Liang Liu

An adaptive uplink detection scheme for a Massive MIMO (MaMi) base station serving up to 16 users is presented. Considering the user distribution in a cell, selective matched filtering (MF) is proposed for users that are not interference-limited, and a Cholesky decomposition (CD) based zero-forcing (ZF) detector is implemented for the remaining users. Channel conditions such as coherence bandwidth are exploited to lower computational complexity by interpolating CD outputs. Performance evaluations on measured MaMi channels indicate a 60-fold reduction in computation count with less than 1 dB loss at an uncoded bit error rate of 10⁻³. For the CD, a reconfigurable processor optimized for 8×8 matrices, with a block decomposition extension supporting up to 16×16 matrices, is presented. Circuit-level optimizations in 28 nm FD-SOI resulted in an energy of 1.4 nJ/CD at 400 MHz, and post-layout simulations indicate a 50% reduction in power dissipation when operating with the proposed interpolation-based detection scheme compared to traditional ZF detection.
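The Cholesky-based ZF path can be sketched as follows: form the K×K Gram matrix G = HᴴH and the matched-filter output b = Hᴴy, factor G = LLᴴ, and recover the estimate with one forward and one backward triangular solve. Assumptions: an i.i.d. Rayleigh uplink channel with 64 antennas and 8 users, and noise-free reception. Selective matched filtering, the block decomposition for 16×16 matrices, and the interpolation of CD outputs are not modeled; the function names are hypothetical.

```python
import math
import random

def cholesky(A):
    """Lower-triangular L with A = L L^H for a Hermitian positive-definite A."""
    n = len(A)
    L = [[0j] * n for _ in range(n)]
    for j in range(n):
        diag = A[j][j].real - sum(abs(L[j][k]) ** 2 for k in range(j))
        L[j][j] = complex(math.sqrt(diag), 0.0)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k].conjugate() for k in range(j))) / L[j][j]
    return L

def zf_detect(H, y):
    """ZF estimate x = (H^H H)^-1 H^H y via Cholesky and two triangular solves.

    H is M x K (one column per user); y is the length-M received vector.
    """
    M, K = len(H), len(H[0])
    G = [[sum(H[m][i].conjugate() * H[m][j] for m in range(M)) for j in range(K)] for i in range(K)]
    b = [sum(H[m][i].conjugate() * y[m] for m in range(M)) for i in range(K)]  # matched-filter output
    L = cholesky(G)
    z = [0j] * K
    for i in range(K):  # forward substitution: L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0j] * K
    for i in range(K - 1, -1, -1):  # back substitution: L^H x = z
        x[i] = (z[i] - sum(L[k][i].conjugate() * x[k] for k in range(i + 1, K))) / L[i][i]
    return x

rng = random.Random(7)
M, K = 64, 8
s = math.sqrt(0.5)
H = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(K)] for _ in range(M)]
sent = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(K)]
y = [sum(H[m][k] * sent[k] for k in range(K)) for m in range(M)]  # noise-free uplink
print([complex(round(v.real, 6), round(v.imag, 6)) for v in zf_detect(H, y)])
```

The matched-filter vector b is computed on the way to ZF anyway, which is why serving some users with MF alone, as the abstract proposes, avoids most of the decomposition cost for them.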
