Rahul Shrestha

Indian Institute of Technology Guwahati

Publication

Featured research published by Rahul Shrestha.

IEEE Transactions on Circuits and Systems | 2014

High-Throughput Turbo Decoder With Parallel Architecture for LTE Wireless Communication Standards

Rahul Shrestha; Roy Paily

This work focuses on the VLSI design of high-speed maximum a posteriori probability (MAP) decoders, which are intrinsic building blocks of parallel turbo decoders. For the logarithmic-Bahl-Cocke-Jelinek-Raviv (LBCJR) algorithm used in MAP decoders, we present an ungrouped backward recursion technique for computing the backward state metrics. Unlike conventional decoder architectures, a MAP decoder based on this technique can be extensively pipelined and retimed to achieve a higher clock frequency. Additionally, the state metric normalization technique employed in the design of the add-compare-select unit (ACSU) reduces the critical path delay of our decoder architecture. We have designed and implemented turbo decoders with 8 and 64 parallel MAP decoders in 90 nm CMOS technology. The VLSI implementation of the 8× parallel turbo decoder achieves a maximum throughput of 439 Mbps with an energy efficiency of 0.11 nJ/bit/iteration. Similarly, the 64× parallel turbo decoder achieves a maximum throughput of 3.3 Gbps with an energy efficiency of 0.079 nJ/bit/iteration. These high-throughput decoders meet the peak data rates of the 3GPP-LTE and LTE-Advanced standards.

IEEE Transactions on Circuits and Systems | 2015

High-Throughput LDPC-Decoder Architecture Using Efficient Comparison Techniques & Dynamic Multi-Frame Processing Schedule

Sachin Kumawat; Rahul Shrestha; Nikunj Daga; Roy Paily

This paper presents the architecture of a block-level-parallel layered decoder for irregular LDPC codes. It can be reconfigured to support the various block lengths and code rates of the IEEE 802.11n (WiFi) wireless-communication standard. We propose efficient comparison techniques for both column and row layered schedules, together with rejection-based high-speed circuits that compute the two minimum values from multiple inputs, as required for row layered processing of the hardware-friendly min-sum decoding algorithm. The results show good speed with lower area compared to state-of-the-art circuits. Additionally, this work proposes a dynamic multi-frame processing schedule which efficiently utilizes layered LDPC decoding with minimum pipeline stages. The suggested LDPC-decoder architecture has been synthesized and post-layout simulated in a 90 nm CMOS process. This decoder occupies an area of 5.19 mm² and supports code rates of 1/2, 2/3, 3/4 and 5/6 as well as block lengths of 648, 1296 and 1944. At a clock frequency of 336 MHz, the proposed LDPC decoder achieves a throughput of 5.13 Gbps and an energy efficiency of 0.01 nJ/bit/iteration, both better than similar state-of-the-art works.
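The role of the two minimum values in min-sum decoding can be illustrated in software. The following is a minimal sketch, not the rejection-based circuit of the paper: it uses a plain sequential scan, and the function names `two_minimums` and `check_node_update` are hypothetical. Each outgoing check-node message takes the smallest magnitude among the other inputs, so only the first minimum, the second minimum and the position of the first minimum are needed.

```python
from typing import List, Sequence, Tuple


def two_minimums(magnitudes: Sequence[float]) -> Tuple[float, float, int]:
    """Return (smallest, second smallest, index of smallest) in one pass."""
    if len(magnitudes) < 2:
        raise ValueError("need at least two inputs")
    min1 = min2 = float("inf")
    idx1 = -1
    for i, m in enumerate(magnitudes):
        if m < min1:
            # New overall minimum: the old minimum becomes the runner-up.
            min2 = min1
            min1, idx1 = m, i
        elif m < min2:
            min2 = m
    return min1, min2, idx1


def check_node_update(llrs: Sequence[float]) -> List[float]:
    """Min-sum check-node update for one row (hypothetical helper)."""
    min1, min2, idx1 = two_minimums([abs(v) for v in llrs])
    # Product of all input signs, tracked as +1 or -1.
    sign_product = 1
    for v in llrs:
        if v < 0:
            sign_product = -sign_product
    out = []
    for i, v in enumerate(llrs):
        # Exclude the edge's own input: use min2 only at the position of min1.
        magnitude = min2 if i == idx1 else min1
        own_sign = -1 if v < 0 else 1
        out.append(sign_product * own_sign * magnitude)
    return out


if __name__ == "__main__":
    print(check_node_update([2.0, -0.5, 1.5, -3.0]))
```

Multiplying the overall sign product by the edge's own sign removes that edge's contribution, which mirrors the magnitude exclusion done with the second minimum.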

IET Circuits, Devices & Systems | 2016

Multi-standard high-throughput and low-power quasi-cyclic low density parity check decoder for worldwide interoperability for microwave access and wireless fidelity standards

Vijaya Kumar Kanchetla; Rahul Shrestha; Roy Paily

This study presents a reconfigurable quasi-cyclic low density parity check (QC-LDPC) decoder for the IEEE 802.16e worldwide interoperability for microwave access and IEEE 802.11n wireless fidelity communication standards. It supports code rates of 1/2, 2/3, 3/4 and 5/6, and its architecture is based on the column layered decoding technique to enhance the convergence speed. The authors suggest a register-file-based approach to handle the shift property of the modified parity check matrix, and introduce a modified version of the matrix permutation method to reduce the number of check nodes which handle multiple messages. In addition, parallel processing has been incorporated in the decoder architecture to attain higher throughput. This QC-LDPC decoder is implemented in a 90 nm CMOS process and is post-layout simulated. It can achieve a throughput of 796 Mbps for a code rate of 5/6. With a 0.9 V supply, it consumes 146 mW of total power at a clock frequency of 149 MHz.

International Conference on VLSI Design | 2013

Design and Implementation of a High Speed MAP Decoder Architecture for Turbo Decoding

Rahul Shrestha; Roy Paily

The maximum a posteriori probability (MAP) decoder is an integral part of error-correcting turbo decoders. A high-speed MAP decoder architecture is essential for the design of high-throughput turbo decoders, which are widely used in recent wireless communication standards. In this paper, a new sliding window approach for the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm used in the design of MAP decoders is presented. An architecture for a MAP decoder based on this approach, along with its operation, is also described. The proposed MAP decoder architecture is implemented on a field programmable gate array (FPGA) and the results are discussed. The proposed MAP decoder operates at a maximum frequency of 346 MHz and is compared with state-of-the-art MAP decoder implementations. Finally, the bit error rate (BER) performance of the implemented MAP decoder in a communication environment is measured.

International Conference on Communications | 2011

Hardware implementation of Max-Log-MAP algorithm based on MacLaurin series for turbo decoder

Rahul Shrestha; Roy Paily

After the initial interest caused by the appearance of turbo codes in 1993, special attention to hardware implementation has led to many modified algorithms for MAP decoders. The original MAP algorithm suffers from serious drawbacks in its hardware implementation. To overcome this disadvantage, the Max-Log-MAP and Log-MAP algorithms have been proposed to reduce the complexity. Recently, an improved Max-Log-MAP algorithm based on the MacLaurin series was proposed by Shahram et al. to further reduce the complexity. However, no hardware implementation of this MacLaurin-series-based Max-Log-MAP algorithm has been reported. In this paper, we propose a hardware architecture for the modified Max-Log-MAP algorithm using the MacLaurin series. In addition, the performance of the proposed architecture is improved by replacing all the multipliers with shifters and adders. This implementation is very useful for high data rate communication applications, as the performance of this decoder in a lossy ISI channel is very good. Finally, the performance of the proposed architecture is compared with a hardware implementation of the Max-Log-MAP SISO decoder.
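The difference between these algorithms comes down to how the Jacobian logarithm ln(e^a + e^b) = max(a, b) + ln(1 + e^(-|a-b|)) is evaluated. The sketch below compares three variants. It assumes that the MacLaurin-based variant truncates the series of ln(1 + e^(-x)) after the linear term, giving the correction max(0, ln 2 - x/2); that truncation and all function names are assumptions for illustration, not details confirmed by the abstract.

```python
import math


def max_star_exact(a: float, b: float) -> float:
    """Log-MAP: exact Jacobian logarithm ln(e^a + e^b)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_star_maxlog(a: float, b: float) -> float:
    """Max-Log-MAP: the correction term is dropped entirely."""
    return max(a, b)


def max_star_maclaurin(a: float, b: float) -> float:
    """Assumed first-order MacLaurin correction: max(0, ln2 - x/2).

    ln(1 + e^-x) = ln2 - x/2 + x^2/8 - ... around x = 0. The division
    by two maps to a one-bit right shift in fixed-point hardware.
    """
    x = abs(a - b)
    return max(a, b) + max(0.0, math.log(2.0) - 0.5 * x)


if __name__ == "__main__":
    for a, b in [(1.0, 1.0), (0.0, 0.5), (0.0, 1.0), (0.0, 5.0)]:
        print(a, b,
              round(max_star_exact(a, b), 4),
              round(max_star_maxlog(a, b), 4),
              round(max_star_maclaurin(a, b), 4))
```

Because ln(1 + e^(-x)) is convex and the linear term is its tangent at x = 0, the clipped linear correction never overshoots the exact value, so it is at least as accurate as Max-Log-MAP for every input pair.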

International Conference on Green Computing, Communication and Electrical Engineering | 2014

Design and implementation of multi-rate LDPC decoder for IEEE 802.16e wireless standard

K. Vijaya Kumar; Rahul Shrestha; Roy Paily

In this paper, a flexible architecture for a multi-rate low density parity check (LDPC) decoder is presented. It supports the six different code rates specified by the IEEE 802.16e wireless standard. In the suggested decoder architecture, the column layered decoding technique is employed to increase the convergence speed. Additionally, the decoder design incorporates a parallel architecture to achieve a higher throughput which meets the requirement of the IEEE 802.16e standard. An application-specific integrated circuit (ASIC) implementation of this decoder architecture has been performed at the 130 nm complementary metal oxide semiconductor (CMOS) technology node. At the worst-case process-voltage-temperature (PVT) corner with a supply voltage of 1.08 V, the implemented decoder achieves a maximum information throughput of 159.6 Mbps at a clock frequency of 39.9 MHz.
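The reported throughput and clock frequency can be related with a short calculation. This is a minimal sketch using only the two figures quoted above; the helper name `bits_per_cycle` is hypothetical, and the result is an average, not a statement about how the decoder schedules its work internally.

```python
def bits_per_cycle(throughput_mbps: float, clock_mhz: float) -> float:
    """Average information bits delivered per clock cycle."""
    if clock_mhz <= 0:
        raise ValueError("clock frequency must be positive")
    # Mbps divided by MHz: the factors of 10^6 cancel.
    return throughput_mbps / clock_mhz


if __name__ == "__main__":
    # Figures from the abstract: 159.6 Mbps at 39.9 MHz.
    print(round(bits_per_cycle(159.6, 39.9), 3))
```

With these figures the decoder delivers about four information bits per clock cycle on average.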

VLSI Design and Test | 2012

Design and implementation of a linear feedback shift register interleaver for turbo decoding

Rahul Shrestha; Roy Paily

Recent wireless communication standards such as 3GPP-LTE, WiMAX, DVB-SH and HSPA incorporate turbo codes for their excellent coding performance. The interleavers in the turbo encoder and decoder play a vital role in this performance. In this paper, we propose a linear feedback shift register (LFSR) based interleaver for turbo codes. The proposed interleaver is compared with the existing quadratic permutation polynomial (QPP) and almost regular permutation (ARP) interleavers. The hardware implementations of these interleavers were investigated in terms of area, power consumption and maximum frequency of operation. Hardware implementations were performed on a field programmable gate array (FPGA), as well as in an application-specific integrated circuit (ASIC) using 130 nm complementary metal oxide semiconductor (CMOS) technology.
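The general idea of generating interleaver addresses with an LFSR can be sketched as follows. The abstract does not give the paper's construction, so everything here is an assumption for illustration: a 4-bit Fibonacci LFSR with feedback polynomial x^4 + x^3 + 1, a block length of 15, and the function names. A maximal-length LFSR visits every non-zero state exactly once per period, so its state sequence is a permutation that needs no address memory.

```python
from typing import List


def lfsr_sequence(seed: int = 1) -> List[int]:
    """States of a 4-bit Fibonacci LFSR (polynomial x^4 + x^3 + 1)."""
    if not 1 <= seed <= 15:
        raise ValueError("seed must be a non-zero 4-bit value")
    states = []
    state = seed
    while True:
        states.append(state)
        # XOR the two tapped bits, then shift the result in at the top.
        feedback = (state ^ (state >> 1)) & 1
        state = (state >> 1) | (feedback << 3)
        if state == seed:
            break
    return states


def lfsr_interleaver(seed: int = 1) -> List[int]:
    """Permutation of 0..14 derived from the LFSR states 1..15."""
    return [s - 1 for s in lfsr_sequence(seed)]


def interleave(block: list, perm: List[int]) -> list:
    """Output position i reads input position perm[i]."""
    return [block[p] for p in perm]


def deinterleave(block: list, perm: List[int]) -> list:
    """Inverse of interleave for the same permutation."""
    out = [None] * len(perm)
    for i, p in enumerate(perm):
        out[p] = block[i]
    return out


if __name__ == "__main__":
    perm = lfsr_interleaver()
    data = list("abcdefghijklmno")
    shuffled = interleave(data, perm)
    print(perm)
    print("".join(shuffled))
    print("".join(deinterleave(shuffled, perm)))
```

A practical design would need a wider register and a rule for block lengths that are not of the form 2^n - 1; those details are outside this sketch.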

IEEE India Conference | 2016

Design of low power VLSI-architecture and ASIC implementation of fuzzy logic based automatic car-parking system

Enna Sachdeva; Pratik Porwal; Nalini Vidyulatha; Rahul Shrestha

In this work, the overall system design of an automatic car-parking application is presented, where the non-linear control estimation is based on the fuzzy logic control (FLC) model. A finite state machine (FSM) based central controller is used to operate the FLC model. It performs the series of operations required for the real-time car-parking process. Subsequently, the defuzzifier architecture is further optimized using a resource sharing technique which reduces the overall chip area and power consumption of the proposed car-parking system. Additionally, we discuss the results from field-programmable gate-array (FPGA) synthesis and application-specific integrated-circuit (ASIC) implementations of the suggested architecture and compare them with the conventional architecture. Functional verification of the design using a simulation tool for a similar set of inputs has also been performed. The suggested architecture occupies an area of 46335 μm² and consumes a total power of 0.06254 mW at 60 MHz, when synthesized and post-layout simulated at the 180 nm complementary metal-oxide semiconductor (CMOS) technology node.

Advances in Computing and Communications | 2013

A novel state metric normalization technique for high-throughput maximum-a-posteriori-probability decoder

Rahul Shrestha; Roy Paily

In this paper, a new state metric normalization technique is proposed for the maximum-a-posteriori-probability (MAP) algorithm to enhance the throughput of the MAP decoder. A bit-error-rate (BER) performance comparison shows that the MAP algorithm based on the proposed normalization technique has a coding gain of 0.25 dB at a BER of 10⁻⁴ in comparison with the MAP algorithm based on the subtractive normalization technique. An architecture for a MAP decoder based on the new normalization technique is proposed, with a reduced critical path delay compared to the contributions in the literature. Subsequently, a field-programmable-gate-array (FPGA) implementation of the new MAP decoder is carried out. The proposed decoders based on non-parallel radix-2 and radix-4 architectures achieve high throughputs of 514 Mbps and 1.028 Gbps respectively.
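The subtractive normalization used as the baseline above can be sketched in a few lines. This illustrates the baseline only, since the abstract does not describe the proposed technique. The two-state trellis, the integer branch metrics and the function names are hypothetical. State metrics grow without bound during the recursion, and subtracting a common reference (here the maximum) at every step keeps them within a fixed range without changing their differences, which are all the decoder uses.

```python
from typing import Dict, List, Tuple

# Toy fully connected 2-state trellis with fixed branch metrics (assumed).
PREDECESSORS: List[List[int]] = [[0, 1], [0, 1]]
BRANCH: Dict[Tuple[int, int], int] = {(0, 0): 3, (1, 0): 1, (0, 1): 2, (1, 1): 4}


def acs_step(prev: List[int]) -> List[int]:
    """One add-compare-select step of a max-log state metric recursion."""
    return [
        max(prev[p] + BRANCH[(p, s)] for p in preds)
        for s, preds in enumerate(PREDECESSORS)
    ]


def normalize_subtractive(metrics: List[int]) -> List[int]:
    """Subtract the largest metric so the maximum is always zero."""
    reference = max(metrics)
    return [m - reference for m in metrics]


def run(steps: int, normalize: bool) -> List[int]:
    """Run the recursion from all-zero metrics, with or without normalization."""
    metrics = [0, 0]
    for _ in range(steps):
        metrics = acs_step(metrics)
        if normalize:
            metrics = normalize_subtractive(metrics)
    return metrics


if __name__ == "__main__":
    raw = run(1000, normalize=False)
    norm = run(1000, normalize=True)
    print(raw, norm, raw[0] - raw[1], norm[0] - norm[1])
```

The subtraction sits inside the recursion loop, so in hardware it lengthens the add-compare-select path, which is the kind of critical path cost a normalization scheme has to manage.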

IET Communications | 2013

Performance and throughput analysis of turbo decoder for the physical layer of digital-video-broadcasting-satellite-services-to-handhelds standard

Rahul Shrestha; Roy Paily

In this study, the coding performance of a turbo decoder compliant with the physical layer of the digital-video-broadcasting-satellite-services-to-handhelds (DVB-SH) standard is presented for additive-white-Gaussian-noise (AWGN) and frequency-selective fading channels. The transmitted bits are modulated with the orthogonal-frequency-division-multiplexing (OFDM) technique, incorporating a 1K-fast-Fourier-transform (1K-FFT), where each subcarrier is modulated using quadrature-phase-shift-keying (QPSK) or quadrature-amplitude-modulation (QAM) schemes. The performance of the turbo decoder is investigated on both channels for 3, 8, 14 and 18 decoding iterations and for sliding window sizes of 10, 20, 30 and 40. The values of these design metrics that achieve optimum coding performance are also discussed. The system throughput of the turbo decoder is optimised with respect to the number of decoding iterations and the sliding window size for processor speeds ranging from 200 MHz to 1 GHz. This analysis is presented for non-parallel radix-2 as well as parallel radix-4 configurations of the turbo decoder to meet the system throughput specification of the third-generation wireless communication standard, ranging from 100 to 300 Mbps. The coding performance of turbo decoders based on the max-log-MAP, log-MAP and Maclaurin-series-based algorithms is studied for both channel conditions. The running time of each of these algorithms on a 64-bit processor is also presented for comparison. Finally, the coding performance of the turbo decoder is evaluated for code rates of 1/5, 2/9, 1/4, 2/7, 1/3, 2/5 and 1/2.
