Honey Durga Tiwari | Researchain

Publication

Featured research published by Honey Durga Tiwari.

International SoC Design Conference | 2008

Multiplier design based on ancient Indian Vedic Mathematics

Honey Durga Tiwari; Ganzorig Gankhuyag; Chan Mo Kim; Yong Beom Cho

Vedic mathematics is the name given to the ancient Indian system of mathematics that was rediscovered in the early twentieth century from ancient Indian scriptures (the Vedas). It mainly deals with Vedic mathematical formulae and their application to various branches of mathematics. Algorithms based on conventional mathematics can be simplified and even optimized by the use of Vedic Sutras. These methods and ideas can be directly applied to trigonometry, plane and spherical geometry, conics, calculus (both differential and integral), and applied mathematics of various kinds. In this paper, a new multiplier and squarer architecture is proposed, based on an algorithm of ancient Indian Vedic mathematics, for low-power and high-speed applications. It is based on generating all partial products and their sums in one step. The design implementation on an Altera Cyclone II FPGA shows that the proposed Vedic multiplier and squarer are faster than an array multiplier and a Booth multiplier.
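The idea of generating all partial products and their column sums in one step can be sketched in software. This is a minimal illustration, assuming the column-wise scheme commonly called "vertically and crosswise"; the abstract does not name the sutra, and the function name `vedic_multiply` and the `width` parameter are hypothetical. In hardware every column sum would be formed in parallel; here they are simply computed independently before a single carry pass.

```python
def vedic_multiply(x: int, y: int, width: int = 8) -> int:
    """Multiply two unsigned `width`-bit integers using column-wise cross products."""
    if not (0 <= x < 1 << width and 0 <= y < 1 << width):
        raise ValueError("operands must fit in `width` bits")

    # Split both operands into bits, least significant first.
    a = [(x >> i) & 1 for i in range(width)]
    b = [(y >> i) & 1 for i in range(width)]

    # Column k collects every partial product a[i] * b[j] with i + j == k.
    # No column depends on another, so all of them could be formed at once.
    columns = [
        sum(a[i] & b[k - i]
            for i in range(max(0, k - width + 1), min(k, width - 1) + 1))
        for k in range(2 * width - 1)
    ]

    # A single pass resolves the carries between columns.
    result, carry = 0, 0
    for k, column_sum in enumerate(columns):
        total = column_sum + carry
        result |= (total & 1) << k
        carry = total >> 1
    return result | (carry << (2 * width - 1))


product = vedic_multiply(13, 11, width=4)
square = vedic_multiply(200, 200)  # squaring is the special case x == y
```

Squaring reuses the same structure with both operands equal; a dedicated squarer could exploit the symmetry a[i]·a[j] = a[j]·a[i], which this sketch does not do.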

IEEE Transactions on Consumer Electronics | 2011

Flexible LDPC decoder using stream data processing for 802.11n and 802.16e

Honey Durga Tiwari; Huynh Ngoc Bao; Yong Beom Cho

Wireless data transmission standards such as 802.16e and 802.11n employ low-density parity-check (LDPC) codes for error control coding. Bit-flipping decoding algorithms present a tradeoff between error-correcting capability, decoding resources, and decoding time. Software-based LDPC decoders can adapt to system parameters such as block size and code rate. In real-time, low-power mobile environments, the single-instruction, multiple-data (SIMD) processor currently used for video processing could also be used for LDPC decoding. In this paper, the implementation-efficient, reliability ratio-based, weighted bit-flipping (IRRWBF) algorithm is presented using a flexible software-based LDPC decoder. Compact data structures are proposed for performing the decoding on a SIMD architecture. Based on implementations on two commonly used SIMD architectures for mobile platforms, it was found that the decoding speed can be increased by more than 2000% (using 64-bit SIMD registers with vector integer calculation) and 1800% (using 128-bit SIMD registers with vector floating-point calculation). Experimental results for different code lengths of 802.16e and 802.11n show that a decoding time on the order of 1×10⁻³ to 10×10⁻³ seconds is achievable. Due to its significantly high throughput and flexibility, the proposed design algorithm and data structure can easily be adapted to any energy-sensitive mobile device employing SIMD processors.

International SoC Design Conference | 2008

Implementation of DCT based OFDM system

Gi Hyun Kim; Honey Durga Tiwari; Chan Mo Kim; Yong Beom Cho; Younggoo Kwon

The conventional OFDM system employs the IFFT-FFT structure to impart orthogonality. However, due to the complex-valued nature of the FFT, the implementation size is large. Orthogonality can also be provided if an IDCT-DCT structure is used in place of the FFT. This reduces the implementation area and also increases the computation speed, as only real calculations are required. In this paper, we present the implementation of a DCT-based OFDM system. The DCT structure prevalent in the H.264 standard is taken as a reference, as it operates faster than the conventional DCT structure. The DCT-IDCT structure was implemented on an Altera Cyclone II FPGA. The results show that the speed of calculation of the orthogonal components increases threefold, while the implementation size is halved compared to the FFT-based design.
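A minimal sketch of the IDCT-DCT idea: real-valued symbols are placed on the subcarriers by an inverse DCT at the transmitter and recovered by a forward DCT at the receiver, with no complex arithmetic anywhere. It assumes an orthonormal DCT-II over an ideal, noiseless channel with BPSK symbols; it does not reproduce the H.264-style integer DCT structure or the FPGA design described in the abstract, and the helper names are hypothetical.

```python
import math


def _scale(k: int, n: int) -> float:
    # Orthonormal scaling makes the DCT matrix orthogonal (inverse = transpose).
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(x):
    """Orthonormal DCT-II (used here as the receiver transform)."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[m] * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                           for m in range(n))
        for k in range(n)
    ]


def idct(c):
    """Inverse of `dct` (used here as the transmitter transform)."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
            for k in range(n))
        for m in range(n)
    ]


# One OFDM symbol carrying 8 BPSK values on 8 real cosine subcarriers.
symbols = [1.0, -1.0, -1.0, 1.0, 1.0, 1.0, -1.0, -1.0]
tx_samples = idct(symbols)       # transmitter: purely real time-domain samples
rx_values = dct(tx_samples)      # receiver: project back onto the subcarriers
recovered = [1.0 if v >= 0 else -1.0 for v in rx_values]
```

Every intermediate value is a real number, which is the property the abstract credits for the reduced area and higher speed relative to an FFT-based design.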

International Journal of Electronics | 2013

Low cost high throughput pipelined architecture of 2-D 8 × 8 integer transforms for H.264/AVC

Meeturani Sharma; Honey Durga Tiwari; Yong Beom Cho

In this article, we present the implementation of a high-throughput two-dimensional (2-D) 8 × 8 forward and inverse integer DCT transform for H.264. Using matrix decomposition and matrix operations such as the Kronecker product and direct sum, the forward and inverse integer transforms can be represented using simple addition operations. The dual-clocked pipelined structure of the proposed implementation uses non-floating-point adders and does not require any transpose memory. Hardware synthesis shows that the maximum operating frequency of the proposed pipelined architecture is 1.31 GHz, which achieves a throughput of 21.05 Gpixels/s at a hardware cost of 42932 gates. High throughput and low hardware cost make the proposed design useful for real-time H.264/AVC high-definition processing.
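The role of the Kronecker product and direct sum can be made concrete with standard identities. The block below is a general sketch, assuming a column-stacking vec operator and writing C for the N × N 1-D transform matrix; these symbols are not taken from the article, and its actual decomposition may differ.

```latex
% 2-D separable transform of a block X and its vectorised form
Y = C\,X\,C^{\mathsf T}
\quad\Longleftrightarrow\quad
\operatorname{vec}(Y) = (C \otimes C)\,\operatorname{vec}(X)

% The Kronecker product splits into a row pass and a column pass
C \otimes C = (C \otimes I_N)\,(I_N \otimes C)

% The column pass is a direct sum: N independent copies of the 1-D transform
I_N \otimes C = \underbrace{C \oplus C \oplus \cdots \oplus C}_{N\ \text{copies}}
```

Read this way, the 2-D transform is a product of structured stages acting on a stream of pixels, which is consistent with a pipelined design that needs no transpose memory.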

IEEE Transactions on Parallel and Distributed Systems | 2012

A Parallel IRRWBF LDPC Decoder Based on Stream-Based Processor

Honey Durga Tiwari; Huynh Ngoc Bao; Yong Beom Cho

Low-density parity-check (LDPC) codes have gained much attention due to their use of various belief-propagation (BP) decoding algorithms to impart excellent error-correcting capability. The BP decoders are quite simple; however, their computation-intensive and repetitive process prohibits their use in energy-sensitive applications such as sensor networks. Bit-flipping-based decoding algorithms, especially implementation-efficient, reliability ratio-based, weighted bit-flipping (IRRWBF) decoding, have shown an excellent tradeoff between error-correction performance and implementation cost. In this paper, we show that with IRRWBF, iterative re-computation can be replaced by iterative selective updating. Simulation results show that, compared with the original algorithm, decoding speed can be increased by 200 to 600 percent as the number of decoding iterations is increased from 5 to 1,000. The decoding steps are broken down into various stages such that the update operations are mostly of the single-instruction, multiple-data (SIMD) type. We also show that by using Intel Wireless MMX 2 accelerating technology in the proposed algorithm, the speed is increased by 500 to 1,500 percent. The results of implementing the proposed scheme on an Intel/Marvell PXA320 (806 MHz) CPU are presented. The proposed scheme can be used effectively for real-time LDPC decoding in energy-sensitive mobile devices.
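The difference between re-computation and selective updating can be sketched with a plain hard-decision bit-flipping decoder. This is not IRRWBF, which additionally weights each check by reliability information from the channel; it only illustrates that after one bit flips, only the checks touching that bit and the bits in those checks need new values. The toy parity-check matrix (row and column parities of a 3 × 3 grid) and all function names are assumptions made for the example.

```python
def product_parity_matrix(side: int = 3):
    """Parity-check matrix for a side x side grid: one check per row and per column."""
    n = side * side
    row_checks = [[1 if j // side == r else 0 for j in range(n)] for r in range(side)]
    col_checks = [[1 if j % side == c else 0 for j in range(n)] for c in range(side)]
    return row_checks + col_checks


def bit_flip_decode(H, received, max_iters: int = 20):
    """Hard-decision bit flipping with selectively updated syndrome and flip metrics."""
    m, n = len(H), len(H[0])
    checks_of_bit = [[r for r in range(m) if H[r][j]] for j in range(n)]
    bits_of_check = [[j for j in range(n) if H[r][j]] for r in range(m)]
    word = list(received)

    # Full computation happens exactly once, before the first iteration.
    syndrome = [sum(word[j] for j in bits_of_check[r]) % 2 for r in range(m)]
    unsatisfied = [sum(syndrome[r] for r in checks_of_bit[j]) for j in range(n)]

    for _ in range(max_iters):
        if not any(syndrome):
            break
        # Flip the bit that participates in the most failing checks.
        flip = max(range(n), key=lambda j: unsatisfied[j])
        word[flip] ^= 1
        # Selective update: only checks containing `flip` change state,
        # and only the bits inside those checks see their metric move.
        for r in checks_of_bit[flip]:
            syndrome[r] ^= 1
            delta = 1 if syndrome[r] else -1
            for j in bits_of_check[r]:
                unsatisfied[j] += delta
    return word, not any(syndrome)


H = product_parity_matrix(3)
codeword = [1, 1, 0, 1, 0, 1, 0, 1, 1]   # every row and column has even parity
noisy = list(codeword)
noisy[4] ^= 1                             # single bit error in the centre
decoded, ok = bit_flip_decode(H, noisy)
```

The inner update loops touch a fixed, small set of entries per flip, which is the kind of regular work that maps well onto SIMD lanes.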

International SoC Design Conference | 2008

Multiplier less fast loss less integer DCT for H.264

Honey Durga Tiwari; Ganzorig Gankhuyag; Gi Hyun Kim; Chan Mo Kim; Yong Beom Cho

In this paper, we propose a 4 × 4 2-D DCT transpose architecture for use in the H.264 video coding standard. Using matrix decomposition, the entire 2-D DCT architecture can be made parallel in nature, such that the resulting circuit is purely combinational. The DCT values can then be computed in almost one clock cycle. As the computation clock is independent of the data clock, the actual maximum operating frequency and the throughput of the design can be much higher than the data input rate. The reversible nature of the architecture allows the design to be used for IDCT calculation without modification. The FPGA implementation of the proposed design shows that a throughput of 4.76 Gbps and a maximum operating frequency of around 37.24 MHz can be achieved.
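A multiplier-less 4 × 4 integer transform can be sketched with additions, subtractions, and single-bit shifts only. The example assumes the standard H.264 core forward transform matrix with rows (1, 1, 1, 1), (2, 1, -1, -2), (1, -1, -1, 1), (1, -2, 2, -1); the abstract does not spell out the matrix or the decomposition used in the paper, and the function names are hypothetical. A direct matrix product is included only as a cross-check.

```python
CORE = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def forward_1d(x):
    """1-D core transform as a butterfly: 8 additions/subtractions and 2 shifts."""
    x0, x1, x2, x3 = x
    s0, s1 = x0 + x3, x1 + x2      # sums of mirrored samples
    d0, d1 = x0 - x3, x1 - x2      # differences of mirrored samples
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_4x4(block):
    """2-D transform Y = C X C^T computed as a row pass followed by a column pass."""
    row_pass = [forward_1d(row) for row in block]
    col_pass = [forward_1d(col) for col in transpose(row_pass)]
    return transpose(col_pass)


def reference_4x4(block):
    """Same transform by explicit multiplication, used only to check the butterfly."""
    return [
        [sum(CORE[i][a] * block[a][b] * CORE[j][b]
             for a in range(4) for b in range(4))
         for j in range(4)]
        for i in range(4)
    ]


residual = [
    [5, 11, 8, 10],
    [9, 8, 4, 12],
    [1, 10, 11, 4],
    [19, 6, 15, 7],
]
coefficients = forward_4x4(residual)
```

Because each output depends only on the 16 inputs through fixed adders, the whole 2-D computation can in principle be unrolled into combinational logic, which matches the single-cycle behaviour the abstract describes.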

Digital Signal Processing | 2016

High throughput resource shared 2D integer transform computation for H.264/MPEG-4 AVC

Honey Durga Tiwari; Meeturani Sharma; Harsh Durga Tiwari

Two dimensional (2D) integer transforms are used in all the profiles of H.264/MPEG-4 Advanced Video Coding (AVC) standard. This paper presents a resource shared architecture of 2D 4 × 4 and 8 × 8 integer transforms in H.264/MPEG-4 AVC coders. Existing architectures use separate designs to compute 2D 4 × 4 and 8 × 8 forward/inverse integer transform. Shared resource architectures for 4 × 4 and 8 × 8 transforms can be used to reduce the implementation area without sacrificing high throughput. Matrix decomposition is used to show that the 2D 4 × 4 forward/inverse integer transform of two independent data blocks can be obtained from one 2D 8 × 8 forward/inverse integer transform circuit. A high throughput architecture is used as the base design for the implementation of 2D 8 × 8 forward/inverse transform. Data rearrangement stages are added to the base design to compute the 2D 4 × 4 forward/inverse transform. The proposed dual-clock pipelined architecture does not require any transpose memory. As compared to existing designs, the proposed design operates on two independent 4 × 4 sub-blocks. Hence, the overall throughput of the 2D 4 × 4 forward/inverse transform computation increases by approximately 200% with less than a 5% increase in the gate count. The proposed design operates at a clock frequency of approximately 1.25 GHz and achieves a throughput of 7 G and 18.7 G pixels/sec for each block of 4 × 4 and 8 × 8 forward integer transforms, respectively. Due to resource shared implementation and high throughput, the proposed design can be used for real-time H.264/MPEG-4 AVC processing.

Highlights: Parallel processing of two independent blocks of 4 × 4 data. High circuit utilization ratio during 2D 8 × 8 and 4 × 4 integer transform computation. Data stream processing capability. High throughput. Small gate count.

Digital Signal Processing | 2012

Message length adaptive LDPC codes

Honey Durga Tiwari; Yong Beom Cho

LDPC codes achieve better performance and lower decoding complexity than turbo codes, with the major drawback of high encoding complexity. The encoder generator matrix is derived from the inverse of a portion of the parity-check matrix. If the message length is changed, the structure of the parity-check matrix is modified and hence the generator matrix must be recomputed. This increases the encoding complexity, as computing a matrix inverse is a time- and resource-consuming operation. In this paper, we consider the encoding problem for LDPC codes, as the complexity of encoding is essentially quadratic with respect to the block length. Using an efficient encoding method proposed by Richardson and Urbanke, we propose a systematic procedure to construct the parity-check matrix and generator matrix such that, when the message length changes, re-computation of the generator matrix is avoided. The presented design uses fixed sub-matrices to construct a semi-random parity-check matrix. The resulting design reduces the pre-computation time of converting the parity-check matrix to the generator matrix. The reported encoder reduces encoding time without loss of coding gain or bit error rate (BER) performance.
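Why matrix structure can remove the inversion step is easiest to see in a small case. The sketch below assumes a parity-check matrix of the form H = [A | B] in which B is lower triangular with ones on the diagonal, so each parity bit follows from the bits before it by substitution over GF(2). This is only the simplest instance of the triangular idea; it is not the construction from the paper, and the matrix and function names are invented for illustration.

```python
def encode_lower_triangular(H, message):
    """Systematic encoding for H = [A | B] with B unit lower triangular, over GF(2)."""
    m, n = len(H), len(H[0])
    k = n - m
    if len(message) != k:
        raise ValueError("message length must equal n - m")
    for r in range(m):
        # Check the assumed structure: 1 on the diagonal of B, 0 above it.
        if H[r][k + r] != 1 or any(H[r][k + c] for c in range(r + 1, m)):
            raise ValueError("parity part of H is not unit lower triangular")

    codeword = list(message) + [0] * m
    for r in range(m):
        # Row r of H * c^T = 0 involves the message and parity bits 0..r only,
        # so parity bit r is the XOR of the terms already known.
        acc = 0
        for j in range(k + r):
            acc ^= H[r][j] & codeword[j]
        codeword[k + r] = acc
    return codeword


def syndrome(H, word):
    return [sum(h & w for h, w in zip(row, word)) % 2 for row in H]


# Toy 3 x 6 matrix (hypothetical): left half A, right half B (unit lower triangular).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [1, 0, 1, 0, 1, 1],
]
codeword = encode_lower_triangular(H, [1, 0, 1])
```

No generator matrix is ever formed here, so nothing has to be recomputed as long as the triangular structure is preserved when rows and columns are added or removed.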

Aeu-international Journal of Electronics and Communications | 2012