
S. Gurunarayanan

Birla Institute of Technology and Science

Publication

Featured research published by S. Gurunarayanan.

International Conference on Advanced Computing | 2006

SOC design of a Low Power Wireless Sensor network node for Zigbee Systems

Ninad B. Kothari; T. S. B. Sudarshan; S. Gurunarayanan

We present an SoC architecture of a wireless sensor network node for the ZigBee protocol that can either serve as an independent chip or be incorporated as a component of a larger system. We describe the attributes a design must possess to function effectively in a densely populated sensor network and then discuss a proposed architecture that adheres to these attributes. We compare the proposed architecture with some existing ones and finally highlight the advantages of the proposed SoC design.

International Conference on Signal Processing | 2008

Predictive Placement Scheme In Set-Associative Cache For Energy Efficient Embedded Systems

Biju K. Raveendran; T. S. B. Sudarshan; Avinash Patil; Komal Randive; S. Gurunarayanan

This paper proposes a predictive placement scheme for set-associative caches that improves way-prediction hit rate, energy efficiency, and performance. We focus on the data cache subsystem, as it is one of the most power-consuming micro-architectural parts of an embedded system. We propose an energy-efficient way-prediction scheme with predictive placement that improves the prediction hit rate using minimal prediction bits. We show that this scheme achieves an average energy saving of 67.75% compared to a conventional caching scheme. Experimental results are obtained using the SimpleScalar 2.0 cache simulator on SPEC95 benchmarks.
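For illustration, here is a minimal Python sketch of plain MRU-based way prediction in a set-associative cache. It is not the paper's predictive placement scheme, and the class name `WayPredictedCache` and its parameters are hypothetical. A correctly predicted access probes a single way, which is where way prediction saves energy; a misprediction falls back to probing the remaining ways.

```python
class WayPredictedCache:
    """Set-associative cache with MRU way prediction (generic sketch,
    not the paper's predictive placement scheme)."""

    def __init__(self, num_sets: int, num_ways: int, line_size: int = 32):
        self.num_sets, self.num_ways, self.line_size = num_sets, num_ways, line_size
        # tags[s][w]: tag held in way w of set s (None = invalid line)
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        # lru[s]: ways of set s ordered from least to most recently used
        self.lru = [list(range(num_ways)) for _ in range(num_sets)]
        self.accesses = self.hits = self.pred_hits = 0

    def access(self, addr: int) -> bool:
        """Return True on a cache hit; also counts correct way predictions."""
        self.accesses += 1
        block = addr // self.line_size
        s, tag = block % self.num_sets, block // self.num_sets
        predicted = self.lru[s][-1]            # predict the MRU way
        if self.tags[s][predicted] == tag:     # one way probed: low-energy hit
            self.hits += 1
            self.pred_hits += 1
            return True                        # already MRU, no reordering
        # Misprediction: probe the remaining ways (extra energy and latency)
        if tag in self.tags[s]:
            way, hit = self.tags[s].index(tag), True
            self.hits += 1
        else:
            way, hit = self.lru[s][0], False   # miss: replace the LRU way
            self.tags[s][way] = tag
        self.lru[s].remove(way)                # move the way to MRU position
        self.lru[s].append(way)
        return hit
```

Comparing `pred_hits` with `hits` over an address trace gives the way-prediction hit rate that schemes such as the one above aim to raise.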

International Conference on VLSI Design | 2015

An FPGA-Based Architecture for Local Similarity Measure for Image/Video Processing Applications

Jai Gopal Pandey; Arindam Karmakar; Chandra Shekhar; S. Gurunarayanan

Similarity measures are used in diverse signal-processing applications. The Bhattacharyya coefficient is one of the most popular similarity measures and is widely used in image/video processing applications. Several of these applications need to compute the similarity measure between probability density functions of local image statistics. In this paper, an efficient hardware architecture is proposed for accelerating local similarity measure (LSM) computation using the Bhattacharyya coefficient. A direct hardware implementation of the Bhattacharyya coefficient requires many compute-intensive hardware resources, which slow down the overall computation. The data path of the proposed architecture uses fixed-point arithmetic and is based on the logarithmic number system. Fast binary logarithmic and antilogarithmic computing units are deployed to realize the required complex arithmetic operations. Histogram computation is accomplished using single-cycle read-modify-write operations on the received image data stored in DDR2 SDRAM. The proposed architecture is realized in the Virtex-5 xc5vfx70t FPGA device of the Xilinx ML-507 platform. The implemented architecture uses 4.5% of the FPGA slices, 5.4% of the Block RAMs, and 27.34% of the DSP48E slices.
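A minimal Python sketch of the underlying computation, assuming 16-bin normalized histograms over 8-bit pixel windows (the bin count and all function names are hypothetical). The Bhattacharyya coefficient is BC(p, q) = Σ sqrt(p_i q_i); in the log domain each term becomes 2^((log2 p_i + log2 q_i)/2), so a multiply and a square root reduce to an addition and a halving. Floating point is used here for clarity, whereas the paper's data path is fixed-point.

```python
import math
from typing import Sequence


def histogram(window: Sequence[int], bins: int = 16, max_val: int = 256) -> list[float]:
    """Normalized histogram (an estimated PDF) of the pixel values in a window."""
    counts = [0] * bins
    for v in window:
        counts[v * bins // max_val] += 1   # read-modify-write of one bin
    return [c / len(window) for c in counts]


def bhattacharyya(p: Sequence[float], q: Sequence[float]) -> float:
    """Direct form: BC(p, q) = sum_i sqrt(p_i * q_i); 1 for identical PDFs."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def bhattacharyya_log(p: Sequence[float], q: Sequence[float]) -> float:
    """Same sum via logarithms: sqrt(a*b) = 2 ** ((log2 a + log2 b) / 2)."""
    total = 0.0
    for a, b in zip(p, q):
        if a > 0 and b > 0:                # log2(0) is undefined; term is 0
            total += 2 ** ((math.log2(a) + math.log2(b)) / 2)
    return total
```

Two windows with identical histograms score 1.0, and windows whose pixels fall in disjoint bins score 0.0; the log-domain version agrees with the direct form up to rounding.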

ACM Symposium on Applied Computing | 2008

Evaluation of priority based real time scheduling algorithms: choices and tradeoffs

Biju K. Raveendran; Sundar Balasubramaniam; S. Gurunarayanan

Real-time scheduling algorithms such as RM (rate monotonic) and EDF (earliest deadline first) have been analyzed extensively in the literature. Many recent works on scheduling treat energy consumption as a performance metric. In this work, we analyze the priority scheduling algorithms RM, EDF, and LLF (least laxity first) along with a few power-aware scheduling algorithms: MLLF, RM_RCS, and EDF_RCS. Our analysis addresses the following metrics: response time, response time jitter, latency, time complexity, preemptions, and energy consumption. We extend past work in this direction by characterizing the performance of the scheduling algorithms both theoretically and experimentally. The results of our analysis can be used to guide design choices for real-time systems.
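The classical schedulability checks behind such comparisons can be sketched in Python, assuming independent periodic tasks with deadlines equal to periods on a single processor; the task set and function names are hypothetical. The Liu and Layland bound is only sufficient for RM, so exact response-time analysis can still accept a task set that the bound rejects, while for EDF the condition U ≤ 1 is exact.

```python
def utilization(tasks):
    """tasks: list of (C, T) = (worst-case execution time, period)."""
    return sum(c / t for c, t in tasks)


def rm_liu_layland(tasks) -> bool:
    """Sufficient (not necessary) RM test: U <= n * (2**(1/n) - 1)."""
    n = len(tasks)
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)


def edf_schedulable(tasks) -> bool:
    """Exact EDF test for independent, preemptive, implicit-deadline tasks: U <= 1."""
    return utilization(tasks) <= 1.0


def rm_response_times(tasks):
    """Exact worst-case response times under RM (shorter period = higher
    priority), iterating R = C_i + sum_{j<i} ceil(R / T_j) * C_j.
    A task that misses its deadline (= period) gets None."""
    ordered = sorted(tasks, key=lambda ct: ct[1])
    results = []
    for i, (c, t) in enumerate(ordered):
        r = c
        while True:
            # -(-r // tj) is an integer ceil(r / tj)
            nxt = c + sum(-(-r // tj) * cj for cj, tj in ordered[:i])
            if nxt == r or nxt > t:
                break
            r = nxt
        results.append(r if nxt <= t else None)
    return results
```

For the task set (1, 4), (2, 6), (3, 10), U ≈ 0.883 exceeds the three-task RM bound of about 0.780, yet response-time analysis gives [1, 3, 10]: every task meets its deadline, the lowest-priority one exactly at its period.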

International Conference on VLSI Design | 2014

A Novel Architecture for FPGA Implementation of Otsu's Global Automatic Image Thresholding Algorithm

Jai Gopal Pandey; Arindam Karmakar; Chandra Shekhar; S. Gurunarayanan

Otsu's global automatic image thresholding technique is widely used in various computer vision-based applications. This paper presents a resource-efficient architecture for Otsu's thresholding algorithm and its implementation in a field-programmable gate array (FPGA). The proposed architecture is implemented for a 640×480 input image that is captured by a real-time high-resolution analog camera and buffered in DDR2 SDRAM. Computing the between-class variance in Otsu's algorithm requires evaluating a normalized cumulative histogram and the mean and cumulative moments, which need single-cycle read-modify-write operations. These operations are achieved using the FPGA's slices, dual-port Block RAMs, and DSP slices, with DDR2 SDRAM as a frame buffer. The data path of the architecture is based on fixed-point arithmetic and does not require any divider. The proposed design is implemented in a Xilinx Virtex-5 xc5vfx70tffg1136-1 FPGA device, available on the Xilinx ML-507 platform. The Xilinx Embedded Development Kit (EDK) design tool is used to develop the required hardware and software in an integrated manner.
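The between-class variance maximization can be sketched in Python. This is an illustrative software version, not the paper's hardware design, and `otsu_threshold` is a hypothetical name. With integer counts, σ_B²(k) is proportional to (m_T·w(k) − m(k)·N)² / (w(k)(N − w(k))), where w(k) and m(k) are the cumulative pixel count and first moment up to gray level k, N is the pixel count, and m_T is the total first moment. Comparing two candidates by cross-multiplication avoids any division; this is one assumed way to obtain a divider-free data path.

```python
def otsu_threshold(pixels, levels: int = 256) -> int:
    """Return Otsu's threshold k (pixels <= k form class 0), integer-only."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1                        # read-modify-write histogram update
    n_total = len(pixels)
    m_total = sum(i * h for i, h in enumerate(hist))   # total first moment

    best_k, best_num, best_den = 0, 0, 1
    w = m = 0                               # cumulative count and first moment
    for k in range(levels):
        w += hist[k]
        m += k * hist[k]
        if w == 0 or w == n_total:
            continue                        # one class is empty: skip
        # sigma_B^2(k) is proportional to num / den
        num = (m_total * w - m * n_total) ** 2
        den = w * (n_total - w)
        if num * best_den > best_num * den:     # num/den > best_num/best_den
            best_k, best_num, best_den = k, num, den
    return best_k
```

For a bimodal image with half its pixels at level 10 and half at level 200, the function returns 10, so pixels at or below 10 fall into one class and the rest into the other.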

IEEE International Conference on Image Information Processing | 2013

An FPGA-based fixed-point architecture for binary logarithmic computation

J. G. Pandey; A. Karmakar; Chandra Shekhar; S. Gurunarayanan

Real-time, numerically intensive image processing applications demand dedicated hardware for various complex arithmetic functions. These arithmetic functions can be efficiently implemented by employing a binary logarithmic circuit. In this paper, a field-programmable gate array (FPGA) based architecture for a binary logarithm approximation unit is proposed. The proposed architecture uses combinational logic circuit elements and a fixed-point datapath. It can approximate the logarithm of an integer, a number with both integer and fractional parts, and a purely fractional number, using the same set of circuit elements for all computations. The implemented architecture uses an eight-region approximation. It is implemented in a Xilinx Virtex-5 xc5vfx70t FPGA device, and the available FPGA macros are used for the elementary circuit elements. The device utilization summary shows that the proposed architecture consumes minimal FPGA resources. Error analysis, performed with multiple sets of random numbers, shows that the proposed architecture has a small error for both fractional and fixed-point numbers.
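A minimal sketch of the simplest single-region scheme, Mitchell's approximation log2(2^k(1 + f)) ≈ k + f, in Python integer (fixed-point) arithmetic. The paper uses an eight-region approximation whose correction terms are not given here, so this shows only the basic leading-one-detect-and-shift idea; `FRAC_BITS` and `mitchell_log2` are hypothetical names. The `in_frac_bits` argument lets the same code handle integers, mixed numbers, and pure fractions.

```python
FRAC_BITS = 16  # hypothetical fractional precision of the result


def mitchell_log2(x: int, in_frac_bits: int = 0) -> int:
    """Approximate log2(x / 2**in_frac_bits) for a positive fixed-point x.

    Mitchell's single-region approximation: writing the value as
    2**k * (1 + f) with 0 <= f < 1, log2 ~= k + f. The result is a signed
    fixed-point integer with FRAC_BITS fractional bits.
    """
    if x <= 0:
        raise ValueError("log2 is defined only for positive inputs")
    k = x.bit_length() - 1            # leading-one detector: integer part
    mantissa = x - (1 << k)           # bits below the leading one, i.e. f * 2**k
    if k <= FRAC_BITS:                # align f to FRAC_BITS fractional bits
        frac = mantissa << (FRAC_BITS - k)
    else:
        frac = mantissa >> (k - FRAC_BITS)
    # Account for the input's binary point: values below 1 get a negative integer part
    return ((k - in_frac_bits) << FRAC_BITS) + frac
```

For example, `mitchell_log2(12)` returns 3.5 in fixed point (exact value about 3.585), and `mitchell_log2(3, in_frac_bits=2)`, i.e. log2(0.75), returns -0.5 (exact value about -0.415).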

International Conference on Eco-friendly Computing and Communication Systems | 2012

Compiler Efficient and Power Aware Instruction Level Parallelism for Multicore Architecture

D.C. Kiran; S. Gurunarayanan; Faizan Khaliq; Abhijeet Nawal

The paradigm shift to multicore processors for better performance has added a new dimension to research in compiler-driven instruction-level parallelism. This paper proposes an algorithm that groups dependent instructions of a basic block of the control flow graph into disjoint sub-blocks during translation to SSA form. Next, an algorithm is presented that constructs a graph tracking dependencies among the sub-blocks spread across the program. A global scheduler in the compiler then selectively maps sub-blocks in the dependency graph onto multiple cores while respecting the dependencies among them. The proposed approach respects spatial locality and aims to minimize cache coherence problems, communication latency among the cores, and the overhead of hardware-level instruction reordering while extracting parallelism and saving power. The observed results indicate better and more balanced speedup per watt consumed.
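One plausible reading of the first step, sketched in Python: within a basic block in SSA form, instructions linked by read-after-write dependences are merged with union-find, and each resulting connected set is a sub-block with no data dependence on the others inside the block. This interpretation, the `(dest, sources)` instruction encoding, and `group_sub_blocks` are assumptions for illustration; memory dependences and values defined outside the block are ignored.

```python
def group_sub_blocks(block):
    """block: list of (dest, sources) in SSA form, e.g. ("t3", ["t1", "t2"]).
    Returns sorted lists of instruction indices; each list is a sub-block,
    and no two sub-blocks share a data dependence inside this block."""
    parent = list(range(len(block)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def union(a, b):
        parent[find(a)] = find(b)

    defined_at = {}                 # SSA name -> index of its defining instruction
    for i, (dest, srcs) in enumerate(block):
        for s in srcs:
            if s in defined_at:     # read-after-write dependence within the block
                union(i, defined_at[s])
        defined_at[dest] = i        # SSA: each name is defined exactly once

    groups = {}
    for i in range(len(block)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For the block `a = f(x); b = g(y); c = h(a); d = k(b, z); e = m(c)`, the function returns `[[0, 2, 4], [1, 3]]`: two independent chains that a scheduler could place on different cores.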

International Conference on Electronic Systems, Signal Processing and Computing Technologies (ICESC) | 2014

An FPGA-Based Novel Architecture for the Fixed-Point Binary Antilogarithmic Computation

J. G. Pandey; A. Karmakar; C. Shekhar; S. Gurunarayanan

Emerging embedded system applications require low-power, fast, and area-efficient implementations of complex arithmetic operations. A modern field-programmable gate array (FPGA) is a suitable candidate for implementing reasonably complex architectures within minimal design time. Apart from logic resources, most FPGAs contain hard-macro elements. With a fixed-point data path, these macro elements can be used to build considerably more complex architectures. Complex arithmetic elements can be realized more simply by using a logarithmic number system. In this paper, a novel architecture and the FPGA realization of an antilogarithmic computing circuit are proposed. The proposed antilogarithmic circuit uses a piecewise linear approximation method, and the same architecture works for both positive and negative binary numbers. A custom barrel shifter is designed that shifts the input data left or right by the required amount. The proposed architecture is implemented in a Xilinx Virtex-5 xc5vfx70t device, and the device utilization shows that it uses minimal FPGA resources. We have also performed an error analysis of the approximation result: the error is 0.16% for positive numbers and 0.8% for negative numbers. The error can be reduced further by using more bits for the fractional part.
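A minimal Python sketch of a Mitchell-style piecewise linear antilogarithm, 2^(i + f) ≈ 2^i(1 + f) for 0 ≤ f < 1, in signed fixed point. Splitting x with an arithmetic shift keeps f non-negative even when x is negative, so a single shifter that moves left for i ≥ 0 and right for i < 0 serves both signs, in the spirit of the barrel shifter described above. This single-segment form is an assumption; the paper's segment boundaries and coefficients are not reproduced, and `FRAC_BITS` and `mitchell_antilog2` are hypothetical names.

```python
FRAC_BITS = 16  # hypothetical fixed-point precision of input and output


def mitchell_antilog2(x: int) -> int:
    """Approximate 2**(x / 2**FRAC_BITS) for a signed fixed-point x.

    Piecewise linear (Mitchell-style): 2**(i + f) ~= 2**i * (1 + f), 0 <= f < 1.
    Returns a fixed-point result with FRAC_BITS fractional bits.
    """
    i = x >> FRAC_BITS                    # floor(x): Python's >> is arithmetic
    f = x & ((1 << FRAC_BITS) - 1)        # fractional bits, always >= 0
    mantissa = (1 << FRAC_BITS) + f       # 1 + f in fixed point
    # Barrel shift: left for a non-negative integer part, right for a negative one
    return mantissa << i if i >= 0 else mantissa >> -i
```

With 16 fractional bits, an input of 0.5 yields 1.5 (exact value about 1.414) and an input of -0.5 yields 0.75 (exact value about 0.707); integer inputs are reproduced exactly.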

International Conference on Advanced Computing | 2007

LLRU: Late LRU Replacement Strategy for Power Efficient Embedded Cache

Biju K. Raveendran; T. S. B. Sudarshan; P.D. Kumar; P. Tangudu; S. Gurunarayanan

This paper proposes a new cache replacement scheme, late least recently used (LLRU). LLRU modifies the existing least recently used (LRU) algorithm to take shared pages into account, improving their accessibility and thereby improving cache performance for applications that have shared pages. We also propose square-matrix-based and counter-based hardware designs for LLRU. We show that the proposed scheme achieves a considerable improvement in hit rate. Experimental results are obtained using the SimpleScalar 2.0 cache simulator with benchmark programs. The hardware performance of the LLRU counter and square-matrix implementations is measured using ModelSim and Leonardo Spectrum.
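The square-matrix technique mentioned above is a standard way to track LRU order in hardware: for an n-way set, keep an n×n bit matrix, and on an access to way i set row i to ones and then clear column i; the way whose row is all zeros is the least recently used. A Python model of that baseline LRU follows. LLRU's handling of shared pages is not described in enough detail to model, so it is omitted, and `MatrixLRU` is a hypothetical name.

```python
class MatrixLRU:
    """Square-matrix LRU tracker for one cache set (baseline LRU, not LLRU)."""

    def __init__(self, ways: int):
        self.ways = ways
        self.m = [[0] * ways for _ in range(ways)]   # n x n bit matrix

    def touch(self, way: int) -> None:
        """Record an access: set row `way` to ones, then clear column `way`."""
        self.m[way] = [1] * self.ways
        for row in self.m:
            row[way] = 0

    def victim(self) -> int:
        """Return the LRU way: the one whose row is all zeros."""
        for w, row in enumerate(self.m):
            if not any(row):
                return w
        # Unreachable: the LRU (or a never-used) way always has an all-zero row
        raise RuntimeError("matrix invariant violated")
```

After the access sequence 0, 1, 2, 3, 1, 0 on a 4-way set, `victim()` returns 2, the way used longest ago.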

International Conference on Signal Processing | 2014

Implementation of an improved connected component labeling algorithm using FPGA-based platform

J. G. Pandey; Abhijit Karmakar; A. K. Mishra; Chandra Shekhar; S. Gurunarayanan

Labeling connected components is one of the most fundamental operations in image and video processing. This paper presents a field-programmable gate array (FPGA) platform-based approach for implementing an efficient, improved two-scan equivalence-based connected component labeling algorithm. The implementation uses standard intellectual-property (IP) elements, off-the-shelf FPGA components, and peripherals available on the Xilinx ML-507 FPGA platform, and runs on an embedded PowerPC 440 processor in the Xilinx Virtex-5 xc5vfx70t FPGA device. In this work, the equivalence-handling mechanism of the Stefano-Bulgarelli (SB) algorithm is improved to achieve complete merging in all possible cases. The improved algorithm is tested using binary test patterns and standard images. The results demonstrate that the improved algorithm handles equivalences efficiently and gives an accurate count of connected components. The proposed FPGA-based system can be used efficiently in many practical image and video processing applications that rely on connected component labeling.
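For reference, a generic two-scan labeling with a union-find equivalence table can be sketched in Python (8-connectivity, binary image as lists of 0/1). This is a textbook variant, not the SB equivalence-handling mechanism or the improved version proposed in the paper, and `label_components` is a hypothetical name. The first scan assigns provisional labels and merges the classes of all labeled neighbours; the second scan replaces each provisional label with a consecutive final label.

```python
def label_components(img):
    """Two-scan connected component labeling (8-connectivity) of a binary
    image given as a list of rows of 0/1. Returns (labels, count)."""
    h, w = len(img), len(img[0]) if img else 0
    labels = [[0] * w for _ in range(h)]
    parent = [0]                       # equivalence table; label 0 = background

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    # First scan: provisional labels from already-visited neighbours W, NW, N, NE
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            neigh = [labels[y + dy][x + dx]
                     for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1))
                     if 0 <= y + dy and 0 <= x + dx < w and labels[y + dy][x + dx]]
            if not neigh:
                parent.append(len(parent))      # new provisional label
                labels[y][x] = len(parent) - 1
            else:
                roots = [find(n) for n in neigh]
                smallest = min(roots)
                labels[y][x] = smallest
                for r in roots:                 # record equivalences
                    parent[r] = smallest

    # Second scan: resolve equivalences to consecutive final labels
    final = {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return labels, len(final)
```

A U-shaped object, whose two arms receive different provisional labels until the bottom row joins them, comes out as a single component.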
