Aravind Dasu | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Aravind Dasu is active.

Explore More

Publication

Featured researches published by Aravind Dasu.

IEEE Transactions on Circuits and Systems for Video Technology | 2002

A survey of media processing approaches

Aravind Dasu; Sethuraman Panchanathan

Multimedia processing is becoming increasingly important with a wide variety of applications ranging from multimedia cellphones to high-definition interactive television. Media processing involves the capture, storage, manipulation and transmission of multimedia objects such as text, handwritten data, audio objects, still images, 2D/3D graphics, animation, and full-motion video. A number of implementation strategies have been proposed for processing multimedia data. These approaches can be broadly classified based on the evolution of processing architectures and the functionality of the processors. In order to provide media processing solutions to different consumer markets, designers have combined some of the classical features from both the functional and evolution-based classifications resulting in many hybrid solutions. We propose a categorization of existing microprocessors based on a combination of both architectural and functional flavors with examples of each approach from the latest multimedia processing families. The varying processing requirements in multimedia computing for reconfigurable multimedia processing are presented.

Iet Computers and Digital Techniques | 2010

Dynamically reconfigurable systolic array accelerators: A case study with extended Kalman filter and discrete wavelet transform algorithms

Arvind Sudarsanam; Robert Collier Barnes; J. Carver; Ramachandra Kallam; Aravind Dasu

Field programmable gate arrays (FPGAs) are increasingly being adopted as the primary on-board computing system for autonomous deep space vehicles. There is a need to support several complex applications for navigation and image processing in a rapidly responsive on-board FPGA-based computer. Developing such a computer requires the designer to explore and combine several design concepts such as systolic array (SA) design, hardware-software partitioning and partial dynamic reconfiguration (PDR). In this study a microprocessor/co-processor design that can simultaneously accelerate multiple single precision floating-point algorithms is proposed. Two such algorithms are extended Kalman filter (EKF) and discrete wavelet transform (DWT). Key contributions include (i) polymorphic systolic array (PolySA), comprising partial reconfigurable regions that can accelerate algorithms amenable to being mapped onto linear SAs and (ii) performance model to predict the overall execution time of EKF algorithm on the proposed PolySA architecture. When implemented on a low-end Xilinx Virtex4 SX35 FPGA, the design provides a speed-up of at least 4.18 x and 6.61 x over a state-of-the-art microprocessor used in spacecraft systems for the EKF and DWT algorithms, respectively. The performance of EKF algorithm on the proposed PolySA architecture was compared against the performance on two types of conventional (non-polymorphic) hardware architectures and the results showed that the proposed architecture outperformed the other two architectures in most of the test cases.

IEEE Computer Architecture Letters | 2009

PRR-PRR Dynamic Relocation

Arvind Sudarsanam; Ramachandra Kallam; Aravind Dasu

Partial bitstream relocation (PBR) on FPGAs has been gaining attention in recent years as a potentially promising technique to scale parallelism of accelerator architectures at run time, enhance fault tolerance, etc. PBR techniques to date have focused on reading inactive bitstreams stored in memory, on-chip or off-chip, whose contents are generated for a specific partial reconfiguration region (PRR) and modified on demand for configuration into a PRR at a different location. As an alternative, we propose a PRR-PRR relocation technique to generate source and destination addresses, read the bitstream from an active PRR (source) in a non-intrusive manner, and write it to destination PRR. We describe two options of realizing this on Xilinx Virtex 4 FPGAs: (a) hardware-based accelerated relocation circuit (ARC) and (b) a software solution executed on Microblaze. A comparative performance analysis to highlight the speed-up obtained using ARC is presented. For real test cases, performance of our implementations are compared to estimated performances of two state of the art methods.

parallel computing | 2002

Reconfigurable media processing

Aravind Dasu; Sethuraman Panchanathan

Multimedia processing is becoming increasingly important with wide variety of applications ranging from multimedia cell phones to high definition interactive television. Media processing techniques typically involve the capture, storage, manipulation and transmission of multimedia objects such as text, handwritten data, audio objects, still images, 2D/3D graphics, animation and full-motion video. A number of implementation strategies have been proposed for processing multimedia data. These approaches can be broadly classified into two major categories, namely (i) general purpose processors with programmable media processing capabilities, and (ii) dedicated implementations (ASICs). We have performed a detailed complexity analysis of the recent multimedia standard (MPEG-4) which has shown the potential for reconfigurable computing, that adapts the underlying hardware dynamically in response to changes in the input data or processing environment. We therefore propose a methodology for designing a reconfigurable media processor. This involves hardware-software co-design implemented in the form of a parser, profiler, recurring pattern analyzer, spatial and temporal partitioner. The proposed methodology enables efficient partitioning of resources for complex and time critical multimedia applications.

international parallel and distributed processing symposium | 2007

A Reconfigurable Load Balancing Architecture for Molecular Dynamics

Jonathan Phillips; Matthew Areno; Christopher Reed Rogers; Aravind Dasu; Brandon Eames

This paper proposes a novel architecture supporting dynamic load balancing on an FPGA for a molecular dynamics algorithm. Load balancing is primarily achieved through the use of specialized processing units, referred to as FLEX units. FLEX units are able to switch between tasks required by a molecular dynamics algorithm as often as needed in order to cater to the nature of the input parameters. This architecture is capable of run-time performance analysis and dynamic resource allocation in order to maximize throughput. Results of a prototype of the architecture targeting an FPGA are presented.

IEEE Transactions on Circuits and Systems for Video Technology | 2004

A wavelet-based sprite codec

Aravind Dasu; Sethuraman Panchanathan

The International Standards Organization (ISO) has proposed a family of standards for compression of image and video sequences, including the JPEG, MPEG-1, and MPEG-2. The latest MPEG-4 standard has many new dimensions to coding and manipulation of visual content. A video sequence usually contains a background object and many foreground objects. Portions of this background may not be visible in certain frames due to the occlusion of the foreground objects or camera motion. MPEG-4 introduces the novel concepts of video object planes (VOPs) and Sprites. A VOP is a visual representation of real world objects with shapes that need not be rectangular. Sprite is a large image composed of pixels belonging to a video object visible throughout a video segment. Since a sprite contains all parts of the background that were at least visible once, it can be used for direct reconstruction of the background VOP. Sprite reconstruction is dependent on the mode in which it is transmitted. In the static sprite mode, the entire sprite is decoded as an Intra VOP before decoding the individual VOPs. Since sprites consist of the information needed to display multiple frames of a video sequence, they are typically much larger than a single frame of video. Therefore, a static sprite can be considered as a large static image. In this paper, a novel solution to address the problem of spatial scalability has been proposed, where the sprite is encoded in discrete wavelet transform (DWT). A lifting kernel method of DWT implementation has been used for encoding and decoding sprites. Modifying the existing lifting scheme while maintaining it to be shape-adaptive results in a reduced complexity. The proposed scheme has the advantages of: 1) avoiding the need for any extensions to image or tile border pixels and is hence superior to the discrete cosine transform-based low latency scheme (used in the current MPEG-4 verification model) and 2) mapping the in place computed wavelet coefficients into a zero-tree structure without actually rearranging them, thereby saving allocation of additional memory. The proposed solutions provide efficient implementation of the sprite decoder, making possible a VLSI realization with a reduced real estate.

Iet Computers and Digital Techniques | 2009

Methodology to derive context adaptable architectures for FPGAs

Jonathan Phillips; Arvind Sudarsanam; Harikrishna Samala; Ramachandra Kallam; J. Carver; Aravind Dasu

The configurable nature of field-programmable gate arrays (FPGAs) has allowed designers to take advantage of various data flow characteristics in application kernels to create custom architecture implementations, by optimising instruction level paralleism (ILP) and pipelining at the register transfer level. However, not all applications are composed of pure data flow kernels. Intermingling of control and data flows in applications offers more interesting challenges in creating custom architectures. The authors present one possible way to take advantage of correlations that may be present among data flow graphs (DFGs) embedded in control flow graphs. In certain cases, where there is sufficient correlation and ILP, the proposed context adaptable architecture (CAA) design methodology results in an interesting and useful custom architecture for such embedded DFGs. Certain other application characteristics may demand the use of alternative methodologies such as partial and dynamic reconfiguration (PDR) and a mixture of PDR and common sub-graph methods (PDR-CSG). The authors present a rigorous analysis, combined with some benchmarking efforts to showcase the differences, advantages and disadvantages of the CAA methodology with other methodologies. The authors also present an analysis of how the core underlying algorithm in our methodology compares with other published algorithms and the differences in resulting designs on an FPGA for a sample set of test cases.

Reconfigurable technology : FPGAs and reconfigurable processors for computing and communications. Conference | 2001

Temporal partitioning of circuits for advanced partially reconfigurable systems

Rajanikant Mohan; Aravind Dasu; Sethuraman Panchanathan

Reconfigurable architectures are proving to be very effective in applications that involve the implementation of multiple compute-intensive algorithms, which share the same computing modules. With the advent of dynamically reconfigurable architectures, many temporal partitioning algorithms (TPA) have been proposed address the issue of area and time constraints. The main objective of TPA is to divide a large design into smaller sub-components so that they can be implemented over multiple reconfigurations. In this paper, we propose a new temporal partitioning process (TPP), which includes a modified TPA along with a port reallocation algorithm (PRA) to reduce the reconfiguration time to facilitate real-time implementation. The reduction in reconfiguration time is achieved by employing the knowledge of the function implemented in each logic block thereby effectively reusing the cells in the array in a selective manner. This avoids the need for complete reconfiguration and reduces the net reconfiguration time. The proposed approach has been tested on random graphs and on the MCNC benchmark circuits. Significant reduction in reconfiguration time has been achieved.

electronic imaging | 1999

Arithmetic precision for perspective transform in sprite decoding of MPEG-4

Aravind Dasu; Subramanian Raghavan; Nagaraj C. Raghavendra; Sethuraman Panchanathan

This paper presents a method of implementing the perspective transform as specified in the MPEG-4 standard using 32-bit fixed-point reduced precision calculations instead of using 64-bit floating-point full precision operators. We achieve this by removing some redundant calculations and truncating the numerator and denominator terms of the transform without loss of accuracy.

ieee international symposium on parallel distributed processing workshops and phd forum | 2010

Self-configurable architecture for reusable systems with Accelerated Relocation Circuit (SCARS-ARC)

Adarsha Sreeramareddy; Ramachandra Kallam; Aravind Dasu; Ali Akoglu

Field Programmable Gate Arrays (FPGAs), with partial reconfiguration (PR) technology present an attractive option for creating reliable platforms that adapt to changes in user objectives over time and respond to hardware/software anomalies automatically with self-healing action. Conventional solutions for partial reconfiguration based self-configurable architectures experience severe hardware limitations on ability to move any partially reconfigurable module to any available region of the reconfigurable fabric and ability to relocate the module quickly. In this study we adopt the hardware-based partial bitstream relocation technique, Accelerated Relocation Circuit (ARC), into the FPGA based wirelessly networked self-configurable architecture that employs traditional module based partial reconfiguration strategy. We show that the integrated architecture allows flexibility for module relocation, reduces the off-chip communication overhead, and observes up to 17x speedup for module relocation over the traditional Xilinx hardware internal configuration access port wrapper (HWICAP) based implementation.

Explore More