Ranga Vemuri | Researchain


Publication

Featured research published by Ranga Vemuri.

IEEE Transactions on Very Large Scale Integration Systems | 2002

Hardware-software partitioning and pipelined scheduling of transformative applications

Karam S. Chatha; Ranga Vemuri

Transformative applications are computation-intensive applications characterized by iterative dataflow behavior. Typical examples are image processing applications such as JPEG and MPEG. The performance of embedded hardware-software systems that implement transformative applications can be maximized by obtaining a pipelined design. We present a tool for hardware-software partitioning and pipelined scheduling of transformative applications. The tool uses iterative partitioning and pipelined scheduling to obtain optimal partitions that satisfy the timing and area constraints. The partitioner uses a branch and bound approach with a unique objective function that minimizes the initiation interval of the final design. We present techniques for generating a good initial solution and for limiting the search space of the branch and bound algorithm. A candidate partition is evaluated by generating its pipelined schedule. The scheduler uses a novel retiming heuristic that optimizes the initiation interval, number of pipeline stages, and memory requirements of the particular design alternative. We evaluate the performance of the retiming heuristic by comparing it with an existing technique. The effectiveness of the entire tool is demonstrated by a case study of the JPEG image compression algorithm. We also evaluate the run time and design quality of the tool by experimentation with synthetic graphs.
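The idea of choosing a partition that minimizes the initiation interval under an area constraint can be illustrated with a small sketch. This is not the tool described in the abstract: it uses exhaustive enumeration instead of branch and bound, and it replaces the pipelined scheduler with a simple resource-based lower bound on the initiation interval. The task fields (`sw_time`, `hw_time`, `hw_area`) and the assumptions that all software tasks share one processor while each hardware task gets its own dedicated unit are hypothetical, chosen only for illustration.

```python
from itertools import product


def ii_lower_bound(tasks, in_hw):
    """Resource-based lower bound on the initiation interval.

    Assumption (not from the abstract): software tasks share a single
    processor, so their times add up; each hardware task has a dedicated
    unit, so only the slowest hardware task limits the interval.
    """
    sw_busy = sum(t["sw_time"] for t, hw in zip(tasks, in_hw) if not hw)
    hw_busy = max((t["hw_time"] for t, hw in zip(tasks, in_hw) if hw), default=0)
    return max(sw_busy, hw_busy)


def best_partition(tasks, area_budget):
    """Try every hardware/software assignment and keep the feasible one
    with the smallest lower bound. Exhaustive search stands in for the
    branch and bound search mentioned in the abstract."""
    best = None
    for in_hw in product([False, True], repeat=len(tasks)):
        area = sum(t["hw_area"] for t, hw in zip(tasks, in_hw) if hw)
        if area > area_budget:
            continue  # violates the area constraint
        ii = ii_lower_bound(tasks, in_hw)
        if best is None or ii < best[0]:
            best = (ii, in_hw)
    return best


tasks = [
    {"sw_time": 10, "hw_time": 2, "hw_area": 5},
    {"sw_time": 4, "hw_time": 1, "hw_area": 4},
    {"sw_time": 6, "hw_time": 3, "hw_area": 3},
]
print(best_partition(tasks, area_budget=8))  # (4, (True, False, True))
```

With an area budget of 8, moving the first and third tasks to hardware leaves only the 4-unit task in software, which gives the smallest bound among the feasible assignments.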

International Parallel Processing Symposium | 1998

An integrated partitioning and synthesis system for dynamically reconfigurable multi-FPGA architectures

Iyad Ouaiss; Sriram Govindarajan; Vinoo Srinivasan; Meenakshi Kaul; Ranga Vemuri

This paper presents an integrated design system called SPARCS (Synthesis and Partitioning for Adaptive Reconfigurable Computing Systems) for automatically partitioning and synthesizing designs for reconfigurable boards with multiple field-programmable devices (FPGAs). The SPARCS system accepts design specifications at the behavior level, in the form of task graphs. The system contains a temporal partitioning tool to temporally divide and schedule the tasks on the reconfigurable architecture, a spatial partitioning tool to map the tasks to individual FPGAs, and a high-level synthesis tool to synthesize efficient register-transfer level designs for each set of tasks destined to be downloaded on each FPGA. Commercial logic and layout synthesis tools are used to complete logic synthesis, placement, and routing for each FPGA design segment. A distinguishing feature of the SPARCS system is the tight integration of the partitioning and synthesis tools to accurately predict and control design performance and resource utilization. This paper presents an overview of SPARCS and the various algorithms used in the system, along with a brief description of how a JPEG-like image compression algorithm is mapped to a multi-FPGA board using SPARCS.

Design Automation Conference | 2004

An efficient algorithm for finding empty space for online FPGA placement

Manish Handa; Ranga Vemuri

A fast and efficient algorithm for finding empty area is necessary for online placement, task relocation and defragmentation on a partially reconfigurable FPGA. We present an algorithm that finds empty area as a list of overlapping maximal rectangles. Using an innovative representation of the FPGA, we are able to predict possible locations of the maximal empty rectangles. The worst-case time complexity of our algorithm is O(xy), where x is the number of columns, y is the number of rows, and x·y is the total number of cells on the FPGA. Experiments show that, in practice, our algorithm needs to scan less than 15% of the FPGA cells to make a list of all maximal empty rectangles.
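To make the term "maximal empty rectangle" concrete, the following brute-force sketch lists every empty rectangle on a small occupancy grid that cannot be grown by one row or column in any direction. It is only a reference definition for tiny grids and is far slower than the O(xy) algorithm described in the abstract. The grid encoding (0 for a free cell, 1 for an occupied cell) and the function names are assumptions made for this example.

```python
from typing import List, Tuple

# (top, left, bottom, right), all inclusive cell indices
Rect = Tuple[int, int, int, int]


def is_empty(grid: List[List[int]], rect: Rect) -> bool:
    """True if every cell inside rect is free (0)."""
    top, left, bottom, right = rect
    return all(grid[y][x] == 0
               for y in range(top, bottom + 1)
               for x in range(left, right + 1))


def maximal_empty_rectangles(grid: List[List[int]]) -> List[Rect]:
    """Brute-force enumeration of maximal empty rectangles."""
    rows, cols = len(grid), len(grid[0])
    empties = {(t, l, b, r)
               for t in range(rows) for b in range(t, rows)
               for l in range(cols) for r in range(l, cols)
               if is_empty(grid, (t, l, b, r))}

    def can_grow(rect: Rect) -> bool:
        # A rectangle is not maximal if extending it by one row or
        # column on any side still yields an empty rectangle.
        t, l, b, r = rect
        neighbours = [(t - 1, l, b, r), (t, l - 1, b, r),
                      (t, l, b + 1, r), (t, l, b, r + 1)]
        return any(n in empties for n in neighbours)

    return sorted(rect for rect in empties if not can_grow(rect))


grid = [
    [0, 0, 1],
    [0, 0, 0],
    [1, 0, 0],
]
for rect in maximal_empty_rectangles(grid):
    print(rect)
```

On this 3 x 3 grid the result contains four rectangles that overlap in the centre cell, which matches the abstract's description of the empty area as a list of overlapping maximal rectangles.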

Design Automation Conference | 1999

An automated temporal partitioning and loop fission approach for FPGA based reconfigurable synthesis of DSP applications

Meenakshi Kaul; Ranga Vemuri; Sriram Govindarajan; Iyad Ouaiss

We present an automated temporal partitioning and loop transformation approach for developing dynamically reconfigurable designs starting from behavior level specifications. An Integer Linear Programming (ILP) model is formulated to achieve near-optimal latency designs. We also present a loop restructuring method to achieve maximum throughput for a class of DSP applications. This restructuring transformation is performed on the temporally partitioned behavior and results in near-optimal throughput. We discuss efficient memory mapping and address generation techniques for the synthesis of reconfigurable designs. A case study on the Joint Photographic Experts Group (JPEG) image compression algorithm demonstrates the effectiveness of our approach.

Design, Automation, and Test in Europe | 1998

Hardware/software partitioning with integrated hardware design space exploration

Vinoo Srinivasan; Shankar Radhakrishnan; Ranga Vemuri

This paper presents an integrated approach to hardware-software partitioning and hardware design space exploration. We propose a genetic algorithm which performs hardware-software partitioning on a task graph while simultaneously considering various design alternatives for tasks mapped to hardware. We primarily deal with data-dominated designs typically found in digital signal processing and image processing applications. A detailed description of various genetic operators is presented. We provide results to illustrate the effectiveness of our integrated methodology.

IEEE Transactions on Very Large Scale Integration Systems | 1995

Generation of design verification tests from behavioral VHDL programs using path enumeration and constraint programming

Ranga Vemuri; Ravi Kalyanaraman

A method for generation of design verification tests from behavior-level VHDL programs is presented. The method generates stimuli to execute desired control-flow paths in the given VHDL program. This method is based on path enumeration, constraint generation and constraint solving techniques that have been traditionally used for software testing. Behavioral VHDL programs contain multiple communicating processes, signal assignment statements, and wait statements which are not found in traditional software programming languages. Our model of constraint generation is specifically developed for VHDL programs with such constructs. Control-flow paths for which design verification tests are desired are specified through certain annotations attached to the control statements in the VHDL programs. These annotations are used to enumerate the desired paths. Each enumerated path is translated into a set of mathematical constraints corresponding to the statements in the path. Methods for generating constraint variables corresponding to various types of carriers in VHDL and for mapping various VHDL statements into mathematical relationships among these constraint variables are developed. These methods treat spatial and temporal incarnations of VHDL carriers as unique constraint variables thereby preserving the semantics of the behavioral VHDL programs. Constraints are generated in the constraint programming language CLP(R) and are solved using the CLP(R) system. A solution to the set of constraints so generated yields a design verification test sequence which can be applied for executing the corresponding control path when the design is simulated. If no solution exists, then it implies that the corresponding path can never be executed. Experimental studies pertaining to the quality of path coverage and fault coverage of the verification tests are presented.

IEEE Design & Test of Computers | 1995

Profile-driven behavioral synthesis for low-power VLSI systems

Nand Kumar; Srinivas Katkoori; Leo Rader; Ranga Vemuri

We present a profile-driven approach to behavior level synthesis. In this approach, event activities related to various operations and carriers in the behavioral specification are measured by simulating the description using user-supplied profiling stimuli. These event activities are then used during the synthesis process to estimate the switching activity in the design being synthesized. Overall switching activity estimation is based on modulating the average intrinsic switching activities of the synthesis library modules using the event activities. This estimate is used to select a module set and a schedule which, besides meeting the area and clock-speed constraints, would minimize the switching activity in the design. Experimental results for a number of examples show that the switching activity estimated during synthesis deviates by less than 10% on average from the actual switching activity measured after completing synthesis. The same profile-driven approach is applied to estimate the total amount of capacitance that would switch in the design when the given stimuli are applied. Again, experimental results show that, on average, the estimated switched capacitance deviates from the actual measured value by about 12%.

Great Lakes Symposium on VLSI | 2005

LiPaR: A light-weight parallel router for FPGA-based networks-on-chip

Balasubramanian Sethuraman; Prasun Bhattacharya; Jawad Khan; Ranga Vemuri

Present-day technology for ASICs supports Networks-on-Chip designs which can have 100 million gates on a single chip. The latest FPGAs can support only about 10 million gates to accommodate all logic and the associated routing. In order to implement a competitive NoC architecture in FPGAs, the area occupied by the network should be kept to a minimum. This ensures that the maximum area can be utilized by the logic while maintaining the performance of the router network. Reducing area also reduces the power consumption. In this paper, we implement a parallel router which can support five simultaneous routing requests with an area overhead of only 352 Xilinx Virtex-II Pro FPGA slices (2.57% of XC2VP30). We introduce optimizations in XY routing and decoding logic thereby gaining in area and performance. The header overhead is 8 bits per packet and the packet size can vary between 16 and 128 bits. We also implement a 3 x 3 mesh network with a total area overhead of 28% leaving 72% of the area available for the logic in a Virtex-II Pro XC2VP30 device. We characterize the router and several mesh networks for power and performance parameters.
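For readers unfamiliar with XY routing, a minimal software sketch of the textbook rule follows: a packet first travels along the X dimension until it reaches the destination column, then along the Y dimension. This shows only the generic dimension-ordered decision, not LiPaR's optimized hardware logic. The port names, the coordinate convention (x grows to the east, y grows to the north), and the function names are assumptions for this example.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the mesh

# Assumed convention: x grows to the east, y grows to the north.
STEP = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}


def xy_next_port(cur: Coord, dst: Coord) -> str:
    """Pick the output port at router `cur` for a packet headed to `dst`.

    The X offset is resolved completely before any move in Y, which is
    what makes plain XY routing deterministic and deadlock-free on a mesh.
    """
    (cx, cy), (dx, dy) = cur, dst
    if dx > cx:
        return "EAST"
    if dx < cx:
        return "WEST"
    if dy > cy:
        return "NORTH"
    if dy < cy:
        return "SOUTH"
    return "LOCAL"  # packet has arrived; deliver to the attached logic


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """List every router a packet visits from src to dst, inclusive."""
    path = [src]
    cur = src
    while (port := xy_next_port(cur, dst)) != "LOCAL":
        sx, sy = STEP[port]
        cur = (cur[0] + sx, cur[1] + sy)
        path.append(cur)
    return path


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because each hop needs only two coordinate comparisons, the decision logic is small, which fits the abstract's emphasis on keeping router area low.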

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2003

Extraction and use of neural network models in automated synthesis of operational amplifiers

Glenn Wolfe; Ranga Vemuri

Fast and accurate performance estimation methods are essential to automated synthesis of analog circuits. Development of analog performance models is difficult due to the highly nonlinear nature of various analog performance parameters. This paper presents a neural network-based methodology for creating fast and efficient models for estimating the performance parameters of CMOS operational amplifier topologies. Effective methods for generation and use of the training data are proposed to enhance the accuracy of the neural models. The efficiency and accuracy of the resulting performance models are demonstrated via their use in a genetic algorithm-based circuit synthesis system. The genetic synthesis tool optimizes a fitness function based on user-specified performance constraints. The performance parameters of the synthesized circuits are validated by SPICE simulations and compared with those predicted by the neural network models. Experimental studies demonstrate that neural network modeling is an effective, fast, and accurate methodology for performance estimation.

Design, Automation, and Test in Europe | 1998

Optimal temporal partitioning and synthesis for reconfigurable architectures

Meenakshi Kaul; Ranga Vemuri

We develop a 0-1 non-linear programming (NLP) model for combined temporal partitioning and high-level synthesis from behavioral specifications destined to be implemented on reconfigurable processors. We present tight linearizations of the NLP model. We present effective variable selection heuristics for a branch and bound solution of the derived linear programming model. We show how tight linearizations combined with good variable selection techniques during branch and bound yield optimal results in relatively short execution times.
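As background for the linearization step, the standard textbook way to linearize a product of two 0-1 variables is shown here. The abstract does not state the paper's own formulation, so this is only the generic construction, and the symbols x, y, and z are chosen for illustration. A product term z = xy with binary x and y can be replaced by a new variable z and three linear inequalities:

```latex
\begin{aligned}
  z &\le x, \\
  z &\le y, \\
  z &\ge x + y - 1, \\
  x,\; y,\; z &\in \{0, 1\}.
\end{aligned}
```

The first two inequalities force z = 0 whenever either factor is 0, and the third forces z = 1 when both are 1, so every feasible 0-1 point satisfies z = xy. How tightly the relaxed constraints (with the variables allowed in [0, 1]) approximate the integer solutions affects how quickly branch and bound can prune, which is the concern the abstract raises with "tight linearizations".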
