Vinoo Srinivasan
University of Cincinnati
Publication
Featured research published by Vinoo Srinivasan.
international parallel processing symposium | 1998
Iyad Ouaiss; Sriram Govindarajan; Vinoo Srinivasan; Meenakshi Kaul; Ranga Vemuri
This paper presents an integrated design system called SPARCS (Synthesis and Partitioning for Adaptive Reconfigurable Computing Systems) for automatically partitioning and synthesizing designs for reconfigurable boards with multiple field-programmable devices (FPGAs). The SPARCS system accepts design specifications at the behavior level, in the form of task graphs. The system contains a temporal partitioning tool to temporally divide and schedule the tasks on the reconfigurable architecture, a spatial partitioning tool to map the tasks to individual FPGAs, and a high-level synthesis tool to synthesize efficient register-transfer level designs for each set of tasks destined to be downloaded on each FPGA. Commercial logic and layout synthesis tools are used to complete logic synthesis, placement, and routing for each FPGA design segment. A distinguishing feature of the SPARCS system is the tight integration of the partitioning and synthesis tools to accurately predict and control design performance and resource utilization. This paper presents an overview of SPARCS and the various algorithms used in the system, along with a brief description of how a JPEG-like image compression algorithm is mapped to a multi-FPGA board using SPARCS.
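To make the idea of temporal partitioning concrete, here is a minimal Python sketch that packs a task graph into sequential configuration segments under an area limit. It is a simple greedy heuristic written for illustration only, not the algorithm used in SPARCS; the function name `temporal_partition`, the task names, and the area figures are all assumptions.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def temporal_partition(areas, deps, capacity):
    """Greedy sketch: assign each task to a temporal segment.

    areas:    {task: area estimate} (hypothetical units)
    deps:     {task: [predecessor tasks]}
    capacity: area available in one configuration of the device

    A task is placed in the earliest segment that is not earlier than
    any of its predecessors' segments and still has room for it.
    """
    order = TopologicalSorter({t: deps.get(t, ()) for t in areas}).static_order()
    segment_of = {}
    used = []  # used[i] = area already placed in segment i
    for task in order:
        if areas[task] > capacity:
            raise ValueError(f"task {task!r} does not fit in one segment")
        seg = max((segment_of[p] for p in deps.get(task, ())), default=0)
        while seg < len(used) and used[seg] + areas[task] > capacity:
            seg += 1
        if seg == len(used):
            used.append(0)  # open a new segment (one more reconfiguration)
        used[seg] += areas[task]
        segment_of[task] = seg
    return segment_of


# Hypothetical JPEG-like pipeline with made-up area estimates.
areas = {"dct": 60, "quant": 30, "zigzag": 20, "huff": 50}
deps = {"quant": ["dct"], "zigzag": ["quant"], "huff": ["zigzag"]}
segments = temporal_partition(areas, deps, capacity=100)
```

In this toy input the four tasks need 160 area units in total, so they cannot share one configuration of a 100-unit device and the sketch splits them into two segments that would be loaded one after the other.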
design, automation, and test in europe | 1998
Vinoo Srinivasan; Shankar Radhakrishnan; Ranga Vemuri
This paper presents an integrated approach to hardware-software partitioning and hardware design space exploration. We propose a genetic algorithm that performs hardware-software partitioning on a task graph while simultaneously considering various design alternatives for tasks mapped to hardware. We primarily deal with data-dominated designs typically found in digital signal processing and image processing applications. A detailed description of various genetic operators is presented. We provide results to illustrate the effectiveness of our integrated methodology.
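The combination of partitioning and design-alternative selection in one genetic encoding can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's operators: the chromosome encoding, the additive timing model, the penalty weight, and all task data are hypothetical.

```python
import random

# Hypothetical task data: software time, then (area, time) hardware alternatives.
TASKS = [
    {"sw_time": 40, "hw": [(30, 12), (55, 6)]},
    {"sw_time": 25, "hw": [(20, 10), (35, 5)]},
    {"sw_time": 60, "hw": [(45, 15), (80, 8)]},
    {"sw_time": 10, "hw": [(15, 6)]},
]
AREA_BUDGET = 100


def evaluate(chrom):
    """Return (total_time, total_area) for a chromosome.

    Gene 0 means software; gene k >= 1 selects hardware alternative k - 1.
    Times are simply summed, which ignores task-graph parallelism.
    """
    time = area = 0
    for gene, task in zip(chrom, TASKS):
        if gene == 0:
            time += task["sw_time"]
        else:
            a, t = task["hw"][gene - 1]
            area += a
            time += t
    return time, area


def fitness(chrom):
    """Lower is better: time plus a heavy penalty for exceeding the area budget."""
    time, area = evaluate(chrom)
    return time + 1000 * max(0, area - AREA_BUDGET)


def random_gene(task, rng):
    return rng.randint(0, len(task["hw"]))


def crossover(a, b, rng):
    """Single-point crossover."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


def mutate(chrom, rng, rate=0.2):
    """Re-draw each gene with probability `rate`."""
    return [random_gene(t, rng) if rng.random() < rate else g
            for g, t in zip(chrom, TASKS)]


def run_ga(generations=60, pop_size=20, seed=0):
    rng = random.Random(seed)
    pop = [[random_gene(t, rng) for t in TASKS] for _ in range(pop_size)]
    pop[0] = [0] * len(TASKS)  # seed with the all-software (always feasible) solution
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 2]  # the better half survives unchanged
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return min(pop, key=fitness)


best = run_ga()
```

Because a single gene carries both the hardware/software decision and the choice of hardware implementation, crossover and mutation explore the partition and the design space at the same time, which is the point of the integrated approach described above.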
field-programmable custom computing machines | 1998
Sriram Govindarajan; Iyad Ouaiss; Meenakshi Kaul; Vinoo Srinivasan; Ranga Vemuri
The SPARCS system is an integrated partitioning and synthesis environment for reconfigurable architectures. In this paper, we use the Joint Photographic Experts Group (JPEG) image compression algorithm as a design example to demonstrate the effectiveness of dynamic reconfiguration achieved using SPARCS. We present a typical design process using the SPARCS system consisting of temporal partitioning, spatial partitioning, and design synthesis. The results, obtained on a commercial RC architecture, show that the multiply-reconfigured version of the JPEG compression algorithm achieves reasonable improvement in execution times compared to the one-time configured version.
application specific systems architectures and processors | 1996
Naren Narasimhan; Vinoo Srinivasan; Madhavi Vootukuru; Jeffrey Walrath; Sriram Govindarajan; Ranga Vemuri
We describe the process of hardware-software codesign of a JPEG-like still image compression system. The hardware components are targeted to execute on a reconfigurable hardware coprocessor which communicates with a host computer that executes all the software tasks. Central to our codesign methodology is the use of software profiling, high-level estimation, and synthesis tools. We describe the process of trade-off analysis and hardware task selection in detail. We present detailed experimental results gathered throughout the codesign process.
international conference on vlsi design | 2000
Sriram Govindarajan; Vinoo Srinivasan; Preetham Lakshmikanthan; Ranga Vemuri
This paper presents a novel technique to perform dynamic high-level exploration of a behavioral specification that is being partitioned for a multi-device architecture. The technique, unlike in traditional HLS, performs a global search on the four-dimensional design space formed by multiple partition segments of the behavior. Hence, the proposed technique effectively satisfies the global latency constraint on the entire design, as well as the area constraints on the individual partition segments. Since the technique is based on a rigorous exploration model, it employs an efficient low-complexity heuristic instead of an exhaustive search. We provide a number of results obtained by integrating the exploration technique with two popular partitioning algorithms: (i) simulated annealing and (ii) Fiduccia-Mattheyses. The proposed technique is highly effective in guiding any partitioning algorithm to a constraint-satisfying solution in a fairly short execution time. At tight constraint values, the proposed technique can generate solutions that do not exist in the search space of traditional HLS exploration techniques.
international parallel and distributed processing symposium | 2000
Preetham Lakshmikanthan; Sriram Govindarajan; Vinoo Srinivasan; Ranga Vemuri
This paper presents a technique to perform partitioning and synthesis of behavioral specifications. Partitioning of the design is done under multiple constraints: the interconnections and device areas of the reconfigurable architecture, and the latency of the design. The proposed multi-FPGA partitioning technique (FMPAR) is based on the Fiduccia-Mattheyses (FM) partitioning algorithm. In order to consider multiple implementations of the behavioral design, the partitioner is tightly integrated with an area estimator and design space exploration engine. A partitioning and synthesis framework was developed, with the FMPAR behavioral partitioner at the front end and various synthesis phases (high-level, logic, and layout) at the back end. Results are provided to demonstrate the advantage of tightly integrating exploration with partitioning. It is also shown that, in relatively short runtimes, FMPAR generates designs of similar quality compared to a simulated annealing partitioner. Designs have been successfully implemented on a commercial multi-FPGA board, demonstrating the effectiveness of the partitioner and the entire design framework.
international parallel processing symposium | 1999
Vinoo Srinivasan; Shankar Radhakrishnan; Ranga Vemuri; Jeffrey Walrath
Most reconfigurable multi-FPGA architectures have a programmable interconnection network that can be reconfigured to implement different interconnection patterns between the FPGAs and memory devices on the board. Partitioning tools for such architectures must produce the necessary pin-assignments and interconnect configuration stream that correctly implement the partitioned design. We call this process Interconnect Synthesis for reconfigurable architectures.
field programmable custom computing machines | 1999
Vinoo Srinivasan; Ranga Vemuri
This paper presents SPADE, a system for partitioning designs onto multi-FPGA architectures. The input to SPADE is a task graph that is composed of computational tasks, memory tasks, and the communication and synchronization between tasks. SPADE consists of an iterative partitioning engine, an architectural constraint evaluator, and a throughput optimization and RTL design space exploration heuristic. We show how various architectural constraints can be effectively handled using an iterative partitioning engine.
international conference on vlsi design | 1998
Vinoo Srinivasan; Ranga Vemuri
This paper presents a fast and efficient heuristic for pipelining a loop under resource constraints. The loop is represented as a dependence graph G, whose nodes are operations that are bound to available resources and whose edges denote the data dependencies between the operations. The data dependencies restrict the degree of parallelism that can be achieved while scheduling the graph. We propose a fast retiming-based graph transformation technique which relaxes the data dependencies in the graph while maintaining functional equivalence. Relaxing data dependencies provides more flexibility for the scheduler to schedule operations, thereby leading to higher throughput. Our objective is to obtain a retimed graph which, when scheduled, achieves an optimal or near-optimal pipelined steady-state throughput. A detailed algorithm is presented to solve the problem. We provide results that illustrate the effectiveness of our algorithm.
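How retiming relaxes dependencies can be shown with the standard retiming transformation, in which an edge u -> v carrying w iteration delays gets the new weight w + r(v) - r(u). The Python sketch below applies a given retiming and measures the longest chain of same-iteration dependencies; it is not the paper's heuristic for choosing r, and the function names and the three-node loop are assumptions made for illustration.

```python
def retime(edges, r):
    """Apply retiming r to edges {(u, v): w}, where w is the iteration distance.

    The retimed weight of u -> v is w + r[v] - r[u]; nodes missing from r
    have lag 0. A retiming is legal only if no weight becomes negative.
    """
    retimed = {(u, v): w + r.get(v, 0) - r.get(u, 0) for (u, v), w in edges.items()}
    if any(w < 0 for w in retimed.values()):
        raise ValueError("illegal retiming: negative edge weight")
    return retimed


def longest_zero_weight_chain(edges):
    """Number of operations on the longest path made only of weight-0 edges.

    Assumes every cycle carries at least one delay, so the weight-0
    subgraph is acyclic.
    """
    succ, nodes = {}, set()
    for (u, v), w in edges.items():
        nodes.update((u, v))
        if w == 0:
            succ.setdefault(u, []).append(v)
    memo = {}

    def depth(n):
        if n not in memo:
            memo[n] = 1 + max((depth(s) for s in succ.get(n, [])), default=0)
        return memo[n]

    return max(depth(n) for n in nodes)


# Hypothetical loop body: A -> B -> C within one iteration, C feeds A two iterations later.
loop = {("A", "B"): 0, ("B", "C"): 0, ("C", "A"): 2}
relaxed = retime(loop, {"C": 1})  # move one delay from C -> A onto B -> C
```

In this example the retimed loop keeps the same total delay around the cycle, but the longest chain of operations that must execute back to back within one iteration drops from three to two, which is the extra freedom a scheduler can exploit.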
application specific systems architectures and processors | 1996
Jeffrey Walrath; S. Chatha; Ranga Vemuri; Naren Narasimhan; Vinoo Srinivasan
Tradeoff analysis is a central aspect of any design process. Languages and tools to support performance modeling and tradeoff analysis are necessary to facilitate rapid prototyping of designs. An effective modeling and evaluation environment reduces the overall design time of both the prototype and the final product by helping designers determine which parameters of a design are critical for meeting a set of desired performance goals. This paper describes a case study in performance modeling using a language called PDL (Performance Modeling Language). The PDL system supports tradeoff analysis and performance visualization. This paper also addresses some of the key issues for successful tradeoff analysis during rapid prototyping and explains how many features of PDL make it a suitable choice for this purpose.