Sriram Govindarajan
University of Cincinnati
Publication
Featured research published by Sriram Govindarajan.
international parallel processing symposium | 1998
Iyad Ouaiss; Sriram Govindarajan; Vinoo Srinivasan; Meenakshi Kaul; Ranga Vemuri
This paper presents an integrated design system called SPARCS (Synthesis and Partitioning for Adaptive Reconfigurable Computing Systems) for automatically partitioning and synthesizing designs for reconfigurable boards with multiple field-programmable devices (FPGAs). The SPARCS system accepts design specifications at the behavior level, in the form of task graphs. The system contains a temporal partitioning tool to temporally divide and schedule the tasks on the reconfigurable architecture, a spatial partitioning tool to map the tasks to individual FPGAs, and a high-level synthesis tool to synthesize efficient register-transfer level designs for each set of tasks destined to be downloaded on each FPGA. Commercial logic and layout synthesis tools are used to complete logic synthesis, placement, and routing for each FPGA design segment. A distinguishing feature of the SPARCS system is the tight integration of the partitioning and synthesis tools to accurately predict and control design performance and resource utilization. This paper presents an overview of SPARCS and the various algorithms used in the system, along with a brief description of how a JPEG-like image compression algorithm is mapped to a multi-FPGA board using SPARCS.
design automation conference | 1999
Meenakshi Kaul; Ranga Vemuri; Sriram Govindarajan; Iyad Ouaiss
We present an automated temporal partitioning and loop transformation approach for developing dynamically reconfigurable designs starting from behavior level specifications. An Integer Linear Programming (ILP) model is formulated to achieve near-optimal latency designs. We also present a loop restructuring method to achieve maximum throughput for a class of DSP applications. This restructuring transformation is performed on the temporally partitioned behavior and results in near-optimization of throughput. We discuss efficient memory mapping and address generation techniques for the synthesis of reconfigurable designs. A case study on the Joint Photographic Experts Group (JPEG) image compression algorithm demonstrates the effectiveness of our approach.
field-programmable custom computing machines | 1998
Sriram Govindarajan; Iyad Ouaiss; Meenakshi Kaul; Vinoo Srinivasan; Ranga Vemuri
The SPARCS system is an integrated partitioning and synthesis environment for reconfigurable architectures. In this paper, we use the Joint Photographic Experts Group (JPEG) image compression algorithm as a design example to demonstrate the effectiveness of dynamic reconfiguration achieved using SPARCS. We present a typical design process using the SPARCS system consisting of temporal partitioning, spatial partitioning, and design synthesis. The results, obtained on a commercial RC architecture, show that the multiply-reconfigured version of the JPEG compression algorithm achieves reasonable improvement in execution times compared to the one-time configured version.
application specific systems architectures and processors | 1996
Naren Narasimhan; Vinoo Srinivasan; Madhavi Vootukuru; Jeffrey Walrath; Sriram Govindarajan; Ranga Vemuri
We describe the process of hardware-software codesign of a JPEG-like still image compression system. The hardware components are targeted to execute on a reconfigurable hardware coprocessor which communicates with a host computer that executes all the software tasks. Central to our codesign methodology is the usage of software profiling, high-level estimation and synthesis tools. We describe the process of trade-off analysis and hardware task selection in detail. We present detailed experimental results gathered throughout the codesign process.
international conference on vlsi design | 2000
Sriram Govindarajan; Vinoo Srinivasan; Preetham Lakshmikanthan; Ranga Vemuri
This paper presents a novel technique to perform dynamic high-level exploration of a behavioral specification that is being partitioned for a multi-device architecture. The technique, unlike in traditional HLS, performs a global search on the four-dimensional design space formed by multiple partition segments of the behavior. Hence, the proposed technique effectively satisfies the global latency constraint on the entire design, as well as the area constraints on the individual partition segments. Since the technique is based on a rigorous exploration model, it employs an efficient low-complexity heuristic instead of an exhaustive search. We have provided a number of results by integrating the exploration technique with two popular partitioning algorithms: (i) simulated annealing and (ii) Fiduccia-Mattheyses. The proposed technique is highly effective in guiding any partitioning algorithm to a constraint-satisfying solution in a fairly short execution time. At tight constraint values, the proposed technique has the ability to generate solutions that do not exist in the search space of traditional HLS exploration techniques.
international parallel and distributed processing symposium | 2000
Preetham Lakshmikanthan; Sriram Govindarajan; Vinoo Srinivasan; Ranga Vemuri
This paper presents a technique to perform partitioning and synthesis of behavioral specifications. Partitioning of the design is done under multiple constraints: interconnections and device areas of the reconfigurable architecture, and the latency of the design. The proposed Multi-FPGA partitioning technique (FMPAR) is based on the Fiduccia-Mattheyses (FM) partitioning algorithm. In order to contemplate multiple implementations of the behavioral design, the partitioner is tightly integrated with an area estimator and design space exploration engine. A partitioning and synthesis framework was developed, with the FMPAR behavioral partitioner at the front end and various synthesis phases (high-level, logic, and layout) at the back end. Results are provided to demonstrate the advantage of tightly integrating exploration with partitioning. It is also shown that, in relatively short runtimes, FMPAR generates designs of similar quality compared to a simulated annealing partitioner. Designs have been successfully implemented on a commercial multi-FPGA board, proving the effectiveness of the partitioner and the entire design framework.
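As a rough illustration of the Fiduccia-Mattheyses idea that FMPAR builds on, the sketch below runs one textbook FM-style pass on a two-way partition. It is not the FMPAR algorithm: it handles only two devices and a single area bound, ignores interconnection and latency constraints, and recomputes gains naively instead of using bucket lists. The names `cut_size`, `gain`, `fm_pass`, `area`, and `max_area` are all hypothetical.

```python
def cut_size(nets, side):
    """Number of nets whose cells lie on both sides of the partition."""
    return sum(1 for net in nets if len({side[c] for c in net}) > 1)


def gain(cell, nets, side):
    """Reduction in cut size if `cell` alone is moved to the other side."""
    g = 0
    for net in nets:
        if cell not in net:
            continue
        same = sum(1 for c in net if side[c] == side[cell])
        other = len(net) - same
        if same == 1 and other > 0:
            g += 1  # cell is alone on its side: moving it uncuts the net
        elif other == 0 and len(net) > 1:
            g -= 1  # net lies entirely on the cell's side: moving it cuts the net
    return g


def fm_pass(nets, side, area, max_area):
    """One FM-style pass: move each cell at most once, keep the best prefix.

    Assumption: `max_area` is a single area bound applied to both sides.
    """
    side = dict(side)
    load = {0: 0, 1: 0}
    for c, s in side.items():
        load[s] += area[c]
    locked = set()
    best_side, best_cut = dict(side), cut_size(nets, side)
    while len(locked) < len(side):
        # Only unlocked cells whose move keeps the destination within its area bound.
        candidates = [c for c in side
                      if c not in locked
                      and load[1 - side[c]] + area[c] <= max_area]
        if not candidates:
            break
        c = max(candidates, key=lambda x: gain(x, nets, side))
        load[side[c]] -= area[c]
        side[c] = 1 - side[c]
        load[side[c]] += area[c]
        locked.add(c)
        cut = cut_size(nets, side)
        if cut < best_cut:
            best_side, best_cut = dict(side), cut
    return best_side, best_cut


# Hypothetical four-cell example with unit areas.
nets = [("a", "b"), ("c", "d"), ("a", "c")]
start = {"a": 0, "b": 1, "c": 0, "d": 1}
area = {c: 1 for c in start}
best_side, best_cut = fm_pass(nets, start, area, max_area=3)
```

Negative-gain moves are accepted during the pass and only the best intermediate partition is kept, which is what lets an FM-style pass climb out of shallow local minima.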
international conference on vlsi design | 2001
Sujatha Sundararaman; Sriram Govindarajan; Ranga Vemuri
This paper presents a novel approach to optimize the performance of a design synthesized from a given behavioral application. The high-level synthesis process is highly restricted by a pre-characterized library from which components are chosen to implement operations in the behavior. Moreover, logic optimization on the register-transfer level datapath is typically limited to within the register boundaries that enclose the chosen components. It is imperative that the datapath components be carefully selected and synthesized in order to obtain a performance gain. The technique presented in this paper consists of two primary steps, application-specific macro generation and replacement, that are performed prior to high-level synthesis. The macro generation step extracts macro subgraphs from the given application graph and generates a macro component (an equivalent netlist) for each macro subgraph. Further, each macro component is optimized for performance using commercial logic synthesis tools. Using the enriched component library, a macro replacement step modifies the behavioral graph such that some subgraphs are replaced by equivalent macros. The replacement step attempts to replace subgraphs such that the design latency is minimized. The modified behavioral graph along with the enriched component library is then taken through high-level, logic and layout synthesis. Experiments on DSP benchmarks show that the macro-based synthesis process achieves significant improvement in design performance as opposed to the traditional design process. We have developed an automated performance-optimization framework that is only limited by the optimization capability of backend tools.
field programmable logic and applications | 2000
Sriram Govindarajan; Ranga Vemuri
This paper describes the tight integration of design space exploration with spatial and temporal partitioning algorithms in the SPARCS design automation system for RCs. In particular, this paper describes a novel technique to perform efficient design space exploration of parallel-process behaviors using the knowledge of spatial partitioning. The exploration technique satisfies the design latency constraints imposed by temporal partitioning and the device area constraints of the RC. Results clearly demonstrate the effectiveness of the partitioning-knowledgeable exploration technique in guiding spatial partitioning to quickly converge to a constraint-satisfying solution. Results of design automation through SPARCS and testing designs on a commercial RC board are also presented.
Hardware implementation of intelligent systems | 2001
Ranga Vemuri; Sriram Govindarajan; Iyad Ouaiss; Meenakshi Kaul; Vinoo Srinivasan; Shankar Radhakrishnan; Sujatha Sundaraman; Satish Ganesan; Awartika Pandey; Preetham Lakshmikanthan
The advent of reconfigurable logic arrays facilitates the development of adaptive architectures that have wide applicability as stand-alone intelligent systems. The hardware structure of such architectures can be rapidly altered to suit the changing computational needs of an application during its execution. The power of adaptive architectures has been demonstrated primarily in image processing, digital signal processing, and other areas such as neural networks and genetic algorithms. This chapter discusses the state-of-the-art architectures, their classification, and their applications. In order to effectively exploit adaptive architectures, efficient and retargetable design synthesis techniques are necessary. Further, the synthesis techniques must be fully integrated with design partitioning methods to make use of the multiplicity of reconfigurable devices provided by adaptive architectures. This chapter provides a description of a collection of synthesis and partitioning techniques and their embodiment in the SPARCS (Synthesis and Partitioning for Adaptive Reconfigurable Computing Systems) system.
design, automation, and test in europe | 2000
Sriram Govindarajan; Ranga Vemuri
The most compelling reason for High-Level Synthesis (HLS) to be accepted in the state-of-the-art CAD flow is its ability to perform design space exploration. Design space exploration requires efficient scheduling techniques that have a low complexity and yet produce good quality schedules. The Time-Constrained Scheduling (TCS) problem minimizes the number of functional units required to schedule a particular Data Flow Graph (DFG) within a specified number of time steps. Over the past few years a number of techniques [1, 2] have been proposed to solve the TCS problem. Heuristic list scheduling algorithms have been widely used for their low complexity and good performance. The complexity of a dynamic-list scheduling algorithm, such as Force Directed Scheduling (FDS), is $O(cn^2)$, where $c$ is the time constraint and $n$ is the number of operations. Static-list scheduling [1, 2] algorithms are the least complex among the known class of scheduling techniques, with a linear time complexity of $O(n)$. Typically, static-list scheduling algorithms, in order to maintain low complexity, do not perform any look-ahead like that of FDS. The drawback is that static-list scheduling algorithms may not generate high-quality schedules.
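To make the TCS setting concrete, here is a minimal sketch of a static-list time-constrained scheduler. It is not the algorithm proposed in the paper: it assumes unit-latency operations, fixes priorities once from ALAP times and mobility, and then places each operation in the least-loaded step of its feasible window, with no look-ahead. The function names and the example graph are hypothetical, and this simple version sorts the list and scans each window, so it does not itself achieve the linear bound quoted above.

```python
from collections import defaultdict


def topo_order(preds):
    """Depth-first topological order of a DAG given as op -> list of predecessors."""
    order, seen = [], set()

    def visit(v):
        if v in seen:
            return
        seen.add(v)
        for p in preds[v]:
            visit(p)
        order.append(v)

    for v in preds:
        visit(v)
    return order


def asap_alap(preds, steps):
    """Earliest and latest legal time step (1-based) of every unit-latency op."""
    order = topo_order(preds)
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)
    asap, alap = {}, {}
    for v in order:
        asap[v] = 1 + max((asap[p] for p in preds[v]), default=0)
    for v in reversed(order):
        alap[v] = min((alap[s] for s in succs[v]), default=steps + 1) - 1
    if any(asap[v] > alap[v] for v in preds):
        raise ValueError("time constraint is shorter than the critical path")
    return asap, alap


def static_list_schedule(preds, kind, steps):
    """Return (schedule, units): op -> step, and functional units needed per kind."""
    asap, alap = asap_alap(preds, steps)
    # Static priority list, built once. Sorting by ALAP is also a topological order,
    # because every predecessor has a strictly smaller ALAP than its successor.
    ops = sorted(preds, key=lambda v: (alap[v], alap[v] - asap[v], v))
    usage = defaultdict(int)  # (kind, step) -> ops of that kind placed in that step
    schedule = {}
    for v in ops:
        earliest = 1 + max((schedule[p] for p in preds[v]), default=0)
        # Least-loaded step in the feasible window; ties go to the earliest step.
        t = min(range(earliest, alap[v] + 1), key=lambda s: usage[(kind[v], s)])
        schedule[v] = t
        usage[(kind[v], t)] += 1
    units = {}
    for (k, _), count in usage.items():
        units[k] = max(units.get(k, 0), count)
    return schedule, units


# Hypothetical DFG: three multiplications feeding two chained additions.
preds = {"m1": [], "m2": [], "m3": [], "a1": ["m1", "m2"], "a2": ["a1", "m3"]}
kind = {"m1": "mul", "m2": "mul", "m3": "mul", "a1": "add", "a2": "add"}
schedule3, units3 = static_list_schedule(preds, kind, steps=3)
schedule4, units4 = static_list_schedule(preds, kind, steps=4)
```

On this example, the three-step constraint needs two multipliers and one adder, while relaxing it to four steps lets a single multiplier suffice, which is the area/latency trade-off that design space exploration walks through.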