Ganesh Lakshminarayana

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Ganesh Lakshminarayana is active.

Explore More

Publication

Featured researches published by Ganesh Lakshminarayana.

design automation conference | 1997

COSYN: hardware-software co-synthesis of embedded systems

Bharat P. Dave; Ganesh Lakshminarayana; Niraj K. Jha

Hardware-software co-synthesis is the process ofpartitioning an embedded system specification into hardware andsoftware modules to meet performance, power and cost goals. Inthis paper, we present a co-synthesis algorithm which starts withperiodic task graphs with real-time constraints and produces a low-costheterogeneous distributed embedded system architecturemeeting the constraints. The algorithm has the following features:1) it allows the use of multiple types of processing elements (PEs)and inter-PE communication links, where the links can take variousforms (point-to-point, bus, local area network (LAN), etc.), 2) itsupports both concurrent and sequential modes of communicationand computation, 3) it allows both preemptive and non-preemptivescheduling, 4) it employs the concept of an association array totackle the problem of multi-rate systems (which are commonlyfound in multimedia applications), 5) it uses a scheduler based ondynamic deadline-based priority levels for accurate performanceestimation of a co-synthesis solution, 6) it uses a new taskclustering technique which takes the dynamic nature of the criticalpath, and the existence of multiple critical paths in the task graphinto account, and 7) if desired, it also optimizes the architecture forpower consumption (we are not aware of any other co-synthesisalgorithm that optimizes power). Application of the proposedalgorithm to examples from the literature and real-life telecomtransport systems shows its efficacy.

IEEE Transactions on Very Large Scale Integration Systems | 1999

COSYN: Hardware-software co-synthesis of heterogeneous distributed embedded systems

Bharat P. Dave; Ganesh Lakshminarayana; Niraj K. Jha

Hardware-software co-synthesis starts with an embedded-system specification and results in an architecture consisting of hardware and software modules to meet performance, power, and cost goals. Embedded systems are generally specified in terms of a set of acyclic task graphs. In this paper, we present a co-synthesis algorithm COSYN, which starts with periodic task graphs with real-time constraints and produces a low-cost heterogeneous distributed embedded-system architecture meeting these constraints. It supports both concurrent and sequential modes of communication and computation. It employs a combination of preemptive and nonpreemptive static scheduling. It allows task graphs in which different tasks have different deadlines. It introduces the concept of an association array to tackle the problem of multirate systems. It uses a new task-clustering technique, which takes the changing nature of the critical path in the task graph into account. It supports pipelining of task graphs and a mix of various technologies to meet embedded-system constraints and minimize power dissipation. In general, embedded-system tasks are reused across multiple functions. COSYN uses the concept of architectural hints and reuse to exploit this fact. Finally, if desired, it also optimizes the architecture for power consumption. COSYN produces optimal results for the examples from the literature while providing several orders of magnitude advantage in central processing unit time over an existing optimal algorithm. The efficacy of COSYN and its low-power extension COSYN-LP is also established through their application to very large task graphs (with over 1000 tasks).

design automation conference | 2001

LOTTERYBUS: a new high-performance communication architecture for system-on-chip designs

Kanishka Lahiri; Anand Raghunathan; Ganesh Lakshminarayana

This paper presents Lotterybus, a novel high-performance communication architecture for system-on-chip (SoC) designs. The Lotterybus architecture was designed to address the following limitations of current communication architectures: (i) lack of control over the allocation of communication bandwidth to different system components or data flows (e.g., in static priority based shared buses), leading to starvation of lower priority components in some situations, and (ii) significant latencies resulting from variations in the time-profile of the communication requests (e.g., in time division multiplexed access (TDMA) based architectures), sometimes leading to larger latencies for high-priority communications. We present two variations of Lotterybus: the first is a low overhead architecture with statically configured parameters, while the second variant is a more sophisticated architecture, in which values of the architectural parameters are allowed to vary dynamically. Our experiments investigate the performance of the Lotterybus architecture across a wide range of communication traffic characteristics. In addition, we also analyze its performance in a 4x4 ATM switch sub-system design. The results demonstrate that the Lotterybus architecture is (i) capable of providing the designer with fine grained control over the bandwidth allocated to each SoC component or data flow, and (ii) well suited to provide high priority communication traffic with low latencies (we observed upto 85.4\% reduction in communication latencies over conventional on-chip communication architectures).

IEEE Transactions on Very Large Scale Integration Systems | 2006

The LOTTERYBUS on-chip communication architecture

Kanishka Lahiri; Anand Raghunathan; Ganesh Lakshminarayana

On-chip communication architectures play an important role in determining the overall performance of System-on-Chip (SoC) designs. Communication architectures should be flexible so as to offer high performance over a wide range of traffic characteristics. In particular, the resource sharing mechanism of the communication architecture, which determines how the often-conflicting requirements of different components are served, is of utmost importance. Conventional SoC architectures typically employ priority or time-division multiple-access (TDMA)-based communication architectures. However, these techniques are often inadequate. In the former, low-priority components may suffer from starvation, while in the latter, depending on the request profile, high-priority traffic may be subject to large latencies. This paper presents LOTTERYBUS, a high-performance SoC communication architecture based on new randomized on-chip communication protocols that addresses the shortcomings mentioned above. LOTTERYBUS provides each SoC component with a flexible, proportional, and probabilistically guaranteed share of the on-chip communication bandwidth. We present two variants of LOTTERYBUS. In the first variant, its architectural parameters are statically configured, leading to relatively low hardware overhead and design complexity. In the second variant, these parameters are allowed to vary dynamically, enabling more sophisticated use of LOTTERYBUS, at additional hardware cost. We have performed experiments to investigate the performance of LOTTERYBUS across a range of communication traffic characteristics. We have used LOTTERYBUS in designing a 4times4 ATM switch subsystem, and have compared its performance with conventional architectures. The results show that LOTTERYBUS provides fine-grained control over bandwidth allocation, and also provides significant reduction in average transaction latencies (up to 85%) compared to conventional architectures. Hardware implementations using a commercial 0.15-mum cell-based library indicate that the advantages provided by LOTTERYBUS are accompanied by modest hardware overheads compared to conventional architectures

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 1999

High-level synthesis of low-power control-flow intensive circuits

Kamal S. Khouri; Ganesh Lakshminarayana; Niraj K. Jha

In this paper, we present a comprehensive high-level synthesis system that is geared toward reducing power consumption in control-flow intensive as well as data-dominated circuits. An iterative improvement framework allows the system to search the design space by examining the interaction between the different high-level synthesis tasks. In addition to incorporating traditional high-level synthesis tasks such as scheduling, module selection and resource sharing, we introduce a new optimization that performs power-conscious structuring of multiplexer networks, which are predominant in control-flow intensive circuits. The scheduler employed is capable of loop optimizations within and across loop boundaries. We also introduce a fast power estimation technique, based on switching activity matrices, to drive the synthesis process. Experimental results for a number of control-flow intensive and data-dominated benchmarks demonstrate power reduction of up to 62% (58%) when compared to V/sub dd/-scaled area-optimized (delay-optimized) designs. The area overheads over area-optimized designs are less than 39%, whereas the area savings over delay-optimized designs are up to 40%.

design automation conference | 2000

Communication architecture tuners: a methodology for the design of high-performance communication architectures for systems-on-chips

Kanishka Lahiri; Anand Raghunathan; Ganesh Lakshminarayana; Sujit Dey

In this chapter, we present a general methodology for the design of custom system-on-chip communication architectures. Our technique is based on the addition of a layer of circuitry, called the Communication Architecture Tuner (CAT), around any existing communication architecture topology. The added layer enhances the ability of the system to adapt to changing communication needs of its constituent components. For example, more critical data may be handled differently, leading to lower communication latencies. The CAT monitors the internal state and communication transactions of each component, and “predicts” the relative importance of each communication transaction in terms of its potential impact on different system-level performance metrics. It then configures the protocol parameters of the underlying communication architecture (e.g., priorities, DMA modes,etc.) to best suit the systems changing communication needs. We illustrate issues and tradeoffs involved in the design of CAT-based communication architectures, and present algorithms to automate the key steps. Experimental results indicate that performance metrics (e.g. number of missed deadlines, average processing time) for systems with CAT-based communication architectures are significantly (sometimes, over an order of magnitude) better than those with conventional communication architectures.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 1999

Wavesched: a novel scheduling technique for control-flow intensive designs

Ganesh Lakshminarayana; Kamal S. Khouri; Niraj K. Jha

In this paper, we present a novel scheduling algorithm targeted toward minimizing the average execution time of control-flow intensive behavioral descriptions. Our algorithm uses a control/data flow graph model, which preserves the parallelism inherent in the application. It explores previously unexplored regions of the solution space by its ability to overlap the schedules of independent iterative constructs, whose bodies share resources. It also incorporates well known optimization techniques like loop unrolling in a natural fashion. This is made possible by a general loop-handling technique, which we have devised. Application of the algorithm to several common benchmarks demonstrates up to 4.8-fold improvement in expected schedule length over existing scheduling algorithms, without paying a price in terms of the best and worst case schedule lengths required to execute the behavioral description (in fact, frequently, the best/worst case schedule lengths are also better for our algorithm).

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2001

Testing of core-based systems-on-a-chip

Srivaths Ravi; Ganesh Lakshminarayana; Niraj K. Jha

Available techniques for testing of core-based systems-on-a-chip (SOCs) do not provide a systematic means for synthesizing low-overhead test architectures and compact test solutions. In this paper, we provide a comprehensive framework that generates low-overhead compact test solutions for SOCs. First, we develop a common ground for addressing issues such as core test requirements, core access, and testing hardware additions. For this purpose, we introduce finite-state automata (FSA) for modeling tests, transparency modes, and testing hardware behavior. In many cases, the tests repeat a basic set of test actions for different test data that can again be modeled using FSA. While earlier work can derive a single symbolic test for a module in a register-transfer level (RTL) circuit as a finite-state automaton, this work extends the methodology to the system level and additionally contributes a satisfiability-based solution to the problem of applying a sequence of tests phased in time. This problem is known to be a bottleneck in testability analysis not only at the system level, but also at the RTL. Experimental results show that the system-level average area overhead for making SOCs testable with our method is only 4.5%, while achieving an average test application time reduction of 80% over recent approaches. At the same time, it provides 100% test coverage of the precomputed test sets/sequences of the embedded cores.

design automation conference | 1999

Common-case computation: a high-level technique for power and performance optimization

Ganesh Lakshminarayana; Anand Raghunathan; Kamal S. Khouri; Niraj K. Jha; Sujit Dey

This paper presents a design methodology, called common-case computation (CCC), and new design automation algorithms for optimizing power consumption or performance. The proposed techniques are applicable in conjunction with any high-level design methodology where a structural register-transfer level (RTL) description and its corresponding scheduled behavioral (cycle-accurate functional RTL) description are available. It is a well-known fact that in behavioral descriptions of hardware (also in software), a small set of computations (CCCs) often accounts for most of the computational complexity. However, in hardware implementations (structural RTL or lower level), CCCs and the remaining computations a typically treated alike. This paper shows that identifying and exploiting CCCs during the design process can lead to implementations that are much more efficient in terms of power consumption or performance. We propose a CCC-based high-level design methodology with the following steps: extraction of common-case behaviors and execution conditions from the scheduled description, simplification of the common-case behaviors in a stand-alone manner, synthesis of common-case detection and execution circuits from the common-case behaviors, and composing the original design with the common-case circuits, resulting in a CCC-optimized design. We demonstrate that CCC-optimized designs reduce power consumption by up to 91.5%, or improve performance by up to 76.6% compared to designs derived without special regard for CCCs.

international conference on computer aided design | 1997

Wavesched : a novel scheduling technique for control-flow intensive behavioral descriptions

Ganesh Lakshminarayana; Kamal S. Khouri; Niraj K. Jha

An exercising machine having a frame including a pair of side members and a seat slidably resting upon the side members. A brake shaft attached to the frame. A friction producer slidably engages around the brake shaft. A member for reciprocating the seat on the rails while simultaneously reciprocating the friction producer along the brake shaft when physically operated by the user of the exercising machine.

Explore More