M. Balakrishnan
Indian Institute of Technology Delhi
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by M. Balakrishnan.
Proceedings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002 (IEEE Cat. No.02TH8627) | 2002
Rajeshwari Banakar; Stefan Steinke; Bo-Sik Lee; M. Balakrishnan; Peter Marwedel
In this paper we address the problem of on-chip memory selection for computationally intensive applications, by proposing scratch pad memory as an alternative to cache. Area and energy for different scratch pad and cache sizes are computed using the CACTI tool while performance was evaluated using the trace results of the simulator. The target processor chosen for evaluation was AT91M40400. The results clearly establish scratchpad memory as a low power alternative in most situations with an average energy reduction of 40%. Further the average area-time reduction for the scratchpad memory was 46% of the cache memory.
international symposium on systems synthesis | 2002
Stefan Steinke; Nils Grunwald; Lars Wehmeyer; Rajeshwari Banakar; M. Balakrishnan; Peter Marwedel
The number of mobile embedded systems is increasing and all of them are limited in their uptime by their battery capacity. Several hardware changes have been introduced during the last years, but the steadily growing functionality still requires further energy reductions, e.g. through software optimizations. A significant amount of energy can be saved in the memory hierarchy where most of the energy is consumed. In this paper, a new software technique is presented which supports the use of an onchip scratchpad memory by dynamically copying program parts into it. The set of selected program parts are determined with an optimal algorithm using integer linear programming. Experimental results show a reduction of the energy consumption by nearly 30%, a performance increase by 25% against a common cache system and energy improvements against a static approach of up to 38%.
international conference on vlsi design | 2001
Manoj Kumar Jain; M. Balakrishnan; Anshul Kumar
Interest in synthesis of Application Specific Instruction Processors or ASIPs has increased considerably and a number of methodologies have been proposed in the last decade. This paper attempts to survey the state of the art in this area and identifies some issues which need to be addressed. We have identified the five key steps in ASIP design as application analysis, architectural design space exploration, instruction set generation, code synthesis and hardware synthesis. A broad classification of the approaches reported in the literature is done. The paper notes the need to broaden the architectural space being explored and to tightly couple the various subtasks in ASIP synthesis.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 1988
M. Balakrishnan; Arun K. Majumdar; Dilip K. Banerji; James G. Linders; Jayanti C. Majithia
An algorithm to synthesize registers using multiport memories during data-path synthesis is presented. The proposed approach considers not only the access requirements of registers but also their interconnection to operators in order to minimize required interconnections. The same approach can be applied to select the optimum number of buses in a multibus architecture. The method is illustrated with an example. >
ACM Transactions on Design Automation of Electronic Systems | 2007
Anup Gangwar; M. Balakrishnan; Anshul Kumar
VLIW processors have started gaining acceptance in the embedded systems domain. However, monolithic register file VLIW processors with a large number of functional units are not viable. This is because of the need for a large number of ports to support FU requirements, which makes them expensive and extremely slow. A simple solution is to break the register file into a number of smaller register files with a subset of FUs connected to it. These architectures are termed clustered VLIW processors. In this article, we first build a case for clustered VLIW processors with four or more clusters by showing that the achievable ILP in most of the media applications for a 16 ALU and 8 LD/ST VLIW processor is around 20. We then provide a classification of the intercluster interconnection design space, and show that a large part of this design space is currently unexplored. Next, using our performance evaluation methodology, we evaluate a subset of this design space and show that the most commonly used type of interconnection, RF-to-RF, fails to meet achievable performance by a large factor, while certain other types of interconnections can lower this gap considerably. We also establish that this behavior is heavily application dependent, emphasizing the importance of application-specific architecture exploration. We also present results about the statistical behavior of these different architectures by varying the number of clusters in our framework from 4 to 16. These results clearly show the advantages of one specific architecture over others. Finally, based on our results, we propose a new interconnection network, which should lower this performance gap.
international conference on vlsi design | 2000
T.V.K. Gupta; P. Sharma; M. Balakrishnan; Sharad Malik
In this paper we present a novel methodology for processor evaluation in an embedded systems design environment. This evaluation can help in either selecting a suitable processor core or in evaluating changes to an ASIP. The processor evaluation is carried out in two stages. First, an architecture independent stage in which processors are rejected based on key application parameters and secondary architecture dependent stage in which performance is estimated on selected processors. The contribution of our work includes identification of application parameters which can influence processor selection, a mechanism to capture widely varying processor architectures and an instruction constrained scheduler. Initial experimental results suggest the potential of this approach.
design automation conference | 1989
M. Balakrishnan; Peter Marwedel
Synthesis of digital systems, involves a number of tasks ranging from scheduling to generating interconnections. The interrelationship between these tasks implies that good designs can only be generated by considering the overall impact of a design decision. The approach presented in this paper provides a framework for integrating scheduling decisions with binding decisions. The methodology supports allocation of a wider mix of operator modules and covers the design space more effectively. The process itself can be described as incremental synthesis and is thus well-suited for applications involving partial pre-synthesized structures.
international conference on hardware/software codesign and system synthesis | 2004
Basant Kumar Dwivedi; Anshul Kumar; M. Balakrishnan
We present an approach for automatic synthesis of system on chip (SoC) multiprocessor architectures for applications expressed as process networks. Our approach is targeted towards design space exploration (DSE) and thus the speed of synthesis is of critical interest. The focus here is on the problem of resource allocation and binding with a view to optimize cost under performance constraints. Our approach exploits adjacency relation of processes and uses a dynamic programming based algorithm to synthesize the architecture including interconnection network. We have done a number of experiments on real as well as randomly generated process networks. The results have been compared with an optimal MILP formulation. They conclusively show that this approach is fast as well as effective and can be employed for DSE.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2001
Lars Wehmeyer; Manoj Kumar Jain; Stefan Steinke; Peter Marwedel; M. Balakrishnan
Interest in low-power embedded systems has increased considerably in the past few years. To produce low-power code and to allow an estimation of power consumption of software running on embedded systems, a power model was developed based on physical measurement using an evaluation board and integrated into a compiler and profiler. The compiler uses the power information to choose instruction sequences consuming less power, whereas the profiler gives information about the total power consumed during execution of the generated program. The used compiler is parameterized such that, e.g., the register file size may be changed. The resulting code is evaluated with respect to code size, performance, and power consumption for different register file sizes. The extracted information is especially useful during application analysis and architecture space exploration in application-specific integrated processor (ASIP) design. Our analysis gives the designer the ability to estimate the desirable register file size for an ASIP design. The size of the register file should be considered as a design parameter since it has a strong impact on the energy consumption of embedded systems.
international symposium on systems synthesis | 2002
Bhuvan Middha; Varun Raj; Anup Gangwar; Anshul Kumar; M. Balakrishnan; Paolo Ienne
It is widely accepted that use of an Application Specific Instruction Set Processor (ASIP) in an embedded system can provide a solution which is much more flexible than ASICs and much more efficient than standard processors in terms of performance and power consumption. However a lack of an acceptable design methodology and supporting tools for ASIPs limits their use even today. We present in this paper a methodology for design space exploration of high performance VLIW ASIPs by modeling Application Specific Functional Units in Trimaran Compiler Infrastructure. To demonstrate the effectiveness of our strategy we consider two important applications FFT and Kalman Filter and perform compute intensive operations in these applications via special Functional Units. The results we obtain are very promising with up to 2/spl times/ speed improvement.