Ashok Sudarsanam | Researchain

Publication

Featured research published by Ashok Sudarsanam.

International Conference on Computer Aided Design | 1995

Memory bank and register allocation in software synthesis for ASIPs

Ashok Sudarsanam; Sharad Malik

An architectural feature commonly found in digital signal processors (DSPs) is multiple data-memory banks. This feature increases memory bandwidth by permitting multiple memory accesses to occur in parallel when the referenced variables belong to different memory banks and the registers involved are allocated according to a strict set of conditions. Unfortunately, current compiler technology is unable to take advantage of the potential increase in parallelism offered by such architectures. Consequently, most application software for DSP systems is hand-written, a very time-consuming task. We present an algorithm which attempts to maximize the benefit of this architectural feature. While previous approaches have decoupled the phases of register allocation and memory bank assignment, our algorithm performs these two phases simultaneously. Experimental results demonstrate that our algorithm substantially improves the code quality of many compiler-generated and even hand-written programs.
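The bank-assignment half of this problem can be illustrated with a small sketch. It is not the algorithm from the paper (which couples bank assignment with register allocation); it is a plain greedy two-way partition, assuming two banks named X and Y and a hypothetical input list of variable pairs that the code would like to access in the same cycle. The function names are illustrative.

```python
from collections import defaultdict


def assign_banks(parallel_pairs):
    """Greedily split variables across two banks, "X" and "Y".

    parallel_pairs: list of (var_a, var_b) pairs that the code would like
    to access in the same cycle. A pair can only be issued in parallel if
    its two variables end up in different banks.
    Register constraints are deliberately ignored in this sketch.
    """
    # Edge weight = number of times a pair of variables wants a parallel access.
    weight = defaultdict(int)
    variables = []
    for a, b in parallel_pairs:
        if a == b:
            continue  # the same variable can never be in two banks
        weight[frozenset((a, b))] += 1
        for v in (a, b):
            if v not in variables:
                variables.append(v)

    bank = {}
    for v in variables:
        # gain[b] = weight made parallelizable if v is placed in bank b.
        gain = {"X": 0, "Y": 0}
        for u, u_bank in bank.items():
            w = weight.get(frozenset((u, v)), 0)
            opposite = "Y" if u_bank == "X" else "X"
            gain[opposite] += w
        bank[v] = "X" if gain["X"] >= gain["Y"] else "Y"
    return bank


def parallel_accesses(parallel_pairs, bank):
    """Count the requested pairs whose variables landed in different banks."""
    return sum(1 for a, b in parallel_pairs if a != b and bank[a] != bank[b])


pairs = [("a", "b"), ("a", "b"), ("b", "c"), ("a", "c")]
banks = assign_banks(pairs)
print(banks, parallel_accesses(pairs, banks))
```

A greedy pass like this can miss the best partition, and it says nothing about which registers the parallel moves may use, which is the interaction the abstract identifies as the reason to solve both problems together.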

Design Automation Conference | 1997

Analysis and evaluation of address arithmetic capabilities in custom DSP architectures

Ashok Sudarsanam; Stan Y. Liao; Srinivas Devadas

Many application-specific architectures provide indirect addressing modes with auto-increment/decrement arithmetic. Since these architectures generally do not feature an indexed addressing mode, stack-allocated variables must be accessed by allocating address registers and performing address arithmetic. Subsuming address arithmetic into auto-increment/decrement arithmetic improves both the performance and size of the generated code. Our objective in this paper is to provide a method for comprehensively analyzing the performance benefits and hardware cost due to an auto-increment/decrement feature that varies from -l to +l, and allowing access to k address registers in an address generator. We provide this method via a parameterizable optimization algorithm that operates on a procedure-wise basis. Hence, the optimization techniques in a compiler can be used not only to generate efficient or compact code, but also to help the designer of a custom DSP architecture make decisions on address arithmetic features. We present two sets of experimental results based on selected benchmark programs: (1) the values of l and k beyond which there is little or no improvement in performance, and (2) the values of l and k which result in minimum code area.

ACM Transactions on Design Automation of Electronic Systems | 2000

Simultaneous reference allocation in code generation for dual data memory bank ASIPs

Ashok Sudarsanam; Sharad Malik

We address the problem of code generation for DSP systems on a chip. In such systems, the amount of silicon devoted to program ROM is limited, so application software must be sufficiently dense. Additionally, the software must be written so as to meet various high-performance constraints, which may include hard real-time constraints. Unfortunately, current compiler technology is unable to generate high-quality code for DSPs, whose architectures are highly irregular. Thus, designers often resort to programming application software in assembly, a time-consuming task. In this paper, we focus on providing support for an architectural feature of DSPs that makes code generation difficult, namely multiple data memory banks. This feature increases memory bandwidth by permitting multiple data memory accesses to occur in parallel when the referenced variables belong to different data memory banks and the registers involved conform to a strict set of conditions. We present an algorithm that attempts to maximize the benefit of this architectural feature. While previous approaches have decoupled the phases of register allocation and memory bank assignment, thereby compromising code quality, our algorithm performs these two phases simultaneously. Experimental results demonstrate that our algorithm not only generates high-quality compiled code, but also improves the quality of completely-referenced code.

Design Automation for Embedded Systems | 1999

Analysis and Evaluation of Address Arithmetic Capabilities in Custom DSP Architectures

Ashok Sudarsanam; Stan Y. Liao; Srinivas Devadas

We address the problem of code generation for DSP systems on a chip. In such systems, the amount of silicon devoted to program ROM is limited, so in addition to meeting various high-performance constraints, the application software must be sufficiently dense. Unfortunately, existing compiler technology is unable to generate high-quality code for DSPs since it does not provide adequate support for the specialized architectural features of DSPs. Thus, designers often resort to programming application software in assembly, which is a very tedious and time-consuming task. In this paper, we focus on providing compiler support for a group of specialized architectural features that exist in many DSPs, namely indirect addressing modes with auto-increment/decrement arithmetic. In these DSPs, an indexed addressing mode is generally not available, so automatic variables must be accessed by allocating address registers and performing address arithmetic. Subsuming address arithmetic into auto-increment/decrement arithmetic improves both the performance and size of the generated code. Our objective is to provide a method for comprehensively analyzing the performance benefits and hardware cost due to an auto-increment/decrement feature that varies from -l to +l, and allowing access to k address registers in an address generator. We provide this method via a parameterizable optimization algorithm that operates on a procedure-wise basis. Thus, the optimization techniques in a compiler can be used not only to generate efficient or compact code, but also to help the designer of a custom DSP architecture make decisions on address arithmetic features.
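The role of the auto-increment/decrement range l can be made concrete with a small cost model. The sketch below is an illustration under simplifying assumptions, not the paper's optimization algorithm: it assumes a single address register (k = 1), a fixed stack layout given as an ordered list of variables, and a hypothetical function name. It counts the address updates that fall outside the range -l to +l and would therefore need an explicit address-arithmetic instruction.

```python
def explicit_address_updates(access_sequence, layout, l):
    """Count address-register updates that auto-inc/dec cannot absorb.

    access_sequence: variables in the order the code accesses them.
    layout: variables in stack order; a variable's index is its address.
    l: maximum auto-increment/decrement distance (updates in [-l, +l] are free).
    Assumes one address register that walks the whole sequence (k = 1).
    """
    address = {var: i for i, var in enumerate(layout)}
    updates = 0
    for cur, nxt in zip(access_sequence, access_sequence[1:]):
        if abs(address[nxt] - address[cur]) > l:
            updates += 1  # needs an explicit add/subtract on the register
    return updates


seq = ["a", "b", "c", "a", "c"]
print(explicit_address_updates(seq, ["a", "b", "c"], l=1))  # 2
print(explicit_address_updates(seq, ["b", "a", "c"], l=1))  # 1
print(explicit_address_updates(seq, ["a", "b", "c"], l=2))  # 0
```

Sweeping l (and, in a fuller model, k) over a benchmark's access sequences gives the kind of cost curve a designer could use to judge how much addressing hardware is worth including.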

Archive | 1996

Code Generation and Optimization Techniques for Embedded Digital Signal Processors

Stan Y. Liao; Srinivas Devadas; Kurt Keutzer; Steve Tjiang; Albert R. Wang; Guido Araujo; Ashok Sudarsanam; Sharad Malik; Vojin Živojnović; Heinrich Meyr

The advent of 0.5μ processing that allows for the integration of 5 million transistors on a single integrated circuit has brought forth new challenges and opportunities in embedded-system design. This high level of integration makes it possible and desirable to integrate a processor core, a program ROM, and an ASIC together on a single IC. To justify the design costs of such an IC, these embedded-system designs must be sold in large volumes and, as a result, they are very cost-sensitive. The cost of an IC is most closely linked to its size, which is derived from the final circuit area. It is not unusual for the ROM that stores the program code to be the largest contributor to the area of such ICs. Thus the incremental value of using logic optimization to reduce the size of the ASIC is smaller because the ASIC takes up a relatively smaller percentage of the final circuit area. On the other hand, the potential for cost reduction through diminishing the size of the program ROM is great. There are also often strong real-time performance requirements on the final code; hence, there is a necessity for producing high-performance code as well.

International Symposium on Systems Synthesis | 1996

Instruction set design and optimizations for address computation in DSP architectures

Guido Araujo; Ashok Sudarsanam; Sharad Malik

In this paper we investigate the problem of code generation for address computation for DSP processors. This work is divided into four parts. First, we propose a branch instruction design which can guarantee minimum overhead for programs that make use of implicit indirect addressing. Second, we give a formulation and propose a solution for the problem of allocating address registers (ARs) for array accesses within loop constructs. Third, we describe retargetable approaches for auto-increment (decrement) optimizations of pointer variables, and loop induction variables. Finally, we use a graph coloring technique to allocate physical ARs to the virtual ARs used in the previous phases. The results show that the combination of the above techniques considerably improves the final code quality for benchmark DSP programs.
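The final step, mapping virtual ARs onto physical ARs by graph coloring, can be sketched as follows. This is a generic greedy coloring heuristic, not necessarily the technique used in the paper; the interference graph format and the function name are assumptions made for illustration.

```python
def allocate_address_registers(interference, num_physical):
    """Map virtual address registers to physical ones by greedy coloring.

    interference: dict mapping each virtual AR to the set of virtual ARs
                  that are live at the same time (must be symmetric).
    num_physical: number of physical ARs available.
    Returns a dict virtual AR -> physical AR index, or None if this greedy
    order runs out of registers (a real allocator would spill instead).
    """
    # Color the most constrained (highest-degree) virtual ARs first.
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    assignment = {}
    for v in order:
        used = {assignment[u] for u in interference[v] if u in assignment}
        free = [r for r in range(num_physical) if r not in used]
        if not free:
            return None
        assignment[v] = free[0]
    return assignment


graph = {
    "v0": {"v1", "v2"},
    "v1": {"v0"},
    "v2": {"v0"},
    "v3": set(),
}
print(allocate_address_registers(graph, 2))
```

With two physical ARs, v1 and v2 can share a register because they never interfere with each other; with only one, the sketch reports failure by returning None.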

Code Generation for Embedded Processors | 2002

Challenges in Code Generation for Embedded Processors

Guido Araujo; Srinivas Devadas; Kurt Keutzer; Stan Y. Liao; Sharad Malik; Ashok Sudarsanam; Steven W. K. Tjiang; Albert R. Wang

The emergence of integrated circuits in which both the program-ROM and the processor are integrated on a single die initiates a new era of problems for programming language compilers. In such a micro-architecture, code performance, and particularly code density, gain an unprecedented level of importance and new code-optimization algorithms will be required to supply the required code quality. This paper presents the first wave of a variety of new code-optimization approaches aimed at supplying the highest code quality possible.

Proceedings of the Seventh International Workshop on Hardware/Software Codesign (CODES'99) (IEEE Cat. No.99TH8450) | 1999

Development of an optimizing compiler for a Fujitsu fixed-point digital signal processor

Sreeranga P. Rajan; Masahiro Fujita; Ashok Sudarsanam; Sharad Malik

A common design methodology for embedded DSP systems is the integration of one or more digital signal processors (DSPs), program memory, and ASIC circuitry onto a single IC. Consequently, program memory size being limited, the criterion for optimality is that the embedded software must be very dense. We describe the development of an optimizing compiler, based on a retargetable compiler infrastructure, for the Fujitsu Elixir, a fixed-point DSP that is primarily used in cellular telephones. For small DSP benchmark programs (25-90 lines of C code), the average ratio of the size of compiler-generated code to the size of hand-written assembly code is 1.18. For a much larger program (more than 800 lines of C code), the ratio of the size of compiled code to the size of hand-written code is similar (1.14).

ACM Sigarch Computer Architecture News | 1994

The effect of compiler-flag tuning on SPEC benchmark performance

Yin Chan; Ashok Sudarsanam; Andrew Wolfe

The SPEC CINT92 and CFP92 benchmark suites are application-based system benchmarks primarily intended for workstation-class system performance measurements. The SPEC CPU benchmark results are widely disseminated by system vendors and as such have become the de facto standard for comparing system performance. Recently, many observers have expressed concerns about the suitability of published SPEC benchmark results in representing application performance on typical systems. The most outspoken concern is that there is too much freedom permitted in the manipulation of compiler flags. This has resulted in revisions to the SPEC reporting procedure. This paper presents and discusses many of the issues concerning the tuning of benchmarks through manipulation of compiler flags. We attempt to quantify the impact of these procedures through controlled experiments. Baseline performance results, using a set of uniform, common optimizations, are compared to published data. Further experiments measure the performance of the SPEC benchmarks in two other common usage scenarios. These are a centralized file storage configuration and a system using common binaries among several implementations of the same architecture. Despite the great concern over the use of compiler flags in the SPEC community, our experiments show only a modest impact on performance. The more significant performance differential shown in the other experiments draws into question the utility of current SPEC data to many users.

Design Automation for Embedded Systems | 1999

A Retargetable Compilation Methodology for Embedded Digital Signal Processors Using a Machine-Dependent Code Optimization Library

Ashok Sudarsanam; Sharad Malik; Masahiro Fujita

We address the problem of code generation for embedded DSP systems. Such systems devote a limited quantity of silicon to program memory, so the embedded software must be sufficiently dense. Additionally, this software must be written so as to meet various high-performance constraints. Unfortunately, current compiler technology is unable to generate dense, high-performance code for DSPs, due to the fact that it does not provide adequate support for the specialized architectural features of DSPs via machine-dependent code optimizations. Thus, designers often program the embedded software in assembly, a very time-consuming task. In order to increase productivity, compilers must be developed that are capable of generating high-quality code for DSPs. The compilation process must also be made retargetable, so that a variety of DSPs may be efficiently evaluated for potential use in an embedded system. We present a retargetable compilation methodology that enables high-quality code to be generated for a wide range of DSPs. Previous work in retargetable DSP compilation has focused on complete automation, and this desire for automation has limited the number of machine-dependent optimizations that can be supported. In our efforts, we have given code quality higher priority over complete automation. We demonstrate how by using a library of machine-dependent optimization routines accessible via a programming interface, it is possible to support a wide range of machine-dependent optimizations, albeit at some cost to automation. Experimental results demonstrate the effectiveness of our methodology, which has been used to build good-quality compilers for three fixed-point DSPs.
