Preeti Ranjan Panda
Indian Institute of Technology Delhi
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Preeti Ranjan Panda.
ACM Transactions on Design Automation of Electronic Systems | 2001
Preeti Ranjan Panda; Francky Catthoor; Nikil D. Dutt; Koen Danckaert; Erik Brockmeyer; Chidamber Kulkarni; A Vandercappelle; Per Gunnar Kjeldsberg
We present a survey of the state-of-the-art techniques used in performing data and memory-related optimizations in embedded systems. The optimizations are targeted directly or indirectly at the memory subsystem, and impact one or more out of three important cost metrics: area, performance, and power dissipation of the resulting implementation. We first examine architecture-independent optimizations in the form of code transoformations. We next cover a broad spectrum of optimization techniques that address memory architectures at varying levels of granularity, ranging from register files to on-chip memory, data caches, and dynamic memory (DRAM). We end with memory addressing related issues.
european design and test conference | 1997
Preeti Ranjan Panda; Nikil D. Dutt; Alexandru Nicolau
Efficient utilization of on-chip memory space is extremely important in modern embedded system applications based on microprocessor cores. In addition to a data cache that interfaces with slower off-chip memory, a fast on-chip SRAM, called Scratch-Pad memory, is often used in several applications. We present a technique for efficiently exploiting on-chip Scratch-Pad memory by partitioning the applications scalar and array variables into off-chip DRAM and on-chip Scratch-Pad SRAM, with the goal of minimizing the total execution time of embedded applications. Our experiments on code kernels from typical applications show that our technique results in significant performance improvements.
ACM Transactions on Design Automation of Electronic Systems | 2000
Preeti Ranjan Panda; Nikil D. Dutt; Alexandru Nicolau
Efficient utilization of on-chip memory space is extremely important in modern embedded system applications based on processor cores. In addition to a data cache that interfaces with slower off-chip memory, a fast on-chip SRAM, called Scratch-Pad memory, is often used in several applications, so that critical data can be stored there with a guaranteed fast access time. We present a technique for efficiently exploiting on-chip Scratch-Pad memory by partitioning the applications scalar and arrayed variables into off-chip DRAM and on-chip Scratch-Pad SRAM, with the goal of minimizing the total execution time of embedded applications. We also present extensions of our proposed memory assignment strategy to handle context switching between multiple programs, as well as a generalized memory hierarchy. Our experiments on code kernels from typical applications show that our technique results in significant performance improvements.
international symposium on systems synthesis | 2001
Preeti Ranjan Panda
SystemC is a C++ based modeling platform supporting design abstractions at the register-transfer, behavioral, and system levels. Consisting of a class library and a simulation kernel, the language is an attempt at standardization of a C/C++ design methodology, and is supported by the Open SystemC Initiative (OSCI), a consortium of a wide range of system houses, semiconductor companies, intellectual property (IP) providers, embedded software developers, and design automation tool vendors. The advantages of SystemC include the establishment of a common design environment consisting of C++ libraries, models and tools, thereby setting up a foundation for hardware-software co-design; the ability to exchange IP easily and efficiently; and the ability to reuse test benches across different levels of modeling abstraction. We outline the features of SystemC that make it an attractive language for design specification, verification, and synthesis at different levels of abstraction, with particular emphasis on the new features included in SystemC 2.0 that support system-level design.
IEEE Transactions on Computers | 1999
Preeti Ranjan Panda; Hiroshi Nakamura; Nikil D. Dutt; Alexandru Nicolau
Loop blocking (tiling) is a well-known compiler optimization that helps improve cache performance by dividing the loop iteration space into smaller blocks (tiles); reuse of array elements within each tile is maximized by ensuring that the working set for the tile fits into the data cache. Padding is a data alignment technique that involves the insertion of dummy elements into a data structure for improving cache performance. In this work, we present DAT, a technique that augments loop tiling with data alignment, achieving improved efficiency (by ensuring that the cache is never under-utilized) as well as improved flexibility (by eliminating self-interference cache conflicts independent of the tile size). This results in a more stable and better cache performance than existing approaches, in addition to maximizing cache utilization, eliminating self-interference, and minimizing cross-interference conflicts. Further, while all previous efforts are targeted at programs characterized by the reuse of a single array, we also address the issue of minimizing conflict misses when several tiled arrays are involved. To validate our technique, we ran extensive experiments using both simulations as well as actual measurements on SUN Sparc5 and Sparc10 workstations. The results on benchmarks exhibiting varying memory access patterns demonstrate the effectiveness of our technique through consistently high hit ratios and improved performance across varying problem sizes.
international symposium on systems synthesis | 1995
Preeti Ranjan Panda; Nikil D. Dutt
Abstract: In this paper we briefly describe a set of designs that earn serve as examples for high level synthesis (HLS) systems. The designs vary in complexity from simple behavioral finite state machines to more complex designs such as microprocessors and floating point units. Most of the designs are described in the VHDL language at the behavioral level. We divide the designs into two categories. The first category contains designs that have documentation on the specifications of the designs along with the strategy used to test the individual design models. The second category contains examples used in many HLS papers, but lack comprehensive documentation and/or test vectors.
ACM Transactions on Design Automation of Electronic Systems | 1997
Preeti Ranjan Panda; Nikil D. Dutt; Alexandru Nicolau
Code generation for embedded processors opens up the possibility for several performance optimization techniques that have been ignored by traditional compilers due to compilation time constraints. We present techniques that take into account the parameters of the data caches for organizing scalar and array variables declared in embedded code into memory, with the objective of improving data cache performance. We present techniques for clustering variables to minimize compulsory cache misses, and for solving the memory assignment problem to minimize conflict cache misses. Our experiments with benchmark code kernels from DSP and other domains on the CW4001 embedded processor from LSI Logic indicate significant improvements in data cache performance by the application of our memory organization technique.
international symposium on systems synthesis | 1997
Preeti Ranjan Panda; Nikil D. Dutt; Alexandru Nicolau
Embedded processor-based systems allow for the tailoring of the on-chip memory architecture based on application-specific requirements. We present an analytical strategy for exploring the on-chip memory architecture for a given application, based on a memory performance estimation scheme. The analytical technique has the important advantage of enabling a fast evaluation of candidate memory architectures in the early stages of system design. Our experiments demonstrate that our estimations closely follow the actual simulated performance at significantly reduced run times.
european design and test conference | 1996
Preeti Ranjan Panda; Nikil D. Dutt
We present low power techniques for mapping arrays in behavioral specifications to physical memory, specifically for memory intensive behaviors that exhibit regularity in their memory access patterns. Our approach exploits this regularity in memory accesses by reducing the number of transitions on the memory address bus. We study the impact of different strategies for mapping arrays in behaviors to physical memory, on power dissipation during memory accesses. We describe a heuristic for selecting a memory mapping strategy to achieve low power, and present an evaluation of the architecture that implements the mapping techniques to study the transition count overhead. Experiments on several image processing benchmarks indicate power savings of upto 63% through reduced transition activity on the memory address bus.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 1999
Preeti Ranjan Panda; Nikil D. Dutt; Alexandru Nicolau
Embedded processor-based systems allow for the tailoring of the on-chip memory architecture based on application specific requirements. We present an analytical strategy for exploring the on-chip memory architecture for a given application, based on a memory performance estimation scheme. The analytical technique has the important advantage of enabling a fast evaluation of candidate memory architectures in the early stages of system design. Many digital signal-processing applications involve array accesses and loop nests that can benefit from such an exploration. Our experiments demonstrate that our estimations closely follow the actual simulated performance at significantly reduced run times.