Chidamber Kulkarni
IMEC
Publication
Featured research published by Chidamber Kulkarni.
VLSI Design | 2001
Koen Danckaert; Chidamber Kulkarni; Francky Catthoor; Hugo De Man; Vivek Tiwari
Multimedia algorithms deal with enormous amounts of data transfers and storage, resulting in huge bandwidth requirements at the off-chip memory and system bus level. As a result, the related energy consumption becomes critical. Even for execution time, the bottleneck can shift from the CPU to the external bus load. This paper demonstrates a systematic software approach to reduce this system bus load. It consists of source-to-source code transformations that have to be applied before the conventional ILP compilation. To illustrate this, we use a cavity detection algorithm for medical imaging that is mapped onto an Intel Pentium® II processor.
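As a rough illustration of the kind of source-to-source transformation the abstract refers to, the sketch below fuses two loops so that an intermediate array never has to travel over the bus. The two stages (a pointwise scaling and a threshold), the function names, and the access counts are hypothetical and chosen only for illustration; they are not the cavity detection code or the specific transformations used in the paper.

```python
def two_pass(src):
    """Baseline: each stage makes a full pass and stores an intermediate array."""
    n = len(src)
    tmp = [0] * n                            # intermediate buffer (assumed off-chip)
    for i in range(n):
        tmp[i] = src[i] * 2                  # stage 1: hypothetical pointwise scaling
    out = [0] * n
    for i in range(n):
        out[i] = 1 if tmp[i] > 10 else 0     # stage 2: hypothetical threshold
    # Array accesses: n reads of src, n writes and n reads of tmp, n writes of out.
    return out, 4 * n


def fused(src):
    """After loop fusion: the intermediate value stays in a scalar."""
    n = len(src)
    out = [0] * n
    for i in range(n):
        scaled = src[i] * 2                  # kept in a local variable, no array traffic
        out[i] = 1 if scaled > 10 else 0
    # Array accesses: n reads of src, n writes of out.
    return out, 2 * n


if __name__ == "__main__":
    result_a, accesses_a = two_pass([1, 5, 6, 10])
    result_b, accesses_b = fused([1, 5, 6, 10])
    print(result_a == result_b, accesses_a, accesses_b)
```

Both versions compute the same output, but under the simple counting model assumed here the fused loop performs half as many array accesses.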
IEEE Design & Test of Computers | 2001
Lode Nachtergaele; Francky Catthoor; Chidamber Kulkarni
This tutorial covers the basic design choices involved in customized data storage, including those for register files, local memory, caches, and main memory.
European Conference on Parallel Processing | 1999
Chidamber Kulkarni; Koen Danckaert; Francky Catthoor; Manish Gupta
Real-time multi-media applications need large processing power and yet require a low-power implementation in an embedded context. For programmable parallel processors, this poses new challenges for optimizing a given application for high performance and low power. In this paper, we present a case study of applying our low-power oriented data transfer and storage exploration methodology and coupling it with a state-of-the-art performance optimizing and parallelizing compiler. Experiments on two real-life applications show that this combined approach heavily reduces the memory accesses and bus loading, and hence power. At the same time, a significant reduction in the total execution time is obtained. Decomposing the detailed parallelization and the data transfer and storage exploration issues into two different stages is required to obtain the important benefits of both stages without exploding the complexity of solving all the issues simultaneously. This will be demonstrated by the experimental results.
Archive | 2002
Francky Catthoor; Koen Danckaert; Chidamber Kulkarni; Erik Brockmeyer; Per Gunnar Kjeldsberg; Tanja Van Achteren; Thierry Omnes
This introductory chapter will contain the problem context that we focus on, including the target application domain and architectural style. Then the global approach for reducing the cost of data access and storage is described. Next, a summary is provided of all the steps in the Data Transfer and Storage Exploration (DTSE) script. Finally the content of the rest of the book is briefly outlined.
Archive | 2002
Francky Catthoor; Koen Danckaert; Chidamber Kulkarni; Erik Brockmeyer; Per Gunnar Kjeldsberg; Tanja Van Achteren; Thierry Omnes
In this chapter, an extensive summary is provided of the main related compiler work in the domain of this book. It is organized in a hierarchical way where the most important topics receive a separate discussion, with pointers to the available literature. Wherever needed, a further subdivision in subsections or paragraphs is made.
Archive | 2002
Francky Catthoor; Koen Danckaert; Chidamber Kulkarni; Erik Brockmeyer; Per Gunnar Kjeldsberg; Tanja Van Achteren; Thierry Omnes
In many cases, a fully customized (on-chip) memory architecture can give superior memory bandwidth and power characteristics over a traditional hierarchical memory architecture that includes data caches. This is especially so when the application is very well analyzable at compile-time. In an embedded context this is typically quite well achievable because the application (set) to be mapped is usually fully fixed and the on-chip memory organisation can be at least partly tuned towards this application (set).
Archive | 2002
Francky Catthoor; Koen Danckaert; Chidamber Kulkarni; Erik Brockmeyer; Per Gunnar Kjeldsberg; Tanja Van Achteren; Thierry Omnes
Estimators at the system-level are crucial to help the designer in making global design decisions and trade-offs. For data-dominant applications in the multi-media and telecom domains, the system-level description is typically characterized by large multi-dimensional loop nests and arrays. A major aspect of system cost related to such codes is due to the data transfer and storage (DTS) aspects, as motivated in chapter 1. Cost models for the amount of area per memory cell or the energy consumption per access for a given memory plane size can be obtained from vendors. However, in order to identify the amount of memory accesses or data transfers and the required memory size, automatable estimation techniques are required. Effective approaches for this are described in this chapter.
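To make the two quantities named in the abstract concrete (the number of memory accesses and the required memory size), the sketch below computes both for a small loop nest by brute-force enumeration. The function `estimate`, its parameters, and the example index expression are assumptions made for illustration; this is not the estimation technique described in the chapter, and enumeration like this would not scale to realistic loop bounds.

```python
from itertools import product


def estimate(bounds, index_fn):
    """Count accesses and distinct elements touched by one array reference.

    bounds   -- loop upper bounds, one per loop (each loop runs from 0 to bound - 1)
    index_fn -- maps the loop iterators to the array index that is accessed
    """
    accesses = 0
    touched = set()
    for iterators in product(*(range(b) for b in bounds)):
        accesses += 1                        # one access per loop iteration
        touched.add(index_fn(*iterators))    # remember which element was accessed
    # len(touched) is an upper bound on the storage this reference needs.
    return accesses, len(touched)


if __name__ == "__main__":
    # Hypothetical reference A[i + j] inside a 4 x 3 loop nest.
    print(estimate((4, 3), lambda i, j: i + j))
```

For the hypothetical reference `A[i + j]` in a 4 x 3 nest, the loop body runs 12 times but touches only 6 distinct elements, which is the kind of gap between access count and storage size that such estimates expose.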
Archive | 2002
Francky Catthoor; Koen Danckaert; Chidamber Kulkarni; Erik Brockmeyer; Per Gunnar Kjeldsberg; Tanja Van Achteren; Thierry Omnes
As motivated, the reorganisation of the loop structure and the global control flow across the entire application is a crucial initial step in the DTSE flow. Experiments have shown that this is extremely difficult to decide manually due to the many conflicting goals and trade-offs that exist in modern real-life multi-media applications. So an interactive transformation environment would help but is not sufficient. Therefore, we have devoted a major research effort since 1989 to derive automatic steering techniques in the DTSE context where both “global” access locality and access regularity are crucial.
Archive | 2002
Francky Catthoor; Koen Danckaert; Chidamber Kulkarni; Erik Brockmeyer; Per Gunnar Kjeldsberg; Tanja Van Achteren; Thierry Omnes
Architectural techniques for reducing cache misses are expensive to implement, and they do not have a global view of the complete program, which limits their effectiveness. Compiler optimizations are therefore the most attractive alternative, since they can overcome both of these shortcomings of a hardware implementation. In this chapter, we will investigate the current state-of-the-art in compiler optimizations for caching. Afterwards, we propose a stepwise methodology which allows a designer to perform a global optimization of the program for a given cache organization. Then each of the main cache related steps will be studied in more detail, including both problem formulations and techniques to solve them. The effectiveness of these automatable techniques will be substantiated by realistic demonstrators.
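Loop tiling is one widely used compiler optimization for caches, and the sketch below shows it on a matrix transpose. It is offered only as a generic example of the class of optimizations the abstract mentions, not as the methodology proposed in the chapter; the function names and the tile size are assumptions. In Python the cache effect itself is not observable, so the code only demonstrates the reordered iteration space.

```python
def transpose_naive(a):
    """Row-by-row transpose: writes to `out` stride across rows."""
    n = len(a)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[j][i] = a[i][j]
    return out


def transpose_tiled(a, tile=2):
    """Same computation, visiting the matrix in tile x tile blocks.

    In a compiled language, each block of `a` and `out` can stay in the
    cache while it is being processed, which reduces cache misses.
    """
    n = len(a)
    out = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):                      # block row
        for jj in range(0, n, tile):                  # block column
            for i in range(ii, min(ii + tile, n)):    # rows inside the block
                for j in range(jj, min(jj + tile, n)):
                    out[j][i] = a[i][j]
    return out


if __name__ == "__main__":
    m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(transpose_tiled(m) == transpose_naive(m))
```

The `min(...)` bounds handle matrices whose size is not a multiple of the tile size, so the tiled version produces the same result as the naive one for any square input.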
Archive | 2002
Francky Catthoor; Koen Danckaert; Chidamber Kulkarni; Erik Brockmeyer; Per Gunnar Kjeldsberg; Tanja Van Achteren; Thierry Omnès