Publication


Featured research published by Ismail Kadayif.


Design Automation Conference | 2001

Dynamic management of scratch-pad memory space

Mahmut T. Kandemir; J. Ramanujam; J. Irwin; Narayanan Vijaykrishnan; Ismail Kadayif; A. Parikh

Optimizations aimed at improving the efficiency of on-chip memories are extremely important. We propose a compiler-controlled dynamic on-chip scratch-pad memory (SPM) management framework that uses both loop and data transformations. Experimental results obtained using a generic cost model indicate significant reductions in data transfer activity between SPM and off-chip memory.


International Conference on Parallel Architectures and Compilation Techniques | 2002

Leakage energy management in cache hierarchies

Lin Li; Ismail Kadayif; Yuh-Fang Tsai; Narayanan Vijaykrishnan; Mahmut T. Kandemir; Mary Jane Irwin; Anand Sivasubramaniam

Energy management is important for a spectrum of systems ranging from high-performance architectures to low-end mobile and embedded devices. With increasing transistor counts, smaller feature sizes, and lower supply and threshold voltages, the focus of energy optimization is shifting from dynamic to leakage energy. Leakage energy is of particular concern in dense cache memories, which form a major portion of the transistor budget. In this work, we present several architectural techniques that exploit data duplication across the levels of the cache hierarchy. Specifically, we apply both state-preserving (data-retaining) and state-destroying leakage control mechanisms to L2 subblocks whose data also exist in L1. Using a set of media and array-dominated applications, we demonstrate the effectiveness of the proposed techniques through cycle-accurate simulation, and we compare our schemes with the previously proposed cache decay policy. This comparison indicates that one of our schemes is competitive with cache decay.
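
The trade-off between the two leakage control modes can be illustrated with a toy model. All names and energy values below are invented for illustration; they are not taken from the paper.

```python
# Toy leakage model (all values are made-up illustrations, not the
# paper's numbers). L2 subblocks whose data also live in L1 can be put
# into a low-leakage mode; the two modes trade leakage against data loss.

ACTIVE_LEAKAGE = 1.0    # relative leakage of a fully powered subblock
DROWSY_LEAKAGE = 0.1    # state-preserving mode: data retained, low leakage
OFF_LEAKAGE = 0.0       # state-destroying mode: no leakage, data lost

def l2_leakage(duplicated_subblocks, total_subblocks, mode):
    """Relative L2 leakage when duplicated subblocks use the given mode."""
    low = DROWSY_LEAKAGE if mode == "preserve" else OFF_LEAKAGE
    normal = total_subblocks - duplicated_subblocks
    return normal * ACTIVE_LEAKAGE + duplicated_subblocks * low

# With half of a 1024-subblock L2 mirrored in L1:
preserve = l2_leakage(512, 1024, "preserve")   # roughly 563 vs. a 1024 baseline
destroy = l2_leakage(512, 1024, "destroy")     # 512: lower still, but data lost
```

The state-destroying mode saves more leakage but discards data that must be refetched from off-chip if it is later evicted from L1, which is why the abstract's comparison against cache decay matters.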


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2004

A compiler-based approach for dynamically managing scratch-pad memories in embedded systems

Mahmut T. Kandemir; J. Ramanujam; Mary Jane Irwin; Narayanan Vijaykrishnan; Ismail Kadayif; A. Parikh

Optimizations aimed at improving the efficiency of on-chip memories in embedded systems are extremely important. Using a suitable combination of program transformations and memory design space exploration aimed at enhancing data locality enables significant reductions in effective memory access latencies. While numerous compiler optimizations have been proposed to improve cache performance, there are relatively few techniques that focus on software-managed on-chip memories. It is well-known that software-managed memories are important in real-time embedded environments with hard deadlines as they allow one to accurately predict the amount of time a given code segment will take. In this paper, we propose and evaluate a compiler-controlled dynamic on-chip scratch-pad memory (SPM) management framework. Our framework includes an optimization suite that uses loop and data transformations, an on-chip memory partitioning step, and a code-rewriting phase that collectively transform an input code automatically to take advantage of the on-chip SPM. Compared with previous work, the proposed scheme is dynamic, and allows the contents of the SPM to change during the course of execution, depending on the changes in the data access pattern. Experimental results from our implementation using a source-to-source translator and a generic cost model indicate significant reductions in data transfer activity between the SPM and off-chip memory.
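
A minimal sketch of what compiler-managed dynamic SPM use looks like. The tile size, the doubling computation, and the transfer count are hypothetical; the real framework derives the copy code via loop and data transformations.

```python
# Minimal sketch of dynamic SPM management: process data one SPM-sized
# tile at a time, counting the words moved between off-chip memory and
# the scratch-pad (all sizes and the computation are assumptions).

SPM_WORDS = 4   # assumed scratch-pad capacity, in array elements

def process_with_spm(data):
    transfers = 0
    out = []
    for start in range(0, len(data), SPM_WORDS):
        tile = data[start:start + SPM_WORDS]   # copy-in: off-chip -> SPM
        transfers += len(tile)
        tile = [x * 2 for x in tile]           # compute entirely out of the SPM
        out.extend(tile)                       # copy-out: SPM -> off-chip
        transfers += len(tile)
    return out, transfers

result, transfers = process_with_spm(list(range(10)))
```

Because the SPM contents change tile by tile, the working set tracks the data access pattern as it moves through the array, which is the "dynamic" part of the scheme.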


Design Automation Conference | 2002

An integer linear programming based approach for parallelizing applications in on-chip multiprocessors

Ismail Kadayif; Mahmut T. Kandemir; Ugur Sezer

With energy consumption becoming a first-class optimization parameter in computer system design, compilation techniques that consider performance and energy simultaneously are expected to play a central role. In particular, compiling a given application code under both performance and energy constraints is becoming an important problem. In this paper, we focus on an on-chip multiprocessor architecture and present a parallelization strategy based on integer linear programming. Given an array-intensive application, our optimization strategy determines the number of processors to be used in executing each loop nest, based on the objective function and additional compilation constraints provided by the user. Our initial experience with this strategy shows that it is very successful in optimizing array-intensive applications on on-chip multiprocessors under energy and performance constraints.
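
The abstract does not give the ILP formulation, so the sketch below solves a toy instance of the same decision problem exactly by brute force: choose a processor count per loop nest to minimize energy subject to a cycle budget. The work sizes, the linear-speedup assumption, and the 15% per-extra-processor energy overhead are all made up.

```python
# Brute-force stand-in for an ILP: pick a processor count for each loop
# nest so that total energy is minimized subject to a cycle budget.
# (Work sizes and the cost model below are invented for illustration.)
from itertools import product

NESTS = [1000, 4000, 2000]   # hypothetical per-nest work, in single-CPU cycles
CHOICES = (1, 2, 4, 8)       # processor counts the compiler may assign

def time_energy(work, p):
    time = work / p                        # ideal-speedup assumption
    energy = work * (1 + 0.15 * (p - 1))   # assumed parallelization overhead
    return time, energy

def best_assignment(cycle_budget):
    best = None
    for combo in product(CHOICES, repeat=len(NESTS)):
        pairs = [time_energy(w, p) for w, p in zip(NESTS, combo)]
        total_time = sum(t for t, _ in pairs)
        total_energy = sum(e for _, e in pairs)
        if total_time <= cycle_budget and (best is None or total_energy < best[1]):
            best = (combo, total_energy)
    return best

combo, energy = best_assignment(cycle_budget=3000)  # budget forces parallelism
```

With the assumed numbers the serial schedule (7000 cycles) misses the 3000-cycle budget, so the search must buy speedup where it costs the least extra energy; a real ILP solver answers the same question without enumerating every combination.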


Workshop on Program Analysis for Software Tools and Engineering | 2001

vEC: virtual energy counters

Ismail Kadayif; T. Chinoda; Mahmut T. Kandemir; Narayanan Vijaykrishnan; Mary Jane Irwin; Anand Sivasubramaniam

Energy has become a critical issue in processor design, especially in embedded environments. Thus, there is a need for tools that provide fast and accurate energy estimates. In this paper, we present the design and use of a tool, Virtual Energy Counters (vEC), for estimating the energy consumption of user programs. vEC is built on top of the Perfmon user library for the UltraSPARC platform and provides an interface that can be called from within user programs to estimate energy consumption. Estimates are provided for the energy consumed in the data, instruction, and external caches, main memory, the address and data buses, and the address and data pads.
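
The idea can be sketched as a weighted sum over counter readings. The event names and per-event costs below are invented; the real tool reads UltraSPARC hardware counters through Perfmon and uses calibrated per-component costs.

```python
# Sketch of the vEC idea: energy is estimated as a weighted sum of
# hardware event counts (all event names and nanojoule costs here are
# assumptions, not calibrated values).

ENERGY_PER_EVENT_NJ = {
    "dcache_access": 0.5,     # assumed cost per data-cache access
    "icache_access": 0.4,     # assumed cost per instruction-cache access
    "ecache_access": 2.0,     # assumed cost per external-cache access
    "memory_access": 20.0,    # assumed cost per main-memory access
}

def estimate_energy_nj(counts):
    """Energy estimate = sum of (event count x per-event cost)."""
    return sum(n * ENERGY_PER_EVENT_NJ[event] for event, n in counts.items())

sample_counts = {"dcache_access": 1_000_000, "icache_access": 2_000_000,
                 "ecache_access": 50_000, "memory_access": 10_000}
total_nj = estimate_energy_nj(sample_counts)   # about 1.6 millijoules
```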


International Symposium on Systems Synthesis | 2001

Exploiting scratch-pad memory using Presburger formulas

Mahmut T. Kandemir; Ismail Kadayif; Ugur Sezer

Effective utilization of on-chip storage space is important from both performance (execution cycles) and memory system energy consumption perspectives. While on-chip cache memories have been widely used in the past, several factors, including the lack of data access time predictability and the limited effectiveness of compiler optimizations, indicate that they may not be the best candidate for portable/embedded devices. This paper presents a compiler-directed management strategy for data accesses to an on-chip scratch-pad memory (a software-managed on-chip memory). Our strategy is oriented towards minimizing the number of data transfers between off-chip memory and the scratch-pad memory, thereby exploiting reuse for the data residing in the scratch-pad memory. We report experimental data from our implementation showing the usefulness of our technique.
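
The Presburger machinery counts points in affine iteration sets symbolically; the sketch below obtains the same kind of per-tile access counts for a tiny example by brute-force enumeration. The array size, tile size, and access function are arbitrary choices for illustration.

```python
# Brute-force stand-in for Presburger counting: how many loop iterations
# touch each data tile, to decide which tiles earn a place in the
# scratch-pad (the access function A[min(i, j)] is an invented example).

N, TILE = 16, 4   # assumed array size and tile size, in elements

def accesses_per_tile():
    counts = {}
    for i in range(N):
        for j in range(N):
            elem = min(i, j)          # assumed access: A[min(i, j)]
            tile = elem // TILE
            counts[tile] = counts.get(tile, 0) + 1
    return counts

counts = accesses_per_tile()
hot_tile = max(counts, key=counts.get)   # the tile worth keeping on chip
```

Here tile 0 is touched far more often than tile 3, so a transfer-minimizing placement keeps it resident in the scratch-pad; the paper derives such counts symbolically instead of enumerating the iteration space.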


Design, Automation, and Test in Europe | 2004

Exploiting processor workload heterogeneity for reducing energy consumption in chip multiprocessors

Ismail Kadayif; Mahmut T. Kandemir; Ibrahim Kolcu

Advances in semiconductor technology are enabling designs with several hundred million transistors. Since building sophisticated single-processor-based systems is a complex process from design, verification, and software development perspectives, the use of chip multiprocessing is inevitable in future microprocessors. In fact, the abundance of explicit loop-level parallelism in many embedded applications helps us identify chip multiprocessing as one of the most promising directions in designing systems for embedded applications. Another architectural trend that we observe in embedded systems, namely multi-voltage processors, is driven by the need to reduce energy consumption during program execution. Practical implementations such as Transmeta's Crusoe and Intel's XScale tune processor voltage/frequency depending on the current execution load. Considering these two trends, chip multiprocessing and voltage/frequency scaling, this paper presents an optimization strategy for an architecture that makes use of both chip parallelism and voltage scaling. In our proposal, the compiler takes advantage of heterogeneity between the loads of different processors in parallel execution and assigns different voltages/frequencies to different processors if doing so reduces energy consumption without significantly increasing overall execution cycles. Our experiments with a set of applications show that this optimization can bring large energy benefits without much performance loss.
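
A toy version of the load-heterogeneity argument: if processors finish a parallel region at different times, the lightly loaded ones can run at a lower frequency and voltage so that everyone finishes together. The scaling model below is the textbook approximation (dynamic energy roughly proportional to work times f squared when voltage scales with frequency), not the paper's cost model, and the loads are made up.

```python
# Textbook approximation (not the paper's model): slowing a lightly
# loaded processor to f = load/max_load keeps the finish time unchanged
# while cutting its dynamic energy roughly by f**2.

def unscaled_energy(loads):
    return sum(loads)                   # every processor at full frequency

def scaled_energy(loads):
    """Each processor runs just fast enough to finish with the most
    loaded one: frequency f = load / max_load, energy ~ work * f**2."""
    deadline = max(loads)
    return sum(work * (work / deadline) ** 2 for work in loads)

loads = [100, 60, 30, 10]               # hypothetical per-processor work
baseline, scaled = unscaled_energy(loads), scaled_energy(loads)
```

Execution time is unchanged, since the critical processor still runs at full speed, yet under these assumptions energy drops from 200 to about 124; that gap is what the compiler exploits.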


International Symposium on Low Power Electronics and Design | 2004

Compiler-directed scratch pad memory optimization for embedded multiprocessors

Mahmut T. Kandemir; Ismail Kadayif; Alok N. Choudhary; J. Ramanujam; Ibrahim Kolcu

This paper presents a compiler strategy to optimize data accesses in regular array-intensive applications running on embedded multiprocessor environments. Specifically, we propose an optimization algorithm that targets reducing the extra off-chip memory accesses caused by interprocessor communication. This is achieved by increasing the application-wide reuse of data that resides in the scratch-pad memories of the processors. Our results, obtained using four array-intensive image processing applications, indicate that exploiting interprocessor data sharing can significantly reduce the energy-delay product on a four-processor embedded system.
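
A back-of-the-envelope model of the effect, with all sizes assumed: data needed by every processor is fetched from off-chip memory once and then read from a neighbor's scratch-pad, instead of being fetched separately by each processor.

```python
# Illustrative model (the processor count and data sizes are invented):
# application-wide reuse means shared data crosses the off-chip boundary
# once rather than once per processor.

PROCS = 4
PRIVATE_WORDS = 1000   # per-processor private data, always fetched off chip
SHARED_WORDS = 200     # data needed by all processors

def offchip_accesses(exploit_sharing):
    private = PROCS * PRIVATE_WORDS
    shared = SHARED_WORDS if exploit_sharing else PROCS * SHARED_WORDS
    return private + shared

without_sharing = offchip_accesses(False)   # each processor fetches its copy
with_sharing = offchip_accesses(True)       # one fetch, then SPM-to-SPM reads
```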


Compiler Construction | 2002

Influence of Loop Optimizations on Energy Consumption of Multi-bank Memory Systems

Mahmut T. Kandemir; Ibrahim Kolcu; Ismail Kadayif

It is clear that automatic compiler support for energy optimization can lead to better embedded system implementations with reduced design time and cost. Efficient solutions to energy optimization problems are particularly important for array-dominated applications that spend a significant portion of their energy budget in executing memory-related operations. Recent interest in multi-bank memory architectures and low-power operating modes motivates us to investigate whether current locality-oriented loop-level transformations are suitable from an energy perspective in a multi-bank architecture, and if not, how these transformations can be tuned to take into account the banked nature of the memory structure and the existence of low-power modes. In this paper, we discuss the similarities and conflicts between two complementary objectives, namely, optimizing cache locality and reducing memory system energy, and try to see whether loop transformations developed for the former objective can also be used for the latter. To test our approach, we have implemented bank-conscious versions of three loop transformation techniques (loop fission/fusion, linear loop transformations and loop tiling) using an experimental compiler infrastructure and measured the energy benefits using nine array-dominated codes. Our results show that the modified (memory bank-aware) loop transformations result in large energy savings in both cacheless and cache-based systems, and that the execution times of the resulting codes are competitive with those obtained using pure locality-oriented techniques in a cache-based system.
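The bank-awareness argument can be illustrated with a toy energy model. The banking scheme, the idle-shutdown policy, and the energy numbers below are all assumptions, not the paper's: an access order that sweeps one bank at a time lets the other banks stay in a low-power mode, while an interleaved order keeps waking banks up and pays the transition cost each time.

```python
# Toy multi-bank energy model (all parameters invented): a bank powers
# down after IDLE_THRESHOLD unused steps and pays WAKE_E to resume.

BANKS, ELEMS_PER_BANK = 4, 256
IDLE_THRESHOLD = 2                            # unused steps before power-down
ACTIVE_E, SLEEP_E, WAKE_E = 1.0, 0.05, 5.0    # assumed per-step/transition energy

def memory_energy(access_order):
    idle = [0] * BANKS                # consecutive steps each bank was unused
    asleep = [False] * BANKS
    energy = 0.0
    for elem in access_order:
        target = elem // ELEMS_PER_BANK       # assumed block-per-bank placement
        for b in range(BANKS):
            if b == target:
                if asleep[b]:
                    energy += WAKE_E          # resynchronization penalty
                    asleep[b] = False
                idle[b] = 0
                energy += ACTIVE_E
            else:
                idle[b] += 1
                if idle[b] >= IDLE_THRESHOLD:
                    asleep[b] = True
                energy += SLEEP_E if asleep[b] else ACTIVE_E
    return energy

n = BANKS * ELEMS_PER_BANK
bank_by_bank = list(range(n))                 # fission-style bank-at-a-time sweep
interleaved = [(i % BANKS) * ELEMS_PER_BANK + i // BANKS for i in range(n)]
seq_e, int_e = memory_energy(bank_by_bank), memory_energy(interleaved)
```

Under these assumptions the bank-at-a-time sweep is far cheaper than the interleaved order, which is the kind of difference a bank-conscious loop transformation is trying to create.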


International Symposium on Microarchitecture | 2002

Generating physical addresses directly for saving instruction TLB energy

Ismail Kadayif; Anand Sivasubramaniam; Mahmut T. Kandemir; Gokul B. Kandiraju; Guangyu Chen

Power consumption and power density of the Translation Lookaside Buffer (TLB) are important considerations not only in its design, but also because they can have consequences for cache design. This paper embarks on a new philosophy for reducing the number of accesses to the instruction TLB (iTLB) for power and performance optimizations. The overall idea is to keep the translation currently in use in a register and avoid going to the iTLB for as long as possible, until there is a page change. We propose four different approaches for achieving this, and experimentally demonstrate that one of these schemes, which uses a combination of compiler and hardware enhancements, can reduce iTLB dynamic power by over 85% in most cases. These mechanisms can work with different instruction-cache (iL1) lookup mechanisms and achieve significant iTLB power savings without compromising performance. Their importance grows with higher iL1 miss rates and larger page sizes. They can also work well with large iTLB structures, which may consume more power and take longer to look up, by keeping the iTLB out of the common case. Further, we experimentally demonstrate that they can provide performance savings for virtually-indexed, virtually-tagged iL1 caches, and can even make physically-indexed, physically-tagged iL1 caches a viable implementation choice.
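
The core idea can be sketched in a few lines. This is a simplification: the paper's four schemes involve compiler and hardware support, and the page size and fetch trace here are assumptions.

```python
# Sketch of the register-cached translation idea: consult the iTLB only
# when instruction fetch crosses a page boundary (trace and page size
# are invented for illustration).

PAGE_SIZE = 4096

def count_itlb_accesses(fetch_addresses):
    naive = len(fetch_addresses)       # baseline: one iTLB lookup per fetch
    lookups = 0
    current_page = None
    for addr in fetch_addresses:
        page = addr // PAGE_SIZE
        if page != current_page:       # page change: must consult the iTLB
            lookups += 1
            current_page = page
    return naive, lookups

# Two pages of straight-line code, fetched 4 bytes at a time:
naive, lookups = count_itlb_accesses(list(range(0x1000, 0x3000, 4)))
```

For this trace the register eliminates all but 2 of 2048 lookups; real savings depend on branch behavior and page size, which is consistent with the abstract's note that the benefit grows with larger pages.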

Collaboration


Top co-authors of Ismail Kadayif and their affiliations:

Mahmut T. Kandemir, Pennsylvania State University
Mary Jane Irwin, Pennsylvania State University
Anand Sivasubramaniam, Pennsylvania State University
Ugur Sezer, University of Wisconsin-Madison
Ibrahim Kolcu, University of Manchester
Guangyu Chen, Pennsylvania State University