Kamal S. Khouri | Researchain

Publication

Featured research published by Kamal S. Khouri.

IEEE Transactions on Very Large Scale Integration Systems | 2002

Leakage power analysis and reduction during behavioral synthesis

Kamal S. Khouri; Niraj K. Jha

This paper presents a high-level leakage power analysis and reduction algorithm. The algorithm uses device-level models for leakage to precharacterize a given register-transfer level module library. The precharacterized library is then used to estimate the power consumption of a circuit due to leakage. The algorithm can also identify and extract the frequently idle modules in the datapath, which may be targeted for low-leakage optimization. Leakage optimization is based on the use of dual threshold voltage (V_T) technology. The algorithm prioritizes modules, giving a high-level synthesis system an indication of where the largest gains in leakage reduction may be found. We tested our algorithm using a number of benchmarks from various sources. We ran a series of experiments by integrating our algorithm into a low-power high-level synthesis system. In addition to reducing the power consumption due to switching activity, our algorithm provides the high-level synthesis system with the ability to detect and reduce leakage power consumption, hence further reducing total power consumption. This is shown over a number of technology generations. The trend across these generations indicates that leakage becomes the dominant component of power at smaller feature sizes and lower supply voltages. Results show that using a dual-V_T library during high-level synthesis can reduce leakage power by an average of 58% across the different technology generations. Total power can be reduced by an average of 15.0% to 45.0% for the 0.18 µm to 0.07 µm technologies, respectively. The contribution of leakage power to overall power consumption ranges from 22.6% to 56.2%; our approach reduced these values to 11.7% to 26.9%.
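To make the idea of prioritizing modules for a dual-V_T library concrete, here is a minimal sketch. It assumes a precharacterized library that gives each module's leakage in a low-V_T and a high-V_T version, plus an idle fraction from simulation; the `Module` fields, the scoring rule (idle fraction times leakage reduction), and all numbers are hypothetical illustrations, not the paper's actual algorithm or data.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    leak_low_vt: float    # leakage (uW) of the low-V_T (fast) version; hypothetical value
    leak_high_vt: float   # leakage (uW) of the high-V_T (slow) version; hypothetical value
    idle_fraction: float  # fraction of clock cycles in which the module does no useful work


def priority(m: Module) -> float:
    # Hypothetical score: leakage saved by a high-V_T swap, weighted by how often
    # the module sits idle (frequently idle modules are assumed to tolerate slower cells).
    return m.idle_fraction * (m.leak_low_vt - m.leak_high_vt)


def rank_for_dual_vt(modules: list[Module]) -> list[str]:
    """Module names ordered from most to least promising high-V_T candidate."""
    return [m.name for m in sorted(modules, key=priority, reverse=True)]


def leakage_share(dynamic_uw: float, leakage_uw: float) -> float:
    """Fraction of total power that is due to leakage."""
    return leakage_uw / (dynamic_uw + leakage_uw)


lib = [
    Module("mult1", 40.0, 12.0, 0.70),
    Module("add1", 8.0, 2.0, 0.10),
    Module("add2", 8.0, 2.0, 0.90),
]
print(rank_for_dual_vt(lib))  # ['mult1', 'add2', 'add1']
```

A ranked list like this is the kind of hint a synthesis system could consume: it tries the top candidates first and keeps a swap only if timing still holds.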

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 1999

High-level synthesis of low-power control-flow intensive circuits

Kamal S. Khouri; Ganesh Lakshminarayana; Niraj K. Jha

In this paper, we present a comprehensive high-level synthesis system that is geared toward reducing power consumption in control-flow intensive as well as data-dominated circuits. An iterative improvement framework allows the system to search the design space by examining the interaction between the different high-level synthesis tasks. In addition to incorporating traditional high-level synthesis tasks such as scheduling, module selection, and resource sharing, we introduce a new optimization that performs power-conscious structuring of multiplexer networks, which are predominant in control-flow intensive circuits. The scheduler employed is capable of loop optimizations within and across loop boundaries. We also introduce a fast power estimation technique, based on switching activity matrices, to drive the synthesis process. Experimental results for a number of control-flow intensive and data-dominated benchmarks demonstrate power reduction of up to 62% (58%) when compared to V_dd-scaled area-optimized (delay-optimized) designs. The area overheads over area-optimized designs are less than 39%, whereas the area savings over delay-optimized designs are up to 40%.
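The abstract does not define its switching activity matrices, so the following is only a generic sketch of the underlying idea: precompute, from simulation traces, how many bits toggle when two variables share a register, then use that table to pick low-activity sharings. The matrix definition, the `traces` data, and the helper names are all hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions that toggle when a register changes from a to b."""
    return bin(a ^ b).count("1")


def switching_matrix(traces):
    """traces[x][k] is the value held by variable x in simulation iteration k
    (hypothetical profiling data). Entry (u, v) is the average number of bit
    toggles if u and v share one register and v's value overwrites u's."""
    names = list(traces)
    n = len(traces[names[0]])
    return {
        (u, v): sum(hamming(traces[u][k], traces[v][k]) for k in range(n)) / n
        for u in names
        for v in names
        if u != v
    }


def best_partner(sa, u):
    """Variable whose register sharing with u is expected to cause the fewest toggles."""
    return min((v for (x, v) in sa if x == u), key=lambda v: sa[(u, v)])


traces = {"x": [0, 15, 3], "y": [1, 14, 3], "z": [255, 0, 240]}
sa = switching_matrix(traces)
print(best_partner(sa, "x"))  # y
```

The appeal of a precomputed table is speed: once it exists, each candidate sharing decision costs a lookup instead of a fresh simulation.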

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 1999

Wavesched: a novel scheduling technique for control-flow intensive designs

Ganesh Lakshminarayana; Kamal S. Khouri; Niraj K. Jha

In this paper, we present a novel scheduling algorithm targeted toward minimizing the average execution time of control-flow intensive behavioral descriptions. Our algorithm uses a control/data flow graph model, which preserves the parallelism inherent in the application. It explores previously unexplored regions of the solution space by its ability to overlap the schedules of independent iterative constructs whose bodies share resources. It also incorporates well-known optimization techniques like loop unrolling in a natural fashion. This is made possible by a general loop-handling technique, which we have devised. Application of the algorithm to several common benchmarks demonstrates up to a 4.8-fold improvement in expected schedule length over existing scheduling algorithms, without paying a price in terms of the best- and worst-case schedule lengths required to execute the behavioral description (in fact, the best- and worst-case schedule lengths are frequently also better for our algorithm).
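The objective here, expected (average) schedule length, can be computed for a given schedule once branch probabilities are known. A minimal sketch, assuming the schedule is represented as a hypothetical state transition graph in which every state costs one clock cycle and each edge carries a branch probability; it evaluates the objective only and is not the Wavesched algorithm itself.

```python
def expected_cycles(stg, start, done="DONE", tol=1e-12, max_iter=100_000):
    """stg maps each state to a list of (next_state, probability) pairs.
    Every state except `done` costs one cycle, so
    E[s] = 1 + sum(p * E[t]) over successors t, with E[done] = 0.
    Solved by in-place fixed-point iteration, which converges when
    `done` is reachable from every state (an absorbing chain)."""
    e = {s: 0.0 for s in stg}
    e[done] = 0.0
    for _ in range(max_iter):
        delta = 0.0
        for s, succ in stg.items():
            new = 1.0 + sum(p * e[t] for t, p in succ)
            delta = max(delta, abs(new - e[s]))
            e[s] = new
        if delta < tol:
            break
    return e[start]


# Hypothetical schedule: S0 runs once, then S1 is a loop body that
# repeats with probability 0.75 and exits with probability 0.25.
stg = {
    "S0": [("S1", 1.0)],
    "S1": [("S1", 0.75), ("DONE", 0.25)],
}
print(round(expected_cycles(stg, "S0"), 6))  # 5.0
```

Here E[S1] = 1 + 0.75 E[S1] gives 4 cycles, so the whole schedule averages 5. Overlapping loop schedules or unrolling changes the graph, and this number is what such transformations try to reduce.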

Design Automation Conference | 1999

Common-case computation: a high-level technique for power and performance optimization

Ganesh Lakshminarayana; Anand Raghunathan; Kamal S. Khouri; Niraj K. Jha; Sujit Dey

This paper presents a design methodology, called common-case computation (CCC), and new design automation algorithms for optimizing power consumption or performance. The proposed techniques are applicable in conjunction with any high-level design methodology where a structural register-transfer level (RTL) description and its corresponding scheduled behavioral (cycle-accurate functional RTL) description are available. It is a well-known fact that in behavioral descriptions of hardware (and also in software), a small set of computations (CCCs) often accounts for most of the computational complexity. However, in hardware implementations (structural RTL or lower level), CCCs and the remaining computations are typically treated alike. This paper shows that identifying and exploiting CCCs during the design process can lead to implementations that are much more efficient in terms of power consumption or performance. We propose a CCC-based high-level design methodology with the following steps: extraction of common-case behaviors and execution conditions from the scheduled description, simplification of the common-case behaviors in a stand-alone manner, synthesis of common-case detection and execution circuits from the common-case behaviors, and composition of the original design with the common-case circuits, resulting in a CCC-optimized design. We demonstrate that CCC-optimized designs reduce power consumption by up to 91.5%, or improve performance by up to 76.6%, compared to designs derived without special regard for CCCs.
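The composition step (a common-case detector, a simplified common-case circuit, and the original design as a fallback) can be modeled behaviorally. A minimal sketch, assuming a hypothetical common case in which a multiplier's second operand is a power of two, so a shifter can replace the full multiplier; `make_ccc`, the chosen common case, and the trace are illustrations, not taken from the paper.

```python
def make_ccc(detect, fast, general):
    """Compose a common-case path with the original design: when `detect`
    fires, the cheaper `fast` circuit produces the result; otherwise the
    original `general` circuit does."""
    def composed(*args):
        return fast(*args) if detect(*args) else general(*args)
    return composed


def is_pow2_operand(a, b):
    # Hypothetical common-case detector: b is a positive power of two.
    return b > 0 and (b & (b - 1)) == 0


def shift_multiply(a, b):
    # Simplified common-case circuit: a * 2**k implemented as a left shift by k.
    return a << (b.bit_length() - 1)


def general_multiply(a, b):
    # Stands in for the original, general-purpose multiplier.
    return a * b


mul = make_ccc(is_pow2_operand, shift_multiply, general_multiply)


def coverage(trace, detect=is_pow2_operand):
    """Fraction of profiled input pairs that take the common-case path."""
    return sum(1 for args in trace if detect(*args)) / len(trace)


print(mul(3, 8), coverage([(3, 4), (3, 5), (2, 8), (7, 0)]))  # 24 0.5
```

The coverage figure matters because the savings only materialize when the detector fires often; a profiled trace is what justifies treating a behavior as the common case.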

International Conference on Computer-Aided Design | 1997

Wavesched: a novel scheduling technique for control-flow intensive behavioral descriptions

Ganesh Lakshminarayana; Kamal S. Khouri; Niraj K. Jha

An exercising machine having a frame including a pair of side members and a seat slidably resting upon the side members. A brake shaft attached to the frame. A friction producer slidably engages around the brake shaft. A member for reciprocating the seat on the rails while simultaneously reciprocating the friction producer along the brake shaft when physically operated by the user of the exercising machine.

Asia and South Pacific Design Automation Conference | 2007

LEAF: A System Level Leakage-Aware Floorplanner for SoCs

Aseem Gupta; Nikil D. Dutt; Fadi J. Kurdahi; Kamal S. Khouri; Magdy S. Abadir

Process scaling and higher leakage power have resulted in increased power densities and elevated die temperatures. Due to the interdependence of temperature and leakage power, we observe that the floorplan has an impact on both the temperatures and the leakage of the IP blocks in a system on chip (SoC). Hence, in this paper we propose a novel system-level leakage-aware floorplanner (LEAF), which optimizes floorplans for temperature-aware leakage power along with the traditional metrics of area and wire length. Our floorplanner takes a SoC netlist and the dynamic power profile of functional blocks to determine a placement while optimizing for temperature-dependent leakage power, area, and wire length. To demonstrate the effectiveness of LEAF, we implemented our methodology on ten industrial SoC designs from Freescale Semiconductor Inc. and evaluated the trade-off between leakage power and area. We observed up to a 190% difference in leakage power between leakage-unaware and leakage-aware floorplanning.
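The key coupling is that a floorplan changes block temperatures, and leakage rises steeply with temperature. A minimal sketch of a leakage-aware cost function, assuming the common first-order exponential leakage-temperature model, block temperatures already supplied by some thermal simulation, and hypothetical weights, coefficients, and block names; it is not LEAF's actual cost function.

```python
import math


def leakage_mw(p_ref_mw, temp_c, t_ref_c=25.0, beta_per_k=0.02):
    """First-order model: leakage grows exponentially with temperature.
    beta_per_k is a hypothetical fitting constant (1/K)."""
    return p_ref_mw * math.exp(beta_per_k * (temp_c - t_ref_c))


def floorplan_cost(area_mm2, wirelength_mm, block_temps_c, p_ref_mw,
                   weights=(1.0, 0.01, 1.0)):
    """Weighted sum of area, wire length, and temperature-dependent leakage.
    block_temps_c maps block name -> temperature (C) for this floorplan,
    assumed to come from a separate thermal model."""
    leak = sum(leakage_mw(p_ref_mw[b], t) for b, t in block_temps_c.items())
    w_area, w_wire, w_leak = weights
    return w_area * area_mm2 + w_wire * wirelength_mm + w_leak * leak


p_ref = {"cpu": 20.0, "dsp": 15.0}
# Same area; hot blocks abutting vs. separated at the cost of slightly longer wires.
hot = floorplan_cost(25.0, 400.0, {"cpu": 95.0, "dsp": 90.0}, p_ref)
cool = floorplan_cost(25.0, 420.0, {"cpu": 80.0, "dsp": 78.0}, p_ref)
print(cool < hot)  # True
```

A leakage-unaware floorplanner would see only the wire-length term and prefer the first arrangement; adding the leakage term reverses the choice.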

Design, Automation and Test in Europe | 1998

IMPACT: a high-level synthesis system for low power control-flow intensive circuits

Kamal S. Khouri; Ganesh Lakshminarayana; Niraj K. Jha

In this paper, we present a comprehensive high-level synthesis system that is geared towards reducing power consumption in control-flow intensive circuits. An iterative improvement algorithm is at the heart of the system. The algorithm searches the design space by handling scheduling, module selection, resource sharing and multiplexer network restructuring simultaneously. The scheduler performs concurrent loop optimization and implicit loop unrolling. It minimizes the expected number of cycles of the schedule without compromising on the minimum and maximum schedule lengths. A fast simulation technique based on trace manipulation aids power estimation in driving synthesis in the right direction. Experimental results demonstrate power reduction of up to 85% with minimal overhead in area over area-optimized designs operating at 5 V.
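An iterative improvement search repeatedly applies the best cost-reducing move until no move helps. A minimal sketch covering only module selection, assuming a hypothetical two-entry library per operation (fastest first), a chain of operations whose delays add up, and a delay budget that the all-fastest selection already meets; IMPACT's real move set also covers scheduling, resource sharing, and multiplexer restructuring.

```python
# Hypothetical library: op type -> [(module name, delay in cycles, power)], fastest first.
LIB = {
    "mult": [("mult_fast", 2, 10.0), ("mult_slow", 4, 6.0)],
    "add": [("add_fast", 1, 3.0), ("add_slow", 2, 2.0)],
}


def delay(sel):
    # Operations form a chain, so their delays add up.
    return sum(LIB[op][i][1] for op, i in sel)


def power(sel):
    return sum(LIB[op][i][2] for op, i in sel)


def improve(ops, budget):
    """Steepest-descent iterative improvement over module selection."""
    sel = [(op, 0) for op in ops]  # start from the fastest module for every op
    while True:
        best, best_gain = None, 0.0
        for k, (op, i) in enumerate(sel):
            for j in range(len(LIB[op])):
                if j == i:
                    continue
                cand = sel[:k] + [(op, j)] + sel[k + 1:]
                gain = power(sel) - power(cand)
                if delay(cand) <= budget and gain > best_gain:
                    best, best_gain = cand, gain
        if best is None:  # local minimum: no feasible move lowers power
            return [LIB[op][i][0] for op, i in sel]
        sel = best


print(improve(["mult", "add", "add"], budget=6))
# ['mult_slow', 'add_fast', 'add_fast']
```

Because every accepted move strictly lowers power, the loop always terminates; the price of pure steepest descent is that it can stop at a local minimum, which is why real systems add richer moves.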

International Conference on Computer-Aided Design | 2003

IDAP: A Tool for High Level Power Estimation of Custom Array Structures

Mahesh Mamidipaka; Kamal S. Khouri; Nikil D. Dutt; Magdy S. Abadir

While array structures are a significant source of power dissipation, there is a lack of accurate high-level power estimators that account for varying array circuit implementation styles. We present a methodology and a tool, the implementation-dependent array power (IDAP) estimator, that accurately model power dissipation in SRAM-based arrays from a high-level description of the array. The models are parameterized by the array operations and various technology-dependent parameters. The methodology is generic, and the IDAP tool has been validated on industrial designs across a wide variety of array implementations in the e500 processor core (e500 is the Motorola processor core that is compliant with the PowerPC Book E architecture). For these industrial designs, IDAP generates high-level estimates of dynamic power dissipation with an error of less than 22.2% relative to detailed (layout-extracted) SPICE simulations. We apply the tool in three different scenarios: 1) identifying the subblocks that contribute significantly to power; 2) evaluating the effect of bitline-voltage swing on array power; and 3) evaluating the effect of memory bit-cell dimensions on array power.
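Scenario 2, the effect of bitline-voltage swing, follows from a standard first-order estimate: if a bitline discharges by V_swing during a read and is precharged back to V_dd, the supply delivers about C_bl * V_dd * V_swing per bitline. A minimal sketch, assuming one swinging bitline per accessed column and hypothetical capacitance, voltage, and frequency values; it is not IDAP's parameterized model.

```python
def bitline_read_energy_j(n_columns, c_bitline_f, vdd_v, swing_v):
    """First-order energy (J) to restore bitlines after one read:
    E = N_columns * C_bl * V_dd * V_swing (one swinging bitline per column)."""
    return n_columns * c_bitline_f * vdd_v * swing_v


def array_read_power_w(e_read_j, reads_per_cycle, f_hz):
    """Average bitline power for a given read rate."""
    return e_read_j * reads_per_cycle * f_hz


def relative_error(estimate, reference):
    """Error metric for comparing a high-level estimate against, e.g., SPICE."""
    return abs(estimate - reference) / reference


# Hypothetical 64-column array, 100 fF per bitline, 1.2 V supply.
e_full = bitline_read_energy_j(64, 100e-15, 1.2, 0.2)
e_half = bitline_read_energy_j(64, 100e-15, 1.2, 0.1)
print(e_half / e_full)  # 0.5
print(array_read_power_w(e_half, 0.5, 500e6))  # about 1.92e-4 W
```

In this simple model bitline energy scales linearly with swing, so halving the swing halves it; a real estimator must also account for sense-amplifier limits that bound how small the swing can be.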

International Symposium on Quality Electronic Design | 2008

Thermal Aware Global Routing of VLSI Chips for Enhanced Reliability

Aseem Gupta; Nikil D. Dutt; Fadi J. Kurdahi; Kamal S. Khouri; Magdy S. Abadir

In this paper, we propose thermal-aware global routing of interconnects, which reduces the probability of chip failure due to interconnect failures. Because of electromigration, temperature strongly affects the mean time to failure (MTF) of interconnects. We present TAGORE, a thermal-aware global router. TAGORE reduces the probability of failure by routing more wires in the colder regions of the chip and fewer wires in the hotter regions. We observed that TAGORE reduced the number of wires in the hottest region of a chip by up to 19.95% and by an average of 12.29%. This resulted in a decrease in the failure rate of up to 292 failures per million hours of operation. We also perform an analytical examination of the reduction in the probability of interconnect failure and in the failure rate. The analysis shows that the probability of chip failure is reduced if fewer wires are routed in the hot regions. This approach to reliability improvement does not require adding any redundant wires or vias.
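The temperature dependence of electromigration is usually described by Black's equation, MTF = A * J^(-n) * exp(E_a / (k_B * T)). A minimal sketch of how that could feed a router, assuming hypothetical constants (A, n, E_a), a reference temperature, and a routing cost that weights each global-routing cell by its relative failure rate; this illustrates the principle and is not TAGORE's actual cost function.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def black_mtf(j_a_per_cm2, temp_k, a=1.0, n=2.0, ea_ev=0.7):
    """Black's equation: MTF = A * J^-n * exp(Ea / (k_B * T)).
    A, n, and Ea are hypothetical process-dependent constants."""
    return a * j_a_per_cm2 ** (-n) * math.exp(ea_ev / (K_B_EV * temp_k))


def cell_cost(temp_k, t_ref_k=318.15, ea_ev=0.7):
    """Hypothetical routing cost of one cell: the electromigration failure rate
    (1/MTF at equal current density) relative to t_ref_k. Equals 1.0 at t_ref_k."""
    return math.exp(ea_ev / K_B_EV * (1.0 / t_ref_k - 1.0 / temp_k))


def route_cost(cell_temps_k):
    return sum(cell_cost(t) for t in cell_temps_k)


hot_path = [360.0, 360.0, 340.0]            # short route through a hot spot
cool_detour = [320.0, 320.0, 320.0, 320.0]  # one cell longer, but cooler
print(route_cost(cool_detour) < route_cost(hot_path))  # True
```

With these assumed constants, a 360 K cell costs roughly 20 times a cell at the reference temperature, so a slightly longer detour through cooler regions wins, which mirrors the paper's strategy of moving wires out of the hottest regions.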

International Conference on Computer-Aided Design | 1999

Memory binding for performance optimization of control-flow intensive behaviors

Kamal S. Khouri; Ganesh Lakshminarayana; Niraj K. Jha

The paper presents a memory binding algorithm for behaviors that are characterized by the presence of conditionals and deeply nested loops that access memory extensively through arrays. Unlike previous work, this algorithm examines the effects of branch probabilities and allocation constraints. First, we demonstrate, through examples, the importance of incorporating branch probabilities and allocation constraint information when searching for a performance-efficient memory binding. We also show the interdependence of these two factors and how varying one without considering the other may greatly affect the performance of the behavior. Second, we introduce a memory binding algorithm that can examine numerous bindings by employing an efficient performance estimation procedure. The estimation procedure exploits locality of execution, which is an inherent characteristic of the target behaviors. This enables the performance estimation technique to assess the global impact of the different bindings, given the allocation constraints. We tested our algorithm using a number of benchmarks from the parallel computing domain. A series of experiments demonstrates the algorithm's ability to produce bindings that optimize performance, meet memory allocation constraints, and adapt to different resource constraints and branch probabilities. Results show that the algorithm requires 37% fewer memories with a performance loss of only 0.3% when compared to a parallel memory architecture. When compared to the best of a series of random memory bindings, the algorithm improves schedule performance by 21%.
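Why branch probabilities matter for binding can be shown with a small cost model: arrays accessed together in a frequently executed block should live in different memories. A minimal sketch, assuming single-ported memories, hypothetical block execution frequencies (derived from branch probabilities and loop counts), and exhaustive search over bindings; the paper's estimator exploits locality of execution to evaluate bindings efficiently instead of enumerating them all.

```python
from itertools import product

# Hypothetical profile: (execution frequency, arrays accessed in the same control step).
BLOCKS = [
    (100.0, ["A", "B"]),  # hot loop body touches A and B together
    (10.0, ["A", "C"]),
    (1.0, ["B", "C"]),
]


def estimated_cycles(binding, blocks):
    """binding maps array -> memory id. Single-ported memories serialize
    accesses, so each block costs as many cycles as its busiest memory."""
    total = 0.0
    for freq, arrays in blocks:
        per_mem = {}
        for arr in arrays:
            per_mem[binding[arr]] = per_mem.get(binding[arr], 0) + 1
        total += freq * max(per_mem.values())
    return total


def best_binding(arrays, n_memories, blocks):
    """Exhaustively evaluate every binding that uses at most n_memories memories."""
    best, best_cost = None, float("inf")
    for mems in product(range(n_memories), repeat=len(arrays)):
        binding = dict(zip(arrays, mems))
        cost = estimated_cycles(binding, blocks)
        if cost < best_cost:
            best, best_cost = binding, cost
    return best, best_cost


binding, cost = best_binding(["A", "B", "C"], 2, BLOCKS)
print(binding, cost)  # {'A': 0, 'B': 1, 'C': 1} 112.0
```

With only two memories, the rarely executed block (B with C) absorbs the conflict. If the branch probabilities flipped so that block became hot, the best binding would change, which is the interdependence the abstract describes.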
