Gurhan Kucuk | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Gurhan Kucuk is active.

Explore More

Publication

Featured researches published by Gurhan Kucuk.

IEEE Transactions on Very Large Scale Integration Systems | 2003

Energy-efficient issue queue design

Dmitry Ponomarev; Gurhan Kucuk; Oguz Ergin; Kanad Ghose; Peter M. Kogge

The out-of-order issue queue (IQ), used in modern superscalar processors is a considerable source of energy dissipation. We consider design alternatives that result in significant reductions in the power dissipation of the IQ (by as much as 75%) through the use of comparators that dissipate energy mainly on a tag match, 0-B encoding of operands to imply the presence of bytes with all zeros and, bitline segmentation. Our results are validated by the execution of SPEC 95 benchmarks on a true hardware level, cycle-by-cycle simulator for a superscalar processor and SPICE measurements for actual layouts of the IQ in a 0.18-/spl mu/m CMOS process.

international symposium on low power electronics and design | 2001

Energy: efficient instruction dispatch buffer design for superscalar processors

Gurhan Kucuk; Kanad Ghose; Dimitry V. Ponomarev; Peter M. Kogge

The instruction dispatch buffer (DB, also known as an issue queue) used in modem superscalar processors is a considerable source of energy dissipation. We consider design alternatives that result in significant reductions in the power dissipation of the DB (by as much as 60%) through the use of: (a) fast comparators that dissipate energy mainly on a tag match, (b) zero byte encoding of operands to imply the presence of bytes with all zeros and, (c) bitline segmentation. Our results are validated by the execution of SPEC 95 benchmarks on true hardware level, cycle-by-cycle simulator for a superscalar processor and SPICE measurements for actual layouts of the DB and its variants in a 0.5 micron CMOS process.

IEEE Transactions on Computers | 2004

Energy efficient comparators for superscalar datapaths

Dmitry Ponomarev; Gurhan Kucuk; Oguz Ergin; Kanad Ghose

Modern superscalar datapaths use aggressive execution reordering to exploit instruction-level parallelism. Comparators, either explicit or embedded into content-addressable logic, are used extensively throughout such designs to implement several key out-of-order execution mechanisms and support the memory hierarchy. The traditional comparator designs dissipate energy on a mismatch in any bit position. As mismatches occur with a much higher frequency than matches in many situations, considerable improvements in energy dissipation are to be gained by using comparators that dissipate energy predominantly on a full match and little or no energy on partial or complete mismatches. We make two contributions. First, we introduce a series of dissipate-on-match comparator designs, including designs for comparing long arguments. Second, we show how comparators, used in modern datapaths, can be chosen and organized judiciously based on the microarchitectural-level statistics to minimize the energy dissipation. We use the actual layout data and the realistic bit patterns of the comparands (obtained from the simulated execution of SPEC 2000 benchmarks) to show the energy impact from the use of the new comparator designs. For the same delay, the proposed 8-bit comparators dissipate 70 percent less energy than the traditional designs if used within issue queues and 73 percent less energy if used within load-store queues. The use of the proposed 6-bit comparators within the dependency checking logic is shown to increase the energy dissipation by 65 percent on the average compared to the traditional designs. We also find that the use of a hybrid 32-bit comparator, comprised of three traditional 8-bit blocks and one proposed 8-bit block, is the most energy-efficient solution for the use in the load-store queue, resulting in 19 percent energy reduction compared to the use of four traditional 8-bit blocks used to implement a 32-bit comparator.

international conference on supercomputing | 2002

Low-complexity reorder buffer architecture

Gurhan Kucuk; Dmitry Ponomarev; Kanad Ghose

In some of todays superscalar processors (e.g.the Pentium III), the result repositories are implemented as the Reorder Buffer (ROB) slots. In such designs, the ROB is a complex multi-ported structure that occupies a significant portion of the die area and dissipates a non-trivial fraction of the total chip power, as much as 27% according to some estimates. In addition, an access to such ROB typically takes more than one cycle, impacting the IPC adversely.We propose a low-complexity and low-power ROB design that exploits the fact that the bulk of the source operand values is obtained through data forwarding to the issue queue or through direct reads of the committed register values. Our ROB design uses an organization that completely eliminates the read ports needed to read out operand values for instruction issue. Any consequential performance degradation is countered by using a small number of associatively-addressed retention latches to hold the most recent set of values written into the ROB. The contents of the retention latches are used to satisfy the operand reads for issue that would otherwise have to be read from the ROB slots. Significant savings of the ROB real estate as well as power savings in the range of 20% to 30% for the ROB are achieved using the proposed technique. At the same time, the fact that results are accessible in a single cycle from the retention latches actually leads to an overall improvement in the IPC of up to 3% on the average for SPEC 2000 benchmarks.

design, automation, and test in europe | 2002

AccuPower: An Accurate Power Estimation Tool for Superscalar Microprocessors

Dmitry Ponomarev; Gurhan Kucuk; Kanad Ghose

This paper describes the AccuPower toolset-a set of simulation tools accurately estimating the power dissipation within a superscalar microprocessor. AccuPower uses a true hardware level and cycle level microarchitectural simulator and energy dissipation coefficients gleaned from SPICE measurements of actual CMOS layouts of critical datapath components. Transition counts can be obtained at the level of bits within data and instruction streams, at the level of registers, or at the level of larger building blocks (such as caches, issue queue, reorder buffer function units). This allows for an accurate estimation of switching activity at any desired level of resolution. The toolsuite implements several variants of superscalar datapath designs in use today and permits the exploration of design choices at the microarchitecture level as well as the circuit level, including the use of voltage and frequency scaling. In particular the AccuPower toolsuite includes detailed implementations of currently used and proposed techniques for energy/power conservations including techniques for data encoding and compression, alternative circuit approaches, dynamic resource allocation and datapath reconfiguration. The microarchitectural simulation components of AccuPower can be used for accurate evaluation of datapath designs in a manner well beyond the scope of the widely-used Simplescalar tools.

IEEE Transactions on Computers | 2004

Isolating short-lived operands for energy reduction

Dmitry Ponomarev; Gurhan Kucuk; Oguz Ergin; Kanad Ghose

A mechanism for reducing the power requirements in processors that use a separate (architectural) register file (ARF) for holding committed values is proposed. We exploit the notion of short-lived operands-values that target architectural registers that are renamed by the time the instruction producing the value reaches the writeback stage. Our simulations of the SPEC 2000 benchmarks show that as much as 71 percent to 97 percent of the results are short-lived. Our technique avoids unnecessary writebacks into the result repository (a slot within the reorder buffer or a physical register) as well as writes into the ARF from unnecessary commitments by caching (and isolating) short-lived operands within a small dedicated register file. Operands are cached in this manner till they can be safely discarded without jeopardizing the recovery from possible branch mispredictions or reconstruction of the precise state in case of interrupts or exceptions. Additional energy savings are achieved by limiting the number of ports used for instruction commitment. The power/energy savings are validated using SPICE measurements of actual layouts in a 0.18 micron CMOS process. The energy reduction in the ROB and the ARF is about 20 percent (translating into the overall chip energy reduction of about 5 percent) and this is achieved with no increase in cycle time, little additional complexity, and no degradation in the number of instructions committed per cycle.

international conference on parallel architectures and compilation techniques | 2003

Reducing datapath energy through the isolation of short-lived operands

Dmitry Ponomarev; Gurhan Kucuk; Oguz Ergin; Kanad Ghose

We present a technique for reducing the power dissipation in the course of writebacks and commitments in a datapath that uses a dedicated architectural register file (ARF) to hold committed values. Our mechanism capitalizes on the observation that most of the produced register values are short-lived, meaning that the destination registers targeted by these values are renamed by the time the results are written back. Our technique avoids unnecessary writebacks into the result repository (a slot within the reorder buffer or a physical register) as well as writes into the ARF by caching (and isolating) short-lived operands within a small dedicated register file. Operands are cached in this manner till they can be safely discarded without jeopardizing the recovery from possible branch mispredictions or reconstruction of the precise state in case of interrupts or exceptions. The power/energy savings are validated using SPICE measurements of actual layouts in a 0.18 micron CMOS process. The energy reduction in the ROB and the ARF is in the range of 20-25% and this is achieved with no increase in the cycle time, little additional complexity and no IPC drop.

international conference on wireless communications and mobile computing | 2009

FireSenseTB: a wireless sensor networks testbed for forest fire detection

Bilgin Kosucu; Kerem Irgan; Gurhan Kucuk; Sebnem Baydere

Wireless sensor networks promise great success in many areas from environmental monitoring to medical and military applications. Forest fire detection is one of these areas where many of the ongoing WSN research is focused today. Unfortunately, most of these studies choose simulating their proposed solutions instead of doing experiments in real testbed environments, since that kind of setup exposes additional difficulties. Our previous work, named FireSense, proposed a fire detection algorithm, which was shown to be successful in terms of simulation results. In this study, we take FireSense to a real outdoor testbed for further analysis of its effectiveness in terms of various parameters such as link and node failures, topology and physical configuration changes, wind direction, ignition point position and sampling period variations.

international conference on computer design | 2002

A circuit-level implementation of fast, energy-efficient CMOS comparators for high-performance microprocessors

Oguz Ergin; Kanad Ghose; Gurhan Kucuk; Dmitry Ponomarev

Datapath components in modem high performance superscalar processors employ a significant amount of associative addressing logic based on the use of comparators that dissipate energy on a mismatch. These comparators are used to detect a full match, but as mismatches are much more common than full matches in some components of the CPU, considerable energy-inefficiencies occur within the associative logic. We propose the design of two new comparator circuits that predominantly dissipate energy on a match, thus resulting in very significant savings in comparator power dissipation. The proposed designs are evaluated using SPICE simulations of actual VLSI layouts of the comparators in 0.18 micron 6-metal layer process and micro-architectural level statistics.

international conference on computer design | 2003

Distributed reorder buffer schemes for low power

Gurhan Kucuk; Oguz Ergin; Dmitry Ponomarev; Kanad Ghose

We consider two approaches for reducing the complexity and power dissipation in processors that use separate register file to maintain committed register values. The first approach relies on a distributed implementation of the reorder buffer (ROB) that spreads the centralized ROB structure across the function units (FUs), with each distributed component sized to match the FU workload and with one write port and two read ports on each component. The second approach combines the use of the previously proposed retention latches and a distributed ROB implementation that uses minimally-ported distributed components. Such a combination avoids any read and write port conflicts on the distributed ROB components (with the exception of possible port conflicts in the course of commitment) and does not incur the associated performance degradation. Our designs are evaluated using the simulation of the SPEC 2000 benchmarks and SPICE simulations of the actual ROB layouts in 0.18 micron process. The ROB power savings of up to 49% can be realized with only 1.7% performance loss on the average.

Explore More