Publication


Featured research published by Oguz Ergin.


IEEE Micro | 2006

Impact of Parameter Variations on Circuits and Microarchitecture

Osman S. Unsal; James W. Tschanz; Keith A. Bowman; Vivek De; Xavier Vera; Antonio González; Oguz Ergin

Parameter variations, which are increasing along with advances in process technologies, affect both timing and power. Variability must be considered at both the circuit and microarchitectural design levels to keep pace with performance scaling and to keep power consumption within reasonable limits. This article presents an overview of the main sources of variability and surveys variation-tolerant circuit and microarchitectural approaches.


International Conference on Computer Design | 2004

Increasing processor performance through early register release

Oguz Ergin; Deniz Balkan; Dmitry Ponomarev; Kanad Ghose

Modern superscalar microprocessors need sizable register files to support a large number of in-flight instructions for exploiting ILP. An alternative to building large register files is to use a smaller number of registers but manage them more effectively. More efficient register management can also yield higher performance when reducing the register file size is not the goal. Traditional register file management mechanisms deallocate a physical register only when the next instruction with the same destination architectural register commits. We propose two complementary techniques for deallocating the register immediately after the instruction producing the register's value commits, without waiting for the commitment of the next instruction with the same destination. Our design relies on a checkpointed register file (CRF), in which a local shadow copy of each bitcell temporarily saves the early-deallocated register values in case they are needed to recover from branch mispredictions or to reconstruct the precise state after exceptions or interrupts. The proposed techniques outperform previously proposed schemes for early register deallocation. For register-constrained datapath configurations, our techniques yield up to a 35% performance increase, with a 23.3% increase on average across the SPEC2000 benchmarks.
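The difference between the two release points can be illustrated with a small timing model. This is an assumption-laden sketch, not the authors' simulator: the hypothetical `release_step` and `lifetimes` helpers assume one instruction is renamed per step, that each instruction commits exactly `window` steps after it is renamed, and that all consumers have read a value before it is released. Under those assumptions, early release frees a register once its producer has committed and a newer writer of the same architectural register has been renamed, while conventional release waits for that newer writer to commit.

```python
from typing import List, Optional


def release_step(dests: List[str], i: int, window: int, early: bool) -> Optional[int]:
    """Step at which the physical register written by instruction i is freed.

    Assumed model (hypothetical): instruction i is renamed at step i and
    commits at step i + window; consumer reads finish before release.
    """
    # Next instruction that writes the same architectural register.
    nxt = next((j for j in range(i + 1, len(dests)) if dests[j] == dests[i]), None)
    if nxt is None:
        return None  # still holds the committed architectural value
    if early:
        # Producer committed and destination renamed: free now; the old value
        # is assumed to live on in the shadow (checkpoint) cell for recovery.
        return max(i + window, nxt)
    # Conventional: wait until the next writer itself commits.
    return nxt + window


def lifetimes(dests: List[str], window: int, early: bool) -> List[int]:
    """Allocation lifetimes (in steps) of every register that gets released."""
    out = []
    for i in range(len(dests)):
        freed = release_step(dests, i, window, early)
        if freed is not None:
            out.append(freed - i)
    return out
```

With the trace r1, r2, r1, r3, r2 and a window of 4, the first two registers are held for 6 and 7 steps under conventional release but only 4 steps each under early release; the last writer of each architectural register is never released in this model because it holds committed state.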


High-Performance Computer Architecture | 2016

ChargeCache: Reducing DRAM latency by exploiting row access locality

Hasan Hassan; Gennady Pekhimenko; Nandita Vijaykumar; Vivek Seshadri; Donghyuk Lee; Oguz Ergin; Onur Mutlu

DRAM latency continues to be a critical bottleneck for system performance. In this work, we develop a low-cost mechanism, called ChargeCache, that enables faster access to recently-accessed rows in DRAM, with no modifications to DRAM chips. Our mechanism is based on the key observation that a recently-accessed row has more charge and thus the following access to the same row can be performed faster. To exploit this observation, we propose to track the addresses of recently-accessed rows in a table in the memory controller. If a later DRAM request hits in that table, the memory controller uses lower timing parameters, leading to reduced DRAM latency. Row addresses are removed from the table after a specified duration to ensure rows that have leaked too much charge are not accessed with lower latency. We evaluate ChargeCache on a wide variety of workloads and show that it provides significant performance and energy benefits for both single-core and multi-core systems.
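The table lookup in the memory controller can be sketched as follows. This is a hypothetical model, not the published design: `ChargeCacheSketch`, its LRU replacement, the latency values, and the choice to refresh an entry's timestamp on every access (rather than, for example, only when a row is closed) are all assumptions for illustration. A hit returns the reduced latency; entries older than `duration` are dropped so that rows whose charge may have leaked are again accessed with default timing.

```python
from collections import OrderedDict


class ChargeCacheSketch:
    """Hypothetical table of recently accessed DRAM rows with expiry."""

    def __init__(self, capacity: int, duration: int,
                 default_latency: int, reduced_latency: int) -> None:
        self.capacity = capacity
        self.duration = duration
        self.default_latency = default_latency
        self.reduced_latency = reduced_latency
        self.rows: "OrderedDict[int, int]" = OrderedDict()  # row -> last access time

    def _expire(self, now: int) -> None:
        # Oldest entries sit at the front; drop those that may have leaked charge.
        while self.rows:
            _, t = next(iter(self.rows.items()))
            if now - t < self.duration:
                break
            self.rows.popitem(last=False)

    def access(self, row: int, now: int) -> int:
        """Return the latency (assumed units) used for this row access."""
        self._expire(now)
        hit = row in self.rows
        if hit:
            self.rows.move_to_end(row)
        elif len(self.rows) >= self.capacity:
            self.rows.popitem(last=False)  # evict least recently accessed row
        self.rows[row] = now  # the access restores the row's charge
        return self.reduced_latency if hit else self.default_latency
```

Keeping entries ordered by last access lets expiry and LRU eviction both work from the front of the table, which keeps the sketch short.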


IEEE Transactions on Very Large Scale Integration Systems | 2003

Energy-efficient issue queue design

Dmitry Ponomarev; Gurhan Kucuk; Oguz Ergin; Kanad Ghose; Peter M. Kogge

The out-of-order issue queue (IQ) used in modern superscalar processors is a considerable source of energy dissipation. We consider design alternatives that significantly reduce the power dissipation of the IQ (by as much as 75%) through the use of comparators that dissipate energy mainly on a tag match, 0-B encoding of operands to imply the presence of bytes with all zeros, and bitline segmentation. Our results are validated by the execution of SPEC95 benchmarks on a true hardware-level, cycle-by-cycle simulator for a superscalar processor and by SPICE measurements for actual layouts of the IQ in a 0.18-µm CMOS process.
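The 0-B idea, flagging all-zero bytes so they need not be read or written, can be shown functionally. In this sketch (an assumed representation, not the circuit from the paper) a 32-bit operand is split into a 4-bit zero mask plus only its nonzero bytes; in hardware the bytes would presumably stay in fixed positions, with the flags gating bitline activity for the zero bytes.

```python
from typing import List, Tuple


def zero_byte_encode(value: int, width_bytes: int = 4) -> Tuple[int, List[int]]:
    """Split a value into a zero-byte mask and the bytes that must be stored.

    Bit k of the mask is set when byte k (least significant first) is all
    zeros; only the remaining bytes would need to be written or read.
    """
    mask = 0
    stored = []
    for k in range(width_bytes):
        byte = (value >> (8 * k)) & 0xFF
        if byte == 0:
            mask |= 1 << k
        else:
            stored.append(byte)
    return mask, stored


def zero_byte_decode(mask: int, stored: List[int], width_bytes: int = 4) -> int:
    """Rebuild the value, supplying zeros for flagged bytes without reading them."""
    value = 0
    remaining = iter(stored)
    for k in range(width_bytes):
        if ((mask >> k) & 1) == 0:
            value |= next(remaining) << (8 * k)
    return value
```

For example, 0x00001234 encodes to the mask 0b1100 with only two bytes accessed, and an all-zero operand needs no byte accesses at all.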


PACS'04: Proceedings of the 4th International Conference on Power-Aware Computer Systems | 2004

Reducing delay and power consumption of the wakeup logic through instruction packing and tag memoization

Joseph J. Sharkey; Dmitry Ponomarev; Kanad Ghose; Oguz Ergin

Dynamic instruction scheduling logic is one of the most critical components of modern superscalar microprocessors, from both the delay and the power dissipation standpoints. The delay and energy required to drive the result tags across the associatively addressed issue queue account for a significant percentage of the scheduler's overhead and also limit design scalability. We propose two schemes to reduce the power consumption and the delays of the wakeup logic. Our first scheme, instruction packing, shares the associative part of an issue queue entry between two instructions, each with at most one non-ready source. As a result, the number of entries in the issue queue (and, hence, the length of the tag buses) can be halved with almost no impact on IPC, because most instructions either enter the pipeline with at least one of their source operands ready or do not use two source registers to begin with. Our second scheme, tag memoization, avoids driving the upper portion of the tags if those bits did not change from the values driven on the same tag bus during the most recent broadcast. While instruction packing reduces the length of the tag buses, tag memoization reduces the number of tag lines that need to be driven. We evaluate our designs using detailed microarchitectural simulations of the SPEC 2000 benchmarks and SPICE simulations of the issue queue layouts.
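Tag memoization can be sketched as a per-bus register that remembers the upper tag bits last driven and reports how many tag lines a broadcast actually drives. The hypothetical `TagBus` class, the split point between upper and lower bits, and the assumption that issue queue entries latch the most recent upper bits (so undriven lines still compare correctly) are illustrative choices, not details taken from the paper.

```python
from typing import Optional


class TagBus:
    """Counts tag lines driven per broadcast when upper tag bits are memoized."""

    def __init__(self, tag_bits: int, low_bits: int) -> None:
        self.tag_bits = tag_bits
        self.low_bits = low_bits  # lower bits are always driven
        self.last_upper: Optional[int] = None  # upper bits of the previous broadcast

    def broadcast(self, tag: int) -> int:
        """Return the number of tag lines driven for this tag."""
        upper = tag >> self.low_bits
        lines = self.low_bits
        if upper != self.last_upper:
            # Upper bits changed: drive them too and remember the new value.
            lines += self.tag_bits - self.low_bits
            self.last_upper = upper
        return lines
```

With a 7-bit tag and 3 always-driven low bits, two consecutive tags that share their upper four bits cost 7 lines and then only 3.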


IEEE Transactions on Computers | 2004

Energy efficient comparators for superscalar datapaths

Dmitry Ponomarev; Gurhan Kucuk; Oguz Ergin; Kanad Ghose

Modern superscalar datapaths use aggressive execution reordering to exploit instruction-level parallelism. Comparators, either explicit or embedded into content-addressable logic, are used extensively throughout such designs to implement several key out-of-order execution mechanisms and to support the memory hierarchy. Traditional comparator designs dissipate energy on a mismatch in any bit position. Because mismatches occur much more frequently than matches in many situations, considerable improvements in energy dissipation can be gained by using comparators that dissipate energy predominantly on a full match and little or no energy on partial or complete mismatches. We make two contributions. First, we introduce a series of dissipate-on-match comparator designs, including designs for comparing long arguments. Second, we show how comparators used in modern datapaths can be chosen and organized judiciously, based on microarchitectural-level statistics, to minimize energy dissipation. We use actual layout data and realistic bit patterns of the comparands (obtained from the simulated execution of SPEC 2000 benchmarks) to show the energy impact of the new comparator designs. For the same delay, the proposed 8-bit comparators dissipate 70 percent less energy than the traditional designs when used within issue queues and 73 percent less energy when used within load-store queues. The use of the proposed 6-bit comparators within the dependency checking logic is shown to increase energy dissipation by 65 percent on average compared to the traditional designs. We also find that a hybrid 32-bit comparator, composed of three traditional 8-bit blocks and one proposed 8-bit block, is the most energy-efficient solution for use in the load-store queue, resulting in a 19 percent energy reduction compared to using four traditional 8-bit blocks to implement a 32-bit comparator.
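Because the choice between designs hinges on how often each comparator block sees a full match, the relevant statistic can be gathered from a trace of comparand pairs. This sketch is a simplification: it counts per-byte matches only, and the hypothetical `choose_designs` rule ignores the different energy per event of each design, which the paper derives from actual layout data.

```python
from typing import List, Sequence, Tuple


def block_match_counts(pairs: Sequence[Tuple[int, int]],
                       width: int = 32, block: int = 8) -> List[int]:
    """Count, per block (least significant first), how many pairs fully match."""
    nblocks = width // block
    mask = (1 << block) - 1
    matches = [0] * nblocks
    for a, b in pairs:
        for k in range(nblocks):
            if ((a >> (k * block)) & mask) == ((b >> (k * block)) & mask):
                matches[k] += 1
    return matches


def choose_designs(pairs: Sequence[Tuple[int, int]],
                   width: int = 32, block: int = 8) -> List[str]:
    """Hypothetical rule: use dissipate-on-match where mismatches outnumber matches."""
    matches = block_match_counts(pairs, width, block)
    total = len(pairs)
    return ["dissipate-on-match" if total - m > m else "traditional" for m in matches]
```

A traditional block pays on mismatches and a dissipate-on-match block pays on matches, so the per-block match rate indicates which design is likely cheaper for that block.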


IEEE Computer Architecture Letters | 2006

Exploiting Narrow Values for Soft Error Tolerance

Oguz Ergin; Osman S. Unsal; Xavier Vera; Antonio González

Soft errors are an important challenge in contemporary microprocessors. Particle hits on the components of a processor are expected to create an increasing number of transient errors with each new microprocessor generation. In this paper we propose simple mechanisms that effectively reduce the vulnerability to soft errors in a processor. Our designs are generally motivated by the fact that many of the values produced and consumed in processors are narrow, so their upper-order bits carry no information. Soft errors caused by particle strikes on these higher-order bits can be avoided by simply identifying these narrow values. Alternatively, soft errors on narrow values can be detected or corrected by replicating the vulnerable portion of the value inside the storage space provided for the upper-order bits of these operands. We offer a variety of schemes that make use of narrow values and analyze their efficiency in reducing the soft error vulnerability of the processor's level-1 data cache.
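One of the schemes described, replicating a narrow value's low-order bits in its unused upper-order bits, can be sketched for a 32-bit word. The 16-bit narrowness threshold, the restriction to zero-extended (unsigned) values, and a narrow flag that is itself assumed to be protected are all assumptions of this sketch. With a single copy, a mismatch between the two halves detects an error but cannot tell which half is correct; correction would need more redundancy, which is not modeled here.

```python
from typing import Tuple

HALF = 16  # assumed narrowness threshold for a 32-bit word
LOW_MASK = (1 << HALF) - 1


def is_narrow(value: int) -> bool:
    """Treat a 32-bit value as narrow if its upper 16 bits are zero."""
    return (value >> HALF) == 0


def store(value: int) -> Tuple[int, bool]:
    """Return the stored 32-bit word and a narrow flag.

    Narrow values get a copy of their low half placed in the unused upper half.
    """
    if is_narrow(value):
        return (value << HALF) | value, True
    return value, False


def load(word: int, narrow: bool) -> Tuple[int, bool]:
    """Return (value, error_detected); only narrow words can be checked."""
    if not narrow:
        return word, False
    low = word & LOW_MASK
    copy = word >> HALF
    return low, low != copy
```

A strike that flips any bit of a duplicated narrow word makes the halves disagree, while wide values pass through unchecked in this sketch.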


IEEE Transactions on Computers | 2004

Isolating short-lived operands for energy reduction

Dmitry Ponomarev; Gurhan Kucuk; Oguz Ergin; Kanad Ghose

A mechanism is proposed for reducing the power requirements of processors that use a separate (architectural) register file (ARF) to hold committed values. We exploit the notion of short-lived operands: values that target architectural registers that have been renamed by the time the instruction producing the value reaches the writeback stage. Our simulations of the SPEC 2000 benchmarks show that 71 to 97 percent of the results are short-lived. Our technique avoids unnecessary writebacks into the result repository (a slot within the reorder buffer or a physical register), as well as writes into the ARF from unnecessary commitments, by caching (and isolating) short-lived operands within a small dedicated register file. Operands are cached in this manner until they can be safely discarded without jeopardizing recovery from possible branch mispredictions or reconstruction of the precise state in case of interrupts or exceptions. Additional energy savings are achieved by limiting the number of ports used for instruction commitment. The power/energy savings are validated using SPICE measurements of actual layouts in a 0.18 micron CMOS process. The energy reduction in the ROB and the ARF is about 20 percent (translating into an overall chip energy reduction of about 5 percent), and this is achieved with no increase in cycle time, little additional complexity, and no degradation in the number of instructions committed per cycle.
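The short-lived test can be expressed as a check over a trace of destination registers. This hypothetical helper assumes one instruction is renamed per step and a fixed rename-to-writeback delay, whereas real latencies vary per instruction; a result counts as short-lived when a later instruction targeting the same architectural register has been renamed by the time the producer writes back.

```python
from typing import List


def short_lived_flags(dests: List[str], wb_delay: int) -> List[bool]:
    """Flag results whose architectural destination is renamed before writeback.

    Assumes instruction i is renamed at step i and reaches writeback at
    step i + wb_delay (a hypothetical uniform delay).
    """
    flags = []
    for i, dest in enumerate(dests):
        # Index of the next instruction targeting the same architectural register.
        nxt = next((j for j in range(i + 1, len(dests)) if dests[j] == dest), None)
        flags.append(nxt is not None and nxt <= i + wb_delay)
    return flags
```

Results flagged here are the candidates that would go to the small dedicated register file instead of the result repository and the ARF.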


International Conference on Parallel Architectures and Compilation Techniques | 2005

Compiler directed early register release

Timothy M. Jones; Michael F. P. O'Boyle; Jaume Abella; Antonio González; Oguz Ergin

This paper presents a novel compiler-directed technique to reduce register pressure and register file power by releasing registers early. The compiler identifies registers that will only be read once and renames them to different logical registers. Upon issuing an instruction with one of these logical registers as a source, the processor knows that there will be no more uses of it and can release the register through checkpointing. This reduces the occupancy of our banked register file, allowing banks to be turned off for power savings. Our scheme is faster, simpler, and requires less hardware than recently proposed techniques. It also maintains precise interrupts and exceptions, which many other techniques do not. We reduce register occupancy by 28% in a large register file and also improve performance; this translates into dynamic and static power savings of 18%. When compared to state-of-the-art approaches for varying register file sizes, our scheme is always faster (higher IPC) and always achieves a greater reduction in register file occupancy.
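The compiler-side analysis, finding definitions whose value is read exactly once, can be sketched for a single basic block. The (destination, sources) instruction format, the `live_out` set, and the conservative treatment of values still live at the end of the block are assumptions of this sketch; the abstract does not describe the paper's actual compiler pass.

```python
from typing import List, Set, Tuple

Instr = Tuple[str, List[str]]  # hypothetical form: (destination, sources)


def single_use_defs(block: List[Instr], live_out: Set[str]) -> List[int]:
    """Indices of definitions read exactly once before being redefined.

    A value still live at the end of the block is excluded, since uses
    outside the block are unknown to this local analysis.
    """
    result = []
    for i, (dest, _) in enumerate(block):
        uses = 0
        redefined = False
        for later_dest, later_srcs in block[i + 1:]:
            uses += later_srcs.count(dest)  # sources are read before the write
            if later_dest == dest:
                redefined = True
                break
        if uses == 1 and (redefined or dest not in live_out):
            result.append(i)
    return result
```

Definitions returned here are the ones a compiler could rename to the special logical registers, so that hardware may release the physical register when the single consumer issues.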


International Conference on Parallel Architectures and Compilation Techniques | 2003

Reducing datapath energy through the isolation of short-lived operands

Dmitry Ponomarev; Gurhan Kucuk; Oguz Ergin; Kanad Ghose

We present a technique for reducing the power dissipation in the course of writebacks and commitments in a datapath that uses a dedicated architectural register file (ARF) to hold committed values. Our mechanism capitalizes on the observation that most of the produced register values are short-lived, meaning that the destination registers targeted by these values are renamed by the time the results are written back. Our technique avoids unnecessary writebacks into the result repository (a slot within the reorder buffer or a physical register), as well as writes into the ARF, by caching (and isolating) short-lived operands within a small dedicated register file. Operands are cached in this manner until they can be safely discarded without jeopardizing recovery from possible branch mispredictions or reconstruction of the precise state in case of interrupts or exceptions. The power/energy savings are validated using SPICE measurements of actual layouts in a 0.18 micron CMOS process. The energy reduction in the ROB and the ARF is in the range of 20-25%, and this is achieved with no increase in cycle time, little additional complexity, and no IPC drop.
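The small dedicated register file can be modeled as a buffer of short-lived values, each tagged with the instruction that renamed its destination. In this hypothetical sketch, an entry is discarded once that renaming instruction commits (after which in-order commit guarantees the old value can no longer be needed for recovery), and a full buffer simply rejects the insert; both policies are assumptions rather than details given in the abstract.

```python
from typing import Dict, Optional, Tuple


class ShortLivedBuffer:
    """Hypothetical small register file holding short-lived results."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # producer id -> (id of the instruction that renamed its destination, value)
        self.entries: Dict[int, Tuple[int, int]] = {}

    def insert(self, producer: int, renamer: int, value: int) -> bool:
        """Cache a short-lived result; return False if the buffer is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries[producer] = (renamer, value)
        return True

    def on_commit(self, instr: int) -> None:
        """Discard values whose renaming instruction has just committed."""
        self.entries = {p: e for p, e in self.entries.items() if e[0] != instr}

    def lookup(self, producer: int) -> Optional[int]:
        """Return a cached value (e.g. for recovery), or None if absent."""
        entry = self.entries.get(producer)
        return entry[1] if entry is not None else None
```

What the processor does when the buffer is full (here, the caller would fall back to an ordinary writeback) is a design choice this sketch leaves open.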

Collaboration


Top Co-Authors

Jaume Abella

Barcelona Supercomputing Center

Antonio González

Polytechnic University of Catalonia

Osman S. Unsal

Barcelona Supercomputing Center