Deniz Balkan | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Deniz Balkan is active.

Explore More

Publication

Featured researches published by Deniz Balkan.

international conference on computer design | 2004

Increasing processor performance through early register release

Oguz Ergin; Deniz Balkan; Dmitry Ponomarev; Kanad Ghose

Modern superscalar microprocessors need sizable register files to support large number of in-flight instructions for exploiting ILP. An alternative to building large register files is to use smaller number of registers, but manage them more effectively. More efficient management of registers can also result in higher performance if the reduction of the register file size is not the goal. Traditional register file management mechanisms deallocate a physical register only when the next instruction with the same destination architectural register commits. We propose two complementary techniques for deallocating the register immediately after the instruction producing the registers value commits itself, without waiting for the commitment of the next instruction with the same destination. Our design relies on the use of a checkpointed register file (CRF), where a local shadow copy of each bitcell is used to temporarily save the early deallocated register values should they be needed to recover from branch mispredictions or to reconstruct the precise state after exceptions or interrupts. The proposed techniques outperform the previously proposed schemes for early deallocation of registers. For the register-constrained datapath configurations, our techniques result in up to 35% performance increase with 23.3% increase on the average across SPEC2000 benchmarks.

international conference on parallel architectures and compilation techniques | 2006

Adaptive reorder buffers for SMT processors

Joseph J. Sharkey; Deniz Balkan; Dmitry Ponomarev

In SMT processors, the complex interplay between private and shared datapath resources needs to be considered in order to realize the full performance potential. In this paper, we show that blindly increasing the size of the per-thread reorder buffers to provide a larger number of in-flight instructions does not result in the expected performance gains but, quite in contrast, degrades the instruction throughput for virtually all multithreaded workloads. The reason for this performance loss is the excessive pressure on the shared datapath resources, especially the instruction scheduling logic. We propose intelligent mechanisms for dynamically adapting the number of reorder buffer entries allocated to each thread in an effort to avoid such allocations if they detrimentally impact the scheduler. We achieve this goal through categorizing the program execution into issue-bound and commit-bound phases and only performing the buffer allocations to the threads operating in commit-bound phases. Our adaptive technique achieves improvements of 21% in instruction throughput and 10% in the fairness metric compared to the best performing baseline configuration with static ROBs.

IEEE Transactions on Computers | 2006

Early Register Deallocation Mechanisms Using Checkpointed Register Files

Oguz Ergin; Deniz Balkan; Dmitry Ponomarev; Kanad Ghose

Modern superscalar microprocessors need sizable register files to support a large number of in-flight instructions for exploiting instruction level parallelism (ILP). An alternative to building large register files is to use a smaller number of registers, but manage them more effectively. More efficient management of registers can also result in higher performance if the reduction of the register file size is not the goal. Traditional register file management mechanisms deallocate a physical register only when the next instruction writing to the same destination architectural register commits. In this paper, we propose several techniques for deallocating physical registers much earlier. Our designs rely on the use of a checkpointed register file (CRF), where a local shadow copy of each bitcell is used to temporarily save the values of the early deallocated registers should they be needed to recover from branch mispredictions or to reconstruct the precise state after exceptions or interrupts. The proposed techniques try to release registers as soon as possible and are more aggressive than the previously proposed schemes for early deallocation of registers

IEEE Transactions on Very Large Scale Integration Systems | 2008

Selective Writeback: Reducing Register File Pressure and Energy Consumption

Deniz Balkan; Joseph J. Sharkey; Dmitry Ponomarev; Kanad Ghose

Much of the complexity in todays superscalar microprocessors stems from the need to maintain the speculatively produced results within the on-chip storage components until these results can be safely discarded without endangering the reconstruction of the precise state or impeding the recovery from possible branch misspeculations. For this, modern designs use large, heavily-ported physical register files (RFs) to increase the instruction throughput. The high complexity and power dissipation of such RFs mainly stem from the need to maintain each and every result for a large number of cycles after the result generation. We observed that a significant fraction (about 45%) of the result values are delivered to their consumers via the bypass network (consumed ldquoon-the-flyrdquo) and are never read out from the destination registers. In this paper, we first formulate conditions for identifying such transient values and describe their microarchitectural implementation; then we propose a technique to avoid the writeback of such transient values into the RF. With 64-entry integer and floating point register files, our technique achieves an 11% performance improvement and 29% reduction in the RF energy consumption compared to the baseline machine with the same number of registers. Furthermore, for the same performance target, the selective writeback scheme results in a 38% reduction in the energy consumption of the RF compared to the baseline machine.

international conference on parallel processing | 2006

Address-Value Decoupling for Early Register Deallocation

Deniz Balkan; Joseph J. Sharkey; Dmitry Ponomarev; Aneesh Aggarwal

We propose a series of aggressive register deallocation mechanisms to reduce the register file pressure and increase the parallelism exploited by superscalar microprocessors. Our techniques are based on a key observation that a register value can be temporarily decoupled from the register identifier. Specifically, even if a physical register is deallocated, the value is still available in the register and can be read by the dependent instructions until the register is overwritten. In these situations, we can effectively overlap the consumption of the produced register value and partial processing of the instruction that gets the same register reassigned to it. In this paper, we propose several realizations of the address-value decoupling idea and discuss their implications on the performance. Our most aggressive scheme achieves an average IPC speedup of 14.6% across simulated SPEC 2000 benchmarks

international conference on parallel architectures and compilation techniques | 2006

SPARTAN: speculative avoidance of register allocations to transient values for performance and energy efficiency

Deniz Balkan; Joseph J. Sharkey; Dmitry Ponomarev; Kanad Ghose

High-performance microprocessors use large, heavily-ported physical register files (RFs) to increase the instruction throughput. The high complexity and power dissipation of such RFs mainly stem from the need to maintain each and every result for a large number of cycles after the result generation. We observed that a significant fraction (about 45%) of the result values are never read from the register file and are not required to recover from branch mispredictions. In this paper, we propose SPARTAN — a set of micro-architectural extensions that predicts such transient values and in many cases completely avoids physical register allocations to them. We show that the transient values can be predicted as such with more than 97% accuracy on the average across simulated SPEC 2000 benchmarks. We evaluate the performance of SPARTAN on a variety of configurations and show that significant improvements in performance and energy-efficiency can be realized. Furthermore, we directly compare SPARTAN against a number of previously proposed schemes for register optimizations and show that our technique significantly outperforms all those schemes.

IEEE Transactions on Computers | 2008

Predicting and Exploiting Transient Values for Reducing Register File Pressure and Energy Consumption

Deniz Balkan; Joseph J. Sharkey; Dmitry Ponomarev; Kanad Ghose

High-performance microprocessors use large, heavily ported physical register files (RFs) to increase the instruction throughput. The high complexity and power dissipation of such RFs mainly stem from the need to maintain each and every result for a large number of cycles after the result generation. We observed that a significant fraction (about 45 percent) of the result values are never read from the register file and are not required to reconstruct the precise state following branch mispredictions. In this paper, we propose Speculative Avoidance of Register allocations to Transient values (SPARTAN) - a set of microarchitectural extensions that predicts such transient values and, in many cases, completely avoids physical register allocations to them. We show that the transient values can be predicted as such with more than 97 percent accuracy, on average, across simulated SPEC 2000 benchmarks. We evaluate the performance of SPARTAN on a variety of configurations and show that significant improvements in performance and energy efficiency can be realized. Furthermore, we directly compare SPARTAN against a number of previously proposed schemes for register optimizations and show that our technique significantly outperforms all those schemes.

symposium on computer architecture and high performance computing | 2004

A study of errant pipeline flushes caused by value misspeculation

Deniz Balkan; John Kalamatianos; David R. Kaeli

Value speculation has been proposed as a technique that can overcome true data dependencies, hide memory latencies, and expose higher degrees of instruction level parallelism (ILP). Branch direction prediction and target address prediction are two widely used control speculation techniques aimed at providing a steady stream of instructions to the instruction window. In this paper we consider a load value predictor used together with an aggressive branch predictor microarchitecture and investigate the effects of load value misspeculations on branch resolution. We study the performance impact of the interaction of these mechanisms and characterize the occurence of these events in a multiple issue, out-of-order, superscalar pipeline. We perform execution-driven studies using integer benchmarks taken from the SPECint2000, SPECint95 and Olden suites. We show that IPC can deteriorate by as much as 4.7% due to unnecessary pipeline flushes caused by branch resolutions that use speculative data. This paper also proposes a mechanism that can prevent these unnecessary squashes from occurring.

Archive | 2010