Michael Bekerman
Intel
Publication
Featured research published by Michael Bekerman.
international symposium on microarchitecture | 1998
Stephan J. Jourdan; Ronny Ronen; Michael Bekerman; Bishara Shomar; Adi Yoaz
Hardware renaming schemes provide multiple physical locations (register or memory) for each logical name. In current renaming schemes, a new physical location is allocated for each dispatched instruction regardless of its result value. However, these values exhibit a high level of temporal locality (result redundancy). This paper proposes:
1. Physical Register Reuse: reuse a physical location whenever an incoming result value is detected to match a previous one. This is performed during register renaming and requires some value-identity detection hardware. By mapping several logical registers holding the same value to the same physical register, Physical Register Reuse opens two opportunities:
• Sharing: exploit the high level of value redundancy in the register file either to reduce the file size and complexity or to effectively enlarge the active instruction window. Our results suggest reduction factors of 2 to 4 in some cases. Performance is increased either by the enlarged instruction window or by the higher frequency enabled by a smaller register file requiring fewer ports.
• Result Reuse and Dependency Redirection: move the responsibility for generating results (1) from the functional units to the register renamer, possibly eliminating processed instructions from the execution stream, and (2) from one instruction to an earlier one in the instruction stream, possibly allowing instructions to be scheduled earlier. This yields large performance speedups.
2. Unification: combine the memory renamer with the register renamer in order to extend the above sharing, result reuse, and dependency redirection ideas to both registers and memory locations. This allows even greater hardware savings and performance improvements, and also simplifies the processing of store instructions.
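A minimal software sketch of the physical-register-reuse idea, not the paper's hardware design: a value-identity table maps recently produced result values to the physical register that already holds them, so several logical registers with the same value share one physical register. Table sizes, the reference-counting scheme, and the assumption that a result's identity with an earlier value is already known at rename time are illustrative simplifications (detecting that identity is what the paper's value-identity detection hardware does).

# Toy model of Physical Register Reuse during renaming (illustrative, not the paper's design).
class ReuseRenamer:
    def __init__(self, num_physical_regs=64):
        self.free_list = list(range(num_physical_regs))  # available physical registers
        self.rat = {}          # logical register -> physical register (register alias table)
        self.value_table = {}  # result value -> physical register holding it (value identity)
        self.refcount = {}     # physical register -> number of logical mappings (sharing)

    def rename_dest(self, logical_reg, result_value):
        """Map a logical destination; reuse an existing physical register if the value matches."""
        self._release(logical_reg)
        phys = self.value_table.get(result_value)
        if phys is None:                       # no matching value: allocate as usual
            phys = self.free_list.pop()
            self.value_table[result_value] = phys
            self.refcount[phys] = 0
        self.rat[logical_reg] = phys           # several logical registers may map to phys
        self.refcount[phys] += 1
        return phys

    def _release(self, logical_reg):
        """Drop the previous mapping; free the physical register once no logical name uses it."""
        old = self.rat.pop(logical_reg, None)
        if old is None:
            return
        self.refcount[old] -= 1
        if self.refcount[old] == 0:
            self.free_list.append(old)
            self.value_table = {v: p for v, p in self.value_table.items() if p != old}

In this toy model, two logical registers written with the same value end up sharing one physical register, which is the redundancy the sharing opportunity above exploits.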
international symposium on computer architecture | 1999
Michael Bekerman; Stephan J. Jourdan; Ronny Ronen; Gilad Kirshenboim; Lihu Rappoport; Adi Yoaz; Uri C. Weiser
As microprocessors become faster, the relative performance cost of memory accesses increases. Bigger and faster caches significantly reduce the absolute load-to-use delay. However, the increase in processor operating frequencies worsens the relative load-to-use latency, measured in processor cycles (e.g., from two cycles on the Pentium® processor to three cycles or more in current designs). Load-address prediction techniques were introduced to partially hide the load-to-use latency. This paper focuses on advanced address-prediction schemes that further shorten program execution time.
Existing address-prediction schemes can predict simple address patterns, consisting mainly of constant addresses or stride-based addresses. This paper explores the characteristics of the remaining loads and suggests new techniques to improve prediction effectiveness:
• Context-based prediction to tackle part of the remaining, difficult-to-predict load instructions.
• New prediction algorithms that take advantage of global correlation among different static loads.
• New confidence mechanisms to increase the correct-prediction rate and to eliminate costly mispredictions.
• Mechanisms that prevent long or random address sequences from polluting the predictor data structures while providing some hysteresis in the predictions.
Such an enhanced address predictor accurately predicts 67% of all loads while keeping the misprediction rate close to 1%. We further show that the proposed predictor works reasonably well in a deeply pipelined architecture, where the predict-to-update delay may significantly impair both prediction rate and accuracy.
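A rough sketch of context-based load-address prediction gated by a confidence counter, in the spirit of the abstract; the table organization, delta-based context, and the confidence threshold are illustrative assumptions rather than the paper's mechanism.

# Toy context-based load-address predictor with saturating confidence (illustrative only).
class ContextAddressPredictor:
    def __init__(self, history_len=4, conf_threshold=2, conf_max=3):
        self.history = {}     # load PC -> tuple of recent address deltas (the "context")
        self.pattern = {}     # (load PC, context) -> predicted next delta
        self.last_addr = {}   # load PC -> last committed address
        self.conf = {}        # (load PC, context) -> saturating confidence counter
        self.history_len = history_len
        self.conf_threshold = conf_threshold
        self.conf_max = conf_max

    def predict(self, pc):
        """Return a predicted address, or None when confidence is too low to speculate."""
        key = (pc, self.history.get(pc, ()))
        if key in self.pattern and self.conf.get(key, 0) >= self.conf_threshold:
            return self.last_addr[pc] + self.pattern[key]
        return None

    def update(self, pc, actual_addr):
        """Train on the committed address; confidence rises on repeats, resets on a mismatch."""
        last = self.last_addr.get(pc)
        if last is not None:
            delta = actual_addr - last
            ctx = self.history.get(pc, ())
            key = (pc, ctx)
            if self.pattern.get(key) == delta:
                self.conf[key] = min(self.conf.get(key, 0) + 1, self.conf_max)
            else:
                self.pattern[key] = delta
                self.conf[key] = 0   # require the pattern to repeat before predicting again
            self.history[pc] = (ctx + (delta,))[-self.history_len:]
        self.last_addr[pc] = actual_addr

Gating predictions on the confidence counter is one way to trade prediction coverage for a low misprediction rate, which is the balance the abstract's confidence mechanisms aim for.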
international symposium on computer architecture | 2000
Michael Bekerman; Adi Yoaz; Freddy Gabbay; Stephan J. Jourdan; Maxim Kalaev; Ronny Ronen
Higher microprocessor frequencies accentuate the performance cost of memory accesses. This is especially noticeable in the Intel IA-32 architecture, where the lack of registers results in an increased number of memory accesses. This paper presents a novel, non-speculative technique that partially hides the increasing load-to-use latency by allowing the early issue of load instructions. Early load address resolution relies on register tracking to safely compute the addresses of memory references in the front-end part of the processor pipeline. Register tracking enables decode-time computation of register values by tracking simple operations of the form reg±immediate, and may be performed in any pipeline stage following instruction decode and prior to execution. Several tracking schemes are proposed in this paper:
• Stack pointer tracking allows safe early resolution of stack references by keeping track of the value of the ESP register (the stack pointer). About 25% of all loads are stack loads, and 95% of these loads may be resolved in the front-end.
• Absolute address tracking allows the early resolution of constant-address loads.
• Displacement-based tracking tackles all loads with addresses of the form reg±immediate by tracking the values of all general-purpose registers. This class corresponds to 82% of all loads, and about 65% of these loads can be safely resolved in the front-end pipeline.
The paper describes the tracking schemes, analyzes their performance potential in a deeply pipelined processor, and discusses the integration of tracking with memory disambiguation.
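A minimal sketch of decode-time register tracking for early load-address resolution, assuming a toy decoded-instruction format; the instruction tuple layout and the way the initial ESP value is captured are illustrative assumptions, not the paper's mechanism.

# Toy front-end register tracker for reg = reg +/- immediate operations (illustrative only).
class RegisterTracker:
    def __init__(self):
        self.known = {}   # register name -> value known at decode time

    def observe(self, insn):
        """Update tracked state from a decoded instruction; return an early load address if resolvable."""
        op, dst, src, imm = insn   # e.g. ("sub", "esp", "esp", 16) or ("load", "eax", "esp", 4)
        if op == "mov_imm":
            self.known[dst] = imm                           # reg = constant
            return None
        if op in ("add", "sub"):
            if src in self.known:
                delta = imm if op == "add" else -imm
                self.known[dst] = self.known[src] + delta   # reg = reg +/- immediate
            else:
                self.known.pop(dst, None)                   # base unknown: stop tracking dst
            return None
        if op == "load":
            self.known.pop(dst, None)                       # loaded value is unknown at decode time
            if src in self.known:
                return self.known[src] + imm                # displacement-based early resolution
            return None
        self.known.pop(dst, None)                           # any other write makes dst unknown
        return None

# Example: once ESP is tracked, later stack loads resolve in the front-end.
tracker = RegisterTracker()
tracker.known["esp"] = 0x7FFF0000                           # assume the stack pointer is captured once
tracker.observe(("sub", "esp", "esp", 16))                  # esp = esp - 16
print(hex(tracker.observe(("load", "eax", "esp", 4))))      # early address: 0x7ffefff4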
Archive | 1999
Stephan J. Jourdan; Michael Bekerman; Ronny Ronen; Lihu Rappoport
Archive | 1999
Stephan J. Jourdan; Adi Yoaz; Ronny Ronen; Michael Bekerman
Archive | 2001
Stephan J. Jourdan; Michael Bekerman; Ronny Ronen
Archive | 2003
Stephan J. Jourdan; Ronny Ronen; Michael Bekerman
Archive | 2003
Stephan J. Jourdan; Ronny Ronen; Michael Bekerman
Archive | 2000
Stephan J. Jourdan; Michael Bekerman; Ronny Ronen
Archive | 1999
Adi Yoaz; Ronny Ronen; Stephan J. Jourdan; Michael Bekerman