Edward A. Brekelbaum | Researchain

Publication

Featured research published by Edward A. Brekelbaum.

International Symposium on Microarchitecture | 2002

Hierarchical scheduling windows

Edward A. Brekelbaum; Jeff Rupley; Chris Wilkerson; Bryan Black

Large scheduling windows are an effective mechanism for increasing microprocessor performance through the extraction of instruction-level parallelism (ILP). Current techniques do not scale effectively for very large windows, leading to slow wakeup and select logic as well as large, complicated bypass networks. This paper introduces a new instruction scheduler implementation, referred to as Hierarchical Scheduling Windows or HSW, which exploits latency-tolerant instructions in order to reduce implementation complexity. HSW yields a very large instruction window that tolerates wakeup, select, and bypass latency, while extracting significant far-flung ILP. Results: It is shown that HSW loses <0.5% performance per additional cycle of bypass/select/wakeup latency, as compared to a monolithic window that loses ~5% per additional cycle. Also, HSW achieves the performance of traditional implementations with only 1/3 to 1/2 the number of entries in the critical timing path.

International Symposium on Computer Architecture | 2007

Matrix scheduler reloaded

Peter G. Sassone; Jeff Rupley; Edward A. Brekelbaum; Gabriel H. Loh; Bryan Black

From multiprocessor scale-up to cache sizes to the number of reorder-buffer entries, microarchitects wish to reap the benefits of more computing resources while staying within power and latency bounds. This tension is quite evident in schedulers, which need to be large and single-cycle for maximum performance on out-of-order cores. In this work we present two straightforward modifications to a matrix scheduler implementation which greatly strengthen its scalability. Both are based on the simple observation that the wakeup and picker matrices are sparse, even at small sizes; thus small indirection tables can be used to greatly reduce their width and latency. This technique can be used to create quicker iso-performance schedulers (17-58% reduced critical path) or larger iso-timing schedulers (7-26% IPC increase). Importantly, the power and area requirements of the additional hardware are likely offset by the greatly reduced matrix sizes and subsuming the functionality of the power-hungry allocation CAMs.

Archive | 2003