Edward A. Brekelbaum
Intel
Publications
Featured research published by Edward A. Brekelbaum.
International Symposium on Microarchitecture | 2002
Edward A. Brekelbaum; Jeff Rupley; Chris Wilkerson; Bryan Black
Large scheduling windows are an effective mechanism for increasing microprocessor performance through the extraction of instruction-level parallelism (ILP). Current techniques do not scale effectively to very large windows, leading to slow wakeup and select logic as well as large, complicated bypass networks. This paper introduces a new instruction scheduler implementation, referred to as Hierarchical Scheduling Windows or HSW, which exploits latency-tolerant instructions in order to reduce implementation complexity. HSW yields a very large instruction window that tolerates wakeup, select, and bypass latency while extracting significant far-flung ILP. Results show that HSW loses <0.5% performance per additional cycle of bypass/select/wakeup latency, compared to a monolithic window that loses ~5% per additional cycle. HSW also achieves the performance of traditional implementations with only 1/3 to 1/2 the number of entries in the critical timing path.
International Symposium on Computer Architecture | 2007
Peter G. Sassone; Jeff Rupley; Edward A. Brekelbaum; Gabriel H. Loh; Bryan Black
From multiprocessor scale-up to cache sizes to the number of reorder-buffer entries, microarchitects wish to reap the benefits of more computing resources while staying within power and latency bounds. This tension is quite evident in schedulers, which need to be large and single-cycle for maximum performance on out-of-order cores. In this work we present two straightforward modifications to a matrix scheduler implementation which greatly strengthen its scalability. Both are based on the simple observation that the wakeup and picker matrices are sparse, even at small sizes; thus small indirection tables can be used to greatly reduce their width and latency. This technique can be used to create quicker iso-performance schedulers (17-58% reduced critical path) or larger iso-timing schedulers (7-26% IPC increase). Importantly, the power and area requirements of the additional hardware are likely offset by the greatly reduced matrix sizes and by subsuming the functionality of the power-hungry allocation CAMs.
Archive | 2003
Jeffrey P. Rupley II; Edward A. Brekelbaum; Edward T. Grochowski; Bryan Black
Archive | 2002
Edward A. Brekelbaum; Jeffrey P. Rupley II
Archive | 2002
Edward A. Brekelbaum; Bryan Black; Jeffrey P. Rupley II
Archive | 2004
Jeffrey P. Rupley II; Edward A. Brekelbaum; Bryan Black
Archive | 2002
Bohuslav Rychlik; Ryan Rakvic; Edward A. Brekelbaum; Bryan Black
Archive | 2004
John P. Devale; Bryan Black; Edward A. Brekelbaum; Jeffrey P. Rupley II
Archive | 2003
Jeffrey P. Rupley II; Edward A. Brekelbaum; Edward T. Grochowski
Archive | 2010
Mohammed H. Taufique; Derwin Jallice; Donald W. McCauley; John P. Devale; Edward A. Brekelbaum; Jeffrey P. Rupley II; Gabriel H. Loh; Bryan Black