Brian R. Prasky | Researchain

Publication

Featured research published by Brian R. Prasky.

IBM Journal of Research and Development | 2009

Design and microarchitecture of the IBM system z10 microprocessor

Chung-Lung Kevin Shum; Fadi Y. Busaba; S. Dao-Trong; Guenter Gerwig; Christian Jacobi; Thomas Koehler; E. Pfeffer; Brian R. Prasky; J. G. Rell; Aaron Tsai

The IBM System z10™ microprocessor is currently the fastest running 64-bit CISC (complex instruction set computer) microprocessor. This microprocessor operates at 4.4 GHz and provides up to two times performance improvement compared with its predecessor, the System z9® microprocessor. In addition to its ultrahigh-frequency pipeline, the z10™ microprocessor offers such performance enhancements as a sophisticated branch-prediction structure, a large second-level private cache, a data-prefetch engine, and a hardwired decimal floating-point arithmetic unit. The z10 microprocessor also implements new architectural features that allow better software optimization across compiled applications. These features include new instructions that help shorten the code path lengths and new facilities for software-directed cache management and the use of 1-MB virtual pages. The innovative microarchitecture of the z10 microprocessor and notable differences from its predecessors and the IBM POWER6™ microprocessor are discussed.

IBM Journal of Research and Development | 2012

IBM zEnterprise 196 microprocessor and cache subsystem

Fadi Y. Busaba; Michael A. Blake; Brian W. Curran; Michael Fee; Christian Jacobi; Pak-Kin Mak; Brian R. Prasky; Craig R. Walters

The IBM zEnterprise® 196 (z196) system, announced in the second quarter of 2010, is the latest generation of the IBM System z® mainframe. The system is designed with a new microprocessor and memory subsystems, which distinguishes it from its z10® predecessor. The system has up to 40% improvement in performance for traditional z/OS® workloads and carries up to 60% more capacity when compared with its z10 predecessor. The memory subsystem has four levels of cache hierarchy (L1 through L4) and constructs the L3 and L4 caches with embedded DRAM silicon technology, which achieves approximately three times the cache density over traditional static RAM technology. The microprocessor has 50% more decode and dispatch bandwidth when compared with the z10 microprocessor, as well as an out-of-order design that can issue and execute up to five instructions every single cycle. The microprocessor has an advanced branch prediction structure and employs enhanced store queue management algorithms. At the date of product announcement, the microprocessor was the fastest complex-instruction-set computing processor in the industry, running at a sustained 5.2 GHz, executing approximately 1,100 instructions, 220 of which are cracked into reduced-instruction-set computing-type operations, to achieve large performance gains in legacy online transaction processing and compute-intensive workloads.

High-Performance Computer Architecture | 2013

Two level bulk preload branch prediction

James J. Bonanno; Adam B. Collura; Daniel Lipetz; Ulrich Mayer; Brian R. Prasky; Anthony Saporito

This paper describes the large capacity hierarchical branch predictor in the 5.5 GHz IBM zEnterprise EC12 microprocessor. Performance analyses in a simulation model and on zEC12 hardware demonstrate the benefit of this hierarchy compared to a smaller one level predictor. Novel structures and algorithms for two level branch prediction are presented. Prediction information about multiple branches is bulk transferred from the second level into the first upon detecting a perceived miss in the first level. The second level does not directly make branch predictions. Access to the second level is limited when it is unlikely to be productive. The second level is systematically searched in an order that is likely to provide hits as early as possible. On the workloads analyzed in the simulation model, measurements show a maximum core performance benefit of 13.8%. On the two workloads analyzed on zEC12 hardware, 3.4% and 5.3% system performance improvements are achieved.
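
The bulk-preload idea in this abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the zEC12 design: the class name `TwoLevelPredictor`, the 4 KB block grouping (`BLOCK_BITS`), the LRU replacement, and the dictionary-based tables are all hypothetical choices made for illustration. The sketch also leaves out the access limiting and the ordered search of the second level that the abstract mentions.

```python
from collections import OrderedDict, defaultdict

# Assumption: branches are grouped by 4 KB code block for bulk transfer.
BLOCK_BITS = 12


class TwoLevelPredictor:
    """Toy two-level predictor: only the first level makes predictions."""

    def __init__(self, l1_capacity):
        self.l1_capacity = l1_capacity
        # First level: branch address -> predicted target, kept in LRU order.
        self.l1 = OrderedDict()
        # Second level: block number -> {branch address: predicted target}.
        self.l2 = defaultdict(dict)

    def _install_l1(self, addr, target):
        """Insert into the first level, dropping the LRU entry when full."""
        self.l1[addr] = target
        self.l1.move_to_end(addr)
        while len(self.l1) > self.l1_capacity:
            self.l1.popitem(last=False)

    def predict(self, addr):
        """Return a predicted target from the first level, or None on a miss."""
        if addr in self.l1:
            self.l1.move_to_end(addr)
            return self.l1[addr]
        # Perceived first-level miss: bulk transfer every known branch in the
        # same block from the second level. The second level itself does not
        # supply a prediction for this lookup.
        block = self.l2.get(addr >> BLOCK_BITS, {})
        for branch_addr, target in list(block.items()):
            self._install_l1(branch_addr, target)
        return None

    def update(self, addr, target):
        """Record a resolved taken branch in both levels."""
        self._install_l1(addr, target)
        self.l2[addr >> BLOCK_BITS][addr] = target


if __name__ == "__main__":
    p = TwoLevelPredictor(l1_capacity=2)
    p.update(0x1000, 0x2000)
    p.update(0x1008, 0x2100)
    p.update(0x5000, 0x6000)  # evicts 0x1000 from the first level
    p.update(0x5008, 0x6100)  # evicts 0x1008 from the first level
    print(p.predict(0x1000))  # miss; triggers bulk preload of block 0x1
    print(hex(p.predict(0x1008)))  # hit, thanks to the preload
```

In this toy model a single miss on one branch brings its neighbours in the same block back into the first level, so the later lookup of `0x1008` hits even though that branch was never requested individually after its eviction.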

Archive | 2002