Craig R. Walters | Researchain

Publication

Featured research published by Craig R. Walters.

IBM Journal of Research and Development | 2009

IBM System z10 processor cache subsystem microarchitecture

Pak-Kin Mak; Craig R. Walters; Gary E. Strait

With the introduction of the high-frequency IBM System z10™ processor design, a new, robust cache hierarchy was needed to enable up to 80 of these processors aggregated into a tightly coupled symmetric multiprocessor (SMP) system to reach their performance potential. Typically, each time the processor frequency increases by a significant factor, as did the z10™ processor over the predecessor IBM System z9® processor, the access time of data, as measured by the number of processor cycles beyond the level 1 cache on an identical processor cache subsystem, would increase proportionally as well because the flight time on the chip interconnects across multiple hardware packaging levels has stayed relatively constant in nanoseconds. To address the latency scaling problem and the increased demand of the larger 80-way SMP size, the z10 processor cache subsystem introduces new innovative concepts and solutions.
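The latency scaling problem described in this abstract follows from simple arithmetic: if an interconnect delay stays fixed in nanoseconds, its cost in processor cycles grows in proportion to clock frequency. The sketch below only illustrates that relationship. The function name and the numeric values are assumptions chosen for demonstration and are not taken from the paper.

```python
def latency_in_cycles(flight_time_ns: float, frequency_ghz: float) -> float:
    """Convert a fixed interconnect delay in nanoseconds to processor cycles."""
    # 1 GHz equals one cycle per nanosecond, so cycles = ns * GHz.
    return flight_time_ns * frequency_ghz


# Hypothetical values for illustration only (not figures from the paper).
FLIGHT_TIME_NS = 20.0

for freq_ghz in (2.0, 4.0):
    cycles = latency_in_cycles(FLIGHT_TIME_NS, freq_ghz)
    print(f"{freq_ghz} GHz: {cycles:.0f} cycles")
```

Doubling the frequency while the delay in nanoseconds is unchanged doubles the cycle count, which is the effect the cache hierarchy redesign is said to address.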

IBM Journal of Research and Development | 2012

IBM zEnterprise 196 microprocessor and cache subsystem

Fadi Y. Busaba; Michael A. Blake; Brian W. Curran; Michael Fee; Christian Jacobi; Pak-Kin Mak; Brian R. Prasky; Craig R. Walters

The IBM zEnterprise® 196 (z196) system, announced in the second quarter of 2010, is the latest generation of the IBM System z® mainframe. The system is designed with a new microprocessor and memory subsystems, which distinguishes it from its z10® predecessor. The system has up to 40% improvement in performance for traditional z/OS® workloads and carries up to 60% more capacity when compared with its z10 predecessor. The memory subsystem has four levels of cache hierarchy (L1 through L4) and constructs the L3 and L4 caches with embedded DRAM silicon technology, which achieves approximately three times the cache density over traditional static RAM technology. The microprocessor has 50% more decode and dispatch bandwidth when compared with the z10 microprocessor, as well as an out-of-order design that can issue and execute up to five instructions every single cycle. The microprocessor has an advanced branch prediction structure and employs enhanced store queue management algorithms. At the date of product announcement, the microprocessor was the fastest complex-instruction-set computing processor in the industry, running at a sustained 5.2 GHz, executing approximately 1,100 instructions, 220 of which are cracked into reduced-instruction-set computing-type operations, to achieve large performance gains in legacy online transaction processing and compute-intensive workloads.

IBM Journal of Research and Development | 2009

IBM System z10 I/O subsystem

Edward W. Chencinski; Mark A. Check; Casimer M. DeCusatis; H. Deng; M. Grassi; Thomas A. Gregg; Markus M. Helms; A. D. Koenig; L. Mohr; Kulwant M. Pandey; Thomas Schlipf; Torsten Schober; H. Ulrich; Craig R. Walters

The performance, reliability, and functionality of a large server are greatly influenced by the design characteristics of its I/O subsystem. The critical components of the IBM System z10™ I/O subsystem have, therefore, been significantly improved in terms of performance, capability, and cost. The first-order network has been redesigned from the long-evolved enhanced self-timed interface (eSTI) links to utilize InfiniBand™ links. A redesign of the host logic of I/O chips and the fiberoptic interfaces within the links made it possible to introduce InfiniBand-based IBM Parallel Sysplex® links. A broad range of legacy I/O channels have been carried forward to connect through InfiniBand, and a foundation has been laid for new channel types of improved functionality and performance. The first such hardware channel to be introduced is the next generation of Ethernet-virtualization data routers. A new and methodical recovery structure has been designed to ensure consistent, extensive support of reliability, availability, and serviceability. A building-block-oriented design process has been developed to enable the innovations that made these advances possible. Finally, a new performance verification methodology has been introduced to ensure that the system and subsystem designs are balanced to make effective use of the increased capacity.

IBM Journal of Research and Development | 2015

The IBM z13 processor cache subsystem

Craig R. Walters; Pak-Kin Mak; Deanna Postles Dunn Berger; Michael A. Blake; Tim Bronson; Kenneth D. Klapproth; Arthur J. O'Neill; Robert J. Sonnelitter; Vesselina K. Papazova

The IBM z13™ system introduces many new innovative concepts in building a high-performance modular and scalable symmetrical multiprocessing (SMP) system, comprising up to 192 multithreaded processors that span eight system processing nodes. The z13 uses new socket packaging technology, changing from multichip modules (MCMs) to single-chip modules (SCMs). This enables the modularity and scalability of a large distributed SMP system and led to the development of new techniques in several important performance areas. For the cache hierarchy, the inclusivity management policy is optimized between the third-level and the fourth-level shared caches to improve overall cache-bit efficiency, effectively making the fourth-level cache larger to reduce the impact of increased chip socket-to-socket access latencies. The system bus management is enhanced such that multiple data transfers can be simultaneously overlapped on an interface to reduce wait times on critical data when these buses are highly utilized. With the amount of caches on both the Central Processor (CP) and System Controller (SC) chips, several major improvements were made for array macro resiliency to improve overall system availability. These and other major design updates in the latest mainframe processor cache subsystem are described in this paper.

IBM Journal of Research and Development | 2012

Performance innovation in the IBM zEnterprise 196 processor

Dan F. Greiner; Marcel Mitran; Timothy J. Slegel; Craig R. Walters; Charles F. Webb

The IBM zEnterprise® 196 achieves substantial performance gains over prior designs across the full spectrum of workloads being run on today's enterprise information technology systems, ranging from large data-intensive transaction processing workloads to central processing unit-intensive business applications. Each of these required innovations in the design of the hardware, software, and instruction-set architecture (ISA), with performance gains coming from several sources: the out-of-order microprocessor core design, the multilevel cache structure, new ISA facilities, and additional ISA extensions to enable efficient scaling of large multiprocessor operating system images. These enhancements were achieved through collaborative development among hardware, software, compiler, architecture, and performance analysis teams. This paper describes the performance contributions from these sources, with particular focus on the new architectural facilities.

Archive | 2001