Michael A. Blake | Researchain

Publication

Featured research published by Michael A. Blake.

IBM Journal of Research and Development | 2012

IBM zEnterprise 196 microprocessor and cache subsystem

Fadi Y. Busaba; Michael A. Blake; Brian W. Curran; Michael Fee; Christian Jacobi; Pak-Kin Mak; Brian R. Prasky; Craig R. Walters

The IBM zEnterprise® 196 (z196) system, announced in the second quarter of 2010, is the latest generation of the IBM System z® mainframe. The system is designed with a new microprocessor and memory subsystems, which distinguishes it from its z10® predecessor. The system has up to 40% improvement in performance for traditional z/OS® workloads and carries up to 60% more capacity when compared with its z10 predecessor. The memory subsystem has four levels of cache hierarchy (L1 through L4) and constructs the L3 and L4 caches with embedded DRAM silicon technology, which achieves approximately three times the cache density over traditional static RAM technology. The microprocessor has 50% more decode and dispatch bandwidth when compared with the z10 microprocessor, as well as an out-of-order design that can issue and execute up to five instructions every single cycle. The microprocessor has an advanced branch prediction structure and employs enhanced store queue management algorithms. At the date of product announcement, the microprocessor was the fastest complex-instruction-set computing processor in the industry, running at a sustained 5.2 GHz, executing approximately 1,100 instructions, 220 of which are cracked into reduced-instruction-set computing-type operations, to achieve large performance gains in legacy online transaction processing and compute-intensive workloads.

IBM Journal of Research and Development | 1997

Shared-cache clusters in a system with a fully shared memory

Pak-Kin Mak; Michael A. Blake; Christine C. Jones; Gary E. Strait; Paul R. Turgeon

Interest in the concept of clustered caches has been growing in recent years. The advantages of sharing data and instruction streams among two or more microprocessors are understood; however, clustering also introduces new challenges in cache and memory coherency when system design requirements indicate that two or more of these clusters are needed. This paper describes the shared L2 cache cluster design found in the S/390® G4 server. This novel cache design consists of multiple shared-cache clusters, each supporting up to three microprocessors, forming a tightly coupled symmetric multiprocessor with fully coherent caches and main memory. Because this cache provides the link between an existing S/390 system bus and the new, high-performance S/390 G4 microprocessor chips, the paper addresses the challenges unique to operating shared caches on a common system bus.

IBM Journal of Research and Development | 1999

The S/390 G5/G6 binodal cache

Paul R. Turgeon; Pak-Kin Mak; Michael A. Blake; Michael Fee; C. B. Ford; Patrick J. Meaney; R. Seigler; W. W. Shen

The IBM S/390® fifth-generation CMOS-based server (more commonly known as the G5) produced a dramatic improvement in system-level performance in comparison with its predecessor, the G4. Much of this improvement can be attributed to an innovative approach to the cache and memory hierarchy: the binodal cache architecture. This design features shared caching and very high sustainable bandwidths at all points in the system. It contains several innovations in managing shared data, in maintaining high bandwidths at critical points in the system, and in sustaining high performance with unparalleled fault tolerance and recovery capabilities. This paper addresses several of these key features as they are implemented in the S/390 G5 server and its successor, the S/390 G6 server.

International Solid-State Circuits Conference | 2015

4.1 22nm Next-generation IBM System z microprocessor

James D. Warnock; Brian W. Curran; John Badar; Gregory J. Fredeman; Donald W. Plass; Yuen H. Chan; Sean M. Carey; Gerard M. Salem; Friedrich Schroeder; Frank Malgioglio; Guenter Mayer; Christopher J. Berry; Michael H. Wood; Yiu-Hing Chan; Mark D. Mayo; John Mack Isakson; Charudhattan Nagarajan; Tobias Werner; Leon J. Sigal; Ricardo H. Nigaglioni; Mark Cichanowski; Jeffrey A. Zitz; Matthew M. Ziegler; Tim Bronson; Gerald Strevig; Daniel M. Dreps; Ruchir Puri; Douglas J. Malone; Dieter Wendel; Pak-Kin Mak

The next-generation System z design introduces a new microprocessor chip (CP) and a system controller chip (SC) aimed at providing a substantial boost to maximum system capacity and performance compared to the previous zEC12 design in 32nm [1,2]. As shown in the die photo, the CP chip includes 8 high-frequency processor cores, 64MB of eDRAM L3 cache, interface IOs (“XBUS”) to connect to two other processor chips and the L4 cache chip, along with memory interfaces, 2 PCIe Gen3 interfaces, and an I/O bus controller (GX). The design is implemented on a 678 mm² die with 4.0 billion transistors and 17 levels of metal interconnect in IBM's high-performance 22nm high-κ CMOS SOI technology [3]. The SC chip is also a 678 mm² die, with 7.1 billion transistors, running at half the clock frequency of the CP chip, in the same 22nm technology, but with 15 levels of metal. It provides 480 MB of eDRAM L4 cache, an increase of more than 2× from zEC12 [1,2], and contains an 18 MB eDRAM L4 directory, along with multi-processor cache control/coherency logic to manage inter-processor and system-level communications. Both the CP and SC chips incorporate significant logical, physical, and electrical design innovations.

International Solid-State Circuits Conference | 1999

Storage hierarchy to support a 600 MHz G5 S/390 microprocessor

Paul R. Turgeon; Pak-Kin Mak; Donald W. Plass; Michael A. Blake; Michael Fee; M. Fischer; Carl B. Ford; G. Holmes; Kathryn M. Jackson; Christine C. Jones; Kevin W. Kark; Frank Malgioglio; Patrick J. Meaney; E. Pell; W. Scarpero; A.R. Seigler; William Wu Shen; Gary E. Strait; Gary Alan VanHuben; G. Wellwood; A. Zuckerman

Although a microprocessor's maximum frequency and internal design are important, the storage hierarchy is the primary reason for the large system performance improvement of the S/390 G5 compared to the G4. The improvement is achieved with an L2 cache, system controller and memory interface clocked at 1/4 the microprocessor frequency.

IBM Journal of Research and Development | 2015

The IBM z13 processor cache subsystem

Craig R. Walters; Pak-Kin Mak; Deanna Postles Dunn Berger; Michael A. Blake; Tim Bronson; Kenneth D. Klapproth; Arthur J. O'Neill; Robert J. Sonnelitter; Vesselina K. Papazova

The IBM z13™ system introduces many new innovative concepts in building a high-performance modular and scalable symmetrical multiprocessing (SMP) system, comprising up to 192 multithreaded processors that span eight system processing nodes. The z13 uses new socket packaging technology, changing from multichip modules (MCMs) to single-chip modules (SCMs). This enables the modularity and scalability of a large distributed SMP system and led to the development of new techniques in several important performance areas. For the cache hierarchy, the inclusivity management policy is optimized between the third-level and the fourth-level shared caches to improve overall cache-bit efficiency, effectively making the fourth-level cache larger to reduce the impact of increased chip socket-to-socket access latencies. The system bus management is enhanced such that multiple data transfers can be simultaneously overlapped on an interface to reduce wait times on critical data when these buses are highly utilized. With the amount of caches on both the Central Processor (CP) and System Controller (SC) chips, several major improvements were made for array macro resiliency to improve overall system availability. These and other major design updates in the latest mainframe processor cache subsystem are described in this paper.

Archive | 2000