Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Pak-Kin Mak is active.

Publication


Featured researches published by Pak-Kin Mak.


Ibm Journal of Research and Development | 2009

IBM system z10 processor cache subsystem microarchitecture

Pak-Kin Mak; Craig R. Walters; Gary E. Strait

With the introduction of the high-frequency IBM System z10™ processor design, a new, robust cache hierarchy was needed to enable up to 80 of these processors aggregated into a tightly coupled symmetric multiprocessor (SMP) system to reach their performance potential. Typically, each time the processor frequency increases by a significant factor, as did the z10™ processor over the predecessor IBM System z9® processor, the access time of data, as measured by the number of processor cycles beyond the level 1 cache on an identical processor cache subsystem, would increase proportionally as well because the flight time on the chip interconnects across multiple hardware packaging levels has stayed relatively constant in nanoseconds. To address the latency scaling problem and the increased demand of the larger 80-way SMP size, the z10 processor cache subsystem introduces new innovative concepts and solutions.


Ibm Journal of Research and Development | 2012

IBM zEnterprise 196 microprocessor and cache subsystem

Fadi Y. Busaba; Michael A. Blake; Brian W. Curran; Michael Fee; Christian Jacobi; Pak-Kin Mak; Brian R. Prasky; Craig R. Walters

The IBM zEnterprise® 196 (z196) system, announced in the second quarter of 2010, is the latest generation of the IBM System z® mainframe. The system is designed with a new microprocessor and memory subsystems, which distinguishes it from its z10® predecessor. The system has up to 40% improvement in performance for traditional z/OS® workloads and carries up to 60% more capacity when compared with its z10 predecessor. The memory subsystem has four levels of cache hierarchy (L1 through L4) and constructs the L3 and L4 caches with embedded DRAM silicon technology, which achieves approximately three times the cache density over traditional static RAM technology. The microprocessor has 50% more decode and dispatch bandwidth when compared with the z10 microprocessor, as well as an out-of-order design that can issue and execute up to five instructions every single cycle. The microprocessor has an advanced branch prediction structure and employs enhanced store queue management algorithms. At the date of product announcement, the microprocessor was the fastest complex-instruction-set computing processor in the industry, running at a sustained 5.2 GHz, executing approximately 1,100 instructions, 220 of which are cracked into reduced-instruction-set computing-type operations, to achieve large performance gains in legacy online transaction processing and compute-intensive workloads.


international solid-state circuits conference | 2011

A 5.2GHz microprocessor chip for the IBM zEnterprise™ system

James D. Warnock; Yuen Chan; William V. Huott; Sean M. Carey; Michael Fee; Huajun Wen; M. J. Saccamango; Frank Malgioglio; Patrick J. Meaney; Donald W. Plass; Yuen H. Chan; Mark D. Mayo; Guenter Mayer; Leon J. Sigal; David L. Rude; Robert M. Averill; Michael H. Wood; Thomas Strach; Howard H. Smith; Brian W. Curran; Eric M. Schwarz; Lee Evan Eisen; Doug Malone; Steve Weitzel; Pak-Kin Mak; Thomas J. McPherson; Charles F. Webb

The microprocessor chip for the IBM zEnterprise 196 (z 196) system is a high-frequency, high-performance design that adds support for out-of-order instruction execution and increases operating frequency by almost 20% compared to the previous 65nm design, while still fitting within the same power envelope. Despite the many difficult engineering hurdles to be overcome, the design team was able to achieve a product frequency of 5.2GHz, providing a significant performance boost for the new system.


international symposium on microarchitecture | 2011

The zEnterprise 196 System and Microprocessor

Brian W. Curran; Lee Evan Eisen; Eric M. Schwarz; Pak-Kin Mak; James D. Warnock; Patrick J. Meaney; Michael Fee

The zEnterprise 196 is the latest IBM System zSeries mainframe computer, which builds on IBMs 46-year heritage of compatible enterprise-class machines. This design advances the prior z10 processor pipeline with out-of-order execution to achieve considerable performance gains in legacy online transaction processing and computationally intensive workloads. This article describes the system structure and details of this new high-frequency microprocessor.


Ibm Journal of Research and Development | 1997

Shared-cache clusters in a system with a fully shared memory

Pak-Kin Mak; Michael A. Blake; Christine C. Jones; Gary E. Strait; Paul R. Turgeon

Interest in the concept of clustered caches has been growing in recent years. The advantages of sharing data and instruction streams among two or more microprocessors are understood; however, clustering also introduces new challenges in cache and memory coherency when system design requirements indicate that two or more of these clusters are needed. This paper describes the shared L2 cache cluster design found in the S/390® G4 server. This novel cache design consists of multiple shared-cache clusters, each supporting up to three microprocessors, forming a tightly coupled symmetric multiprocessor with fully coherent caches and main memory. Because this cache provides the link between an existing S/390 system bus and the new, high-performance S/390 G4 microprocessor chips, the paper addresses the challenges unique to operating shared caches on a common system bus.


IEEE Journal of Solid-state Circuits | 2014

Circuit and Physical Design of the zEnterprise™ EC12 Microprocessor Chips and Multi-Chip Module

James D. Warnock; Yuen H. Chan; Hubert Harrer; Sean M. Carey; Gerard M. Salem; Doug Malone; Ruchir Puri; Jeffrey A. Zitz; Adam R. Jatkowski; Gerald Strevig; Ayan Datta; Anne E. Gattiker; Aditya Bansal; Guenter Mayer; Yiu-Hing Chan; Mark D. Mayo; David L. Rude; Leon J. Sigal; Thomas Strach; Howard H. Smith; Huajun Wen; Pak-Kin Mak; Chung-Lung Kevin Shum; Donald W. Plass; Charles F. Webb

This work describes the circuit and physical design implementation of the processor chip (CP), level-4 cache chip (SC), and the multi-chip module at the heart of the EC12 system. The chips were implemented in IBMs high-performance 32nm high-k/metal-gate SOI technology. The CP chip contains 6 super-scalar, out-of-order processor cores, running at 5.5 GHz, while the SC chip contains 192 MB of eDRAM cache. Six CP chips and two SC chips are mounted on a high-performance glass-ceramic substrate, which provides high-bandwidth, low-latency interconnections. Various aspects of the design are explored in detail, with most of the focus on the CP chip, including the circuit design implementation, clocking, thermal modeling, reliability, frequency tuning, and comparison to the previous design in 45nm technology.


Ibm Journal of Research and Development | 1999

The S/390 G5/G6 binodal cache

Paul R. Turgeon; Pak-Kin Mak; Michael A. Blake; Michael Fee; C. B. Ford; Patrick J. Meaney; R. Seigler; W. W. Shen

The IBM S/390® fifth-generation CMOS-based server (more commonly known as the G5) produced a dramatic improvement in system-level performance in comparison with its predecessor, the G4. Much of this improvement can be attributed to an innovative approach to the cache and memory hierarchy: the binodal cache architecture. This design features shared caching and very high sustainable bandwidths at all points in the system. It contains several innovations in managing shared data, in maintaining high bandwidths at critical points in the system, and in sustaining high performance with unparalleled fault tolerance and recovery capabilities. This paper addresses several of these key features as they are implemented in the S/390 G5 server and its successor, the S/390 G6 server.


international solid-state circuits conference | 2015

4.1 22nm Next-generation IBM System z microprocessor

James D. Warnock; Brian W. Curran; John Badar; Gregory J. Fredeman; Donald W. Plass; Yuen H. Chan; Sean M. Carey; Gerard M. Salem; Friedrich Schroeder; Frank Malgioglio; Guenter Mayer; Christopher J. Berry; Michael H. Wood; Yiu-Hing Chan; Mark D. Mayo; John Mack Isakson; Charudhattan Nagarajan; Tobias Werner; Leon J. Sigal; Ricardo H. Nigaglioni; Mark Cichanowski; Jeffrey A. Zitz; Matthew M. Ziegler; Tim Bronson; Gerald Strevig; Daniel M. Dreps; Ruchir Puri; Douglas J. Malone; Dieter Wendel; Pak-Kin Mak

The next-generation System z design introduces a new microprocessor chip (CP) and a system controller chip (SC) aimed at providing a substantial boost to maximum system capacity and performance compared to the previous zEC12 design in 32nm [1,2]. As shown in the die photo, the CP chip includes 8 high-frequency processor cores, 64MB of eDRAM L3 cache, interface IOs (“XBUS”) to connect to two other processor chips and the L4 cache chip, along with memory interfaces, 2 PCIe Gen3 interfaces, and an I/O bus controller (GX). The design is implemented on a 678 mm2 die with 4.0 billion transistors and 17 levels of metal interconnect in IBMs high-performance 22nm high-x CMOS SOI technology [3]. The SC chip is also a 678 mm2 die, with 7.1 billion transistors, running at half the clock frequency of the CP chip, in the same 22nm technology, but with 15 levels of metal. It provides 480 MB of eDRAM L4 cache, an increase of more than 2× from zEC12 [1,2], and contains an 18 MB eDRAM L4 directory, along with multi-processor cache control/coherency logic to manage inter-processor and system-level communications. Both the CP and SC chips incorporate significant logical, physical, and electrical design innovations.


international solid-state circuits conference | 2013

5.5GHz system z microprocessor and multi-chip module

James D. Warnock; Yuen H. Chan; Hubert Harrer; David L. Rude; Ruchir Puri; Sean M. Carey; Gerard M. Salem; Guenter Mayer; Yiu-Hing Chan; Mark D. Mayo; Adam R. Jatkowski; Gerald Strevig; Leon J. Sigal; Ayan Datta; Anne E. Gattiker; Aditya Bansal; Doug Malone; Thomas Strach; Huajun Wen; Pak-Kin Mak; Chung-Lung Kevin Shum; Donald W. Plass; Charles F. Webb

The new System z microprocessor chip (“CP chip”) features a high-frequency processor core running at 5.5GHz in a 32nm high-κ CMOS technology [1], using 15 levels of metal. This chip is a successor to the 45nm product [2], with significant improvements made to the core and nest (i.e. the logic external to the cores) in order to increase the performance and throughput of the design. Also, special considerations were necessary to ensure robust circuit operation in the high-κ technology used for implementation. As seen in the die photo, the chip contains 6 processor cores (compared to 4 cores in the 45nm version), and a large shared 48MB DRAM L3 cache. Each core includes a pair of data and instruction L2 SRAM caches of 1MB each. In addition, the chip contains a memory control unit (MCU), an I/O bus controller (GX), and two sets of interfaces to the L4 cache chips (also in 32nm technology). The CP chip occupies 598 mm2, contains about 2.75B transistors, and has 1071 signal IOs.


international solid-state circuits conference | 1999

Storage hierarchy to support a 600 MHz G5 S/390 microprocessor

Paul R. Turgeon; Pak-Kin Mak; Donald W. Plass; Michael A. Blake; Michael Fee; M. Fischer; Carl B. Ford; G. Holmes; Kathryn M. Jackson; Christine C. Jones; Kevin W. Kark; Frank Malgioglio; Patrick J. Meaney; E. Pell; W. Scarpero; A.R. Seigler; William Wu Shen; Gary E. Strait; Gary Alan VanHuben; G. Wellwood; A. Zuckerman

Although a microprocessors maximum frequency and internal design are important, the storage hierarchy is the primary reason for the large system performance improvement of the S/390 G5 compared to the G4. The improvement is achieved with an L2 cache, system controller and memory interface clocked at 1/4 the microprocessor frequency.

Researchain Logo
Decentralizing Knowledge