Publication
Featured research published by William J. Starke.
IBM Journal of Research and Development | 2007
Hung Q. Le; William J. Starke; J. S. Fields; F. P. O'Connell; D. Q. Nguyen; B. J. Ronchetti; Wolfram Sauer; Eric M. Schwarz; Michael Thomas Vaden
This paper describes the implementation of the IBM POWER6™ microprocessor, a two-way simultaneous multithreaded (SMT) dual-core chip whose key features include binary compatibility with IBM POWER5™ microprocessor-based systems; increased functional capabilities, such as decimal floating-point and vector multimedia extensions; significant reliability, availability, and serviceability enhancements; and robust scalability with up to 64 physical processors. Based on a new industry-leading high-frequency core architecture with enhanced SMT and driven by a high-throughput symmetric multiprocessing (SMP) cache and memory subsystem, the POWER6 chip achieves a significant performance boost compared with its predecessor, the POWER5 chip. Key extensions to the coherence protocol enable POWER6 microprocessor-based systems to achieve better SMP scalability while enabling reductions in system packaging complexity and cost.
International Symposium on Microarchitecture | 2010
Ronald Nick Kalla; Balaram Sinharoy; William J. Starke; Michael Stephen Floyd
Power Systems™ continue strong. 7th-generation Power chip: balanced multi-core design, eDRAM technology, and SMT4, with greater than 4x performance in the same power envelope as the previous generation. Scales to a 32-socket, 1024-thread balanced system. Building block for the peta-scale PERCS project. POWER7 systems are running in the lab, with AIX®, IBM i, and Linux® all operational.
IBM Journal of Research and Development | 2011
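The 1024-thread figure agrees with the other POWER7 abstracts in this list, which give eight cores per chip. The worked product below assumes one chip per socket and four SMT threads per core:

```latex
\underbrace{32}_{\text{sockets}} \times \underbrace{8}_{\text{cores per chip}} \times \underbrace{4}_{\text{threads per core (SMT4)}} = 1024\ \text{hardware threads}
```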
Balaram Sinharoy; Ronald Nick Kalla; William J. Starke; Hung Q. Le; R. Cargnoni; J. A. Van Norstrand; B. J. Ronchetti; Jeffrey A. Stuecheli; Jens Leenstra; G. L. Guthrie; D. Q. Nguyen; Bart Blaner; C. F. Marino; E. Retter; Peter Williams
The IBM POWER® processor is the dominant reduced instruction set computing microprocessor in the world today, with a rich history of implementation and innovation over the last 20 years. In this paper, we describe the key features of the POWER7® processor chip. The chip contains eight cores, each capable of four-way simultaneous multithreaded operation. Fabricated in IBM's 45-nm silicon-on-insulator (SOI) technology with 11 levels of metal, the chip contains more than one billion transistors. The processor core and caches are significantly enhanced to boost the performance of both single-threaded, response-time-oriented applications and multithreaded, throughput-oriented applications. The memory subsystem contains three levels of on-chip cache, with SOI embedded dynamic random access memory (DRAM) devices used as the last level of cache. A new memory interface using buffered double-data-rate-three (DDR3) DRAM and improvements in reliability, availability, and serviceability are also discussed.
International Solid-State Circuits Conference | 2010
Dieter Wendel; Ronald Nick Kalla; Robert Cargnoni; Joachim Clabes; Joshua Friedrich; Roland Frech; James Allan Kahle; Balaram Sinharoy; William J. Starke; Scott A. Taylor; Steve Weitzel; Sam Gat-Shang Chu; Saiful Islam; Victor Zyuban
The next processor of the POWER™ family, called POWER7™, is introduced. Eight quad-threaded cores are integrated with two memory controllers and high-speed system links on a 567 mm² die, employing 1.2B transistors in 45 nm CMOS SOI technology [4]. High on-chip performance, and therefore bandwidth, is achieved using 11 layers of low-κ copper wiring and devices with enhanced dual-stress liners. The technology features deep-trench (DT) capacitors, which are used to build the 32 MB embedded DRAM L3 based on a 0.067 µm² DRAM cell. DT capacitors are also used to reduce on-chip voltage-island supply noise. Focusing on speed, the dual-supply ripple-domino SRAM concept follows the schemes described elsewhere.
IBM Journal of Research and Development | 2015
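The cell size and L3 capacity in the abstract allow a rough estimate of the raw eDRAM cell area. This is a back-of-the-envelope sketch under two assumptions. First, it takes MB as binary (2^20 bytes). Second, it counts only the storage cells, leaving out ECC, redundancy, sense amplifiers, and wiring. The actual L3 macro area is therefore larger.

```latex
\begin{aligned}
32\,\text{MB} &= 32 \times 2^{20} \times 8 = 268{,}435{,}456\ \text{bits} \\
A_{\text{cells}} &\approx 268{,}435{,}456 \times 0.067\,\mu\text{m}^2 \approx 1.80 \times 10^{7}\,\mu\text{m}^2 \approx 18\,\text{mm}^2 \\
\frac{A_{\text{cells}}}{A_{\text{die}}} &\approx \frac{18\,\text{mm}^2}{567\,\text{mm}^2} \approx 3.2\%
\end{aligned}
```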
William J. Starke; Jeffrey A. Stuecheli; David Daly; John Steven Dodson; Florian A. Auernhammer; Patricia M. Sagmeister; Guy Lynn Guthrie; Charles F. Marino; Michael S. Siegel; Bart Blaner
In this paper, we describe the IBM POWER8™ cache, interconnect, memory, and input/output subsystems, collectively referred to as the “nest.” This paper focuses on the enhancements made to the nest to achieve balanced and scalable designs, ranging from small 12-core single-socket systems up to large 16-processor-socket, 192-core enterprise rack servers. A key aspect of the design has been increasing the end-to-end data and coherence bandwidth of the system, which now features more than twice the bandwidth of the POWER7® processor. The paper describes the new memory-buffer chip, called Centaur, which provides up to 128 MB of eDRAM (embedded dynamic random-access memory) buffer cache per processor, along with an improved DRAM (dynamic random-access memory) scheduler with support for prefetch and write optimizations, providing industry-leading memory bandwidth combined with low memory latency. It also describes new coherence-transport enhancements and the transition to directly integrated PCIe® (PCI Express®) support, as well as additions to the cache subsystem to support higher levels of virtualization and scalability, including snoop filtering and cache sharing.
IBM Journal of Research and Development | 2015
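The abstract names snoop filtering but does not describe how POWER8 implements it. The sketch below is only a generic illustration of the idea, not IBM's design. A presence table records which sockets may hold each cache line, so a miss snoops only those sockets instead of broadcasting to all 16. The class and method names are hypothetical. The sketch also assumes every eviction is reported to the filter; real designs often track presence imprecisely instead.

```python
from collections import defaultdict


class SnoopFilter:
    """Hypothetical presence table: for each cache line, the set of sockets
    that may hold a copy. A miss then snoops only those sockets instead of
    broadcasting to every socket in the system."""

    def __init__(self, num_sockets):
        self.num_sockets = num_sockets
        self.sharers = defaultdict(set)  # line address -> sockets that may cache it

    def on_fill(self, line, socket):
        """Record that `socket` brought `line` into its caches."""
        self.sharers[line].add(socket)

    def on_evict(self, line, socket):
        """Record that `socket` dropped `line` (assumes evictions are reported)."""
        holders = self.sharers.get(line)
        if holders is not None:
            holders.discard(socket)
            if not holders:
                del self.sharers[line]  # keep the table small

    def snoop_targets(self, line, requester):
        """Sockets that must be snooped when `requester` misses on `line`."""
        return self.sharers.get(line, set()) - {requester}

    def snoops_avoided(self, line, requester):
        """Snoops saved versus broadcasting to all other sockets."""
        return self.num_sockets - 1 - len(self.snoop_targets(line, requester))


sf = SnoopFilter(num_sockets=16)
sf.on_fill(0x1000, socket=3)
sf.on_fill(0x1000, socket=7)
print(sorted(sf.snoop_targets(0x1000, requester=3)))  # [7]
print(sf.snoops_avoided(0x1000, requester=3))  # 14
print(sf.snoop_targets(0x2000, requester=0))  # set(): no copies anywhere
```

The trade-off is table capacity against broadcast traffic. A hardware filter has finite entries and must evict or over-approximate, whereas this dictionary grows without bound.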
Hung Q. Le; Guy Lynn Guthrie; Derek Edward Williams; Maged M. Michael; Brad Frey; William J. Starke; Cathy May; Rei Odaira; Takuya Nakaike
With multi-core processors, parallel programming has taken on greater importance. Traditional parallel programming techniques based on critical sections controlled by locking have several well-known drawbacks. To allow for more efficient parallel programming with higher performance, the IBM POWER8™ processor implements a hardware transactional memory facility. Transactional memory allows groups of load and store operations to execute and commit as a single atomic unit without the use of traditional locks, thereby improving performance and simplifying the parallel programming model. The POWER8 transactional memory facility provides a robust capability to execute transactions that can survive interrupts. It also allows non-speculative accesses within transactions, which facilitates debugging and thread-level speculation. Unique challenges caused by implementing transactional memory on top of the Power ISA (Instruction Set Architecture) weakly consistent memory model are addressed. We detail the Power ISA transactional memory architecture, the POWER8 implementation of this architecture, and two practical uses of this architecture—Transactional Lock Elision (TLE) and Thread-Level Speculation (TLS)—and provide performance results for these uses.
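Transactional Lock Elision can be illustrated in Python. Software-emulated transactions stand in here for the hardware facility, and the mechanism works in four steps:

1. A transaction buffers its stores.
2. It records the version of every word it loads.
3. It commits only if none of those words changed in the meantime.
4. It also loads the lock word first. A thread that actually acquires the lock therefore causes every concurrent speculative critical section to fail validation.

After a fixed number of aborts, the code falls back to real locking.

This is a hypothetical sketch. `Memory`, `Tx`, `ElidedLock`, and the `retries` parameter are invented for illustration. It does not use the Power ISA transactional instructions, and POWER8 detects conflicts in hardware rather than with version counters.

```python
import threading


class Abort(Exception):
    """Raised when an emulated transaction must roll back."""


class Memory:
    """Shared words with version counters (software emulation only; hardware
    transactional memory tracks conflicts in the caches instead)."""

    def __init__(self, **initial):
        self.val = dict(initial)
        self.ver = {addr: 0 for addr in initial}
        self.guard = threading.Lock()  # makes each commit or store appear atomic

    def load(self, addr):
        """Non-transactional load."""
        with self.guard:
            return self.val.get(addr, 0)

    def store(self, addr, value):
        """Non-transactional store; bumping the version exposes the conflict."""
        with self.guard:
            self.val[addr] = value
            self.ver[addr] = self.ver.get(addr, 0) + 1


class Tx:
    """One speculative attempt: buffered stores plus a versioned read set."""

    def __init__(self, mem):
        self.mem, self.reads, self.writes = mem, {}, {}

    def load(self, addr):
        if addr in self.writes:
            return self.writes[addr]  # read-own-write from the buffer
        with self.mem.guard:
            value, version = self.mem.val.get(addr, 0), self.mem.ver.get(addr, 0)
        if self.reads.setdefault(addr, version) != version:
            raise Abort  # the word changed since this transaction first read it
        return value

    def store(self, addr, value):
        self.writes[addr] = value  # invisible to other threads until commit

    def commit(self):
        with self.mem.guard:
            for addr, version in self.reads.items():
                if self.mem.ver.get(addr, 0) != version:
                    raise Abort  # conflict: another writer touched the read set
            for addr, value in self.writes.items():
                self.mem.val[addr] = value
                self.mem.ver[addr] = self.mem.ver.get(addr, 0) + 1


class ElidedLock:
    """Lock elision: run the critical section speculatively, subscribing to
    the lock word; after `retries` aborts, acquire the lock for real."""

    def __init__(self, mem, retries=3):
        self.mem, self.retries = mem, retries
        self.real = threading.Lock()
        mem.store("lock", 0)

    def run(self, body):
        for _ in range(self.retries):
            tx = Tx(self.mem)
            try:
                if tx.load("lock") != 0:
                    raise Abort  # lock is held: speculating now would be unsafe
                result = body(tx.load, tx.store)
                tx.commit()
                return result
            except Abort:
                continue  # drop buffered stores and retry
        with self.real:  # fallback path: conventional mutual exclusion
            self.mem.store("lock", 1)  # in-flight transactions now fail validation
            try:
                return body(self.mem.load, self.mem.store)
            finally:
                self.mem.store("lock", 0)


mem = Memory(counter=0)
lock = ElidedLock(mem)


def increment(load, store):
    store("counter", load("counter") + 1)


def worker():
    for _ in range(1000):
        lock.run(increment)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(mem.load("counter"))  # 4000
```

Reading the lock word inside every transaction is what keeps the elided and non-elided paths mutually exclusive. Any thread that takes the lock for real changes that word's version, so no concurrent speculative critical section can commit.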
International Conference on IC Design and Technology | 2014
Joshua Friedrich; Hung Q. Le; William J. Starke; Jeff Stuecheli; Balaram Sinharoy; Eric Fluhr; Daniel M. Dreps; Victor Zyuban; Gregory Scott Still; Christopher J. Gonzalez; David Hogenmiller; Frank Malgioglio; Ryan Nett; Ruchir Puri; Phillip J. Restle; David Shan; Zeynep Toprak Deniz; Dieter Wendel; Matthew M. Ziegler; Dave Victor
POWER8™ delivers a data-optimized design suited for analytics, cognitive workloads, and today's exploding data sizes. For many workloads, this design point yields a 2.5x performance gain over its predecessor, POWER7+™. In addition, POWER8 delivers the efficiency demanded by cloud computing models and represents a first step toward creating an open ecosystem for server innovation.
IEEE Hot Chips Symposium | 2009
William J. Starke
Presents a collection of slides covering the following topics: Extreme multicore throughput; SMP scaling; 8-core high performance server chip; virtual machine; cloud computing; server evolution; cache hierarchy technology; memory subsystem; and off-chip signaling technology.
International Conference on IC Design and Technology | 2010
Dieter Wendel; Ronald Nick Kalla; Joshua Friedrich; James Allan Kahle; Jens Leenstra; Cedric Lichtenau; Balaram Sinharoy; William J. Starke; Victor Zyuban
Introducing POWER7™, the latest member of the IBM POWER™ processor family: a 567 mm² chip implemented in 45 nm SOI technology, holding eight quad-threaded cores, a 32 MB shared eDRAM L3, two memory controllers, and high-bandwidth SMP interfaces. The new out-of-order, shallow-pipeline core with 12 execution units, multiport L1 caches, and a private 256 kB L2 offers the efficiency to support 4x the number of cores within the same power envelope as its predecessor. At frequencies above 4 GHz, the L1 data-cache loop is kept to 2 cycles. Data from the L2 can be returned to the core at a rate of 32 B per cycle.
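The per-cycle figures in the abstract translate into absolute per-core numbers. The worked arithmetic below assumes exactly 4 GHz and decimal gigabytes (10^9 B). Because the abstract says "above 4 GHz," the bandwidth is a lower bound and the loop time an upper bound:

```latex
\begin{aligned}
B_{\text{L2}\to\text{core}} &= 32\,\tfrac{\text{B}}{\text{cycle}} \times 4 \times 10^{9}\,\tfrac{\text{cycles}}{\text{s}} = 1.28 \times 10^{11}\,\tfrac{\text{B}}{\text{s}} = 128\,\text{GB/s} \\
t_{\text{L1 loop}} &= \frac{2\ \text{cycles}}{4 \times 10^{9}\ \text{cycles/s}} = 0.5\,\text{ns}
\end{aligned}
```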
IEEE Micro | 2017
Satish Kumar Sadasivam; Brian W. Thompto; Ronald Nick Kalla; William J. Starke
The IBM Power9 processor has an enhanced core and chip architecture that provides superior thread performance and higher throughput. The core and chip architectures are optimized for emerging workloads to support the needs of next-generation computing. Multiple variants of silicon target the scale-out and scale-up markets. With a new core microarchitecture design, along with an innovative I/O fabric to support several accelerated computing requirements, the Power9 processor meets the diverse computing needs of the cognitive era and provides a platform for accelerated computing.