Guenter Gerwig | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Guenter Gerwig is active.

Explore More

Publication

Featured researches published by Guenter Gerwig.

Ibm Journal of Research and Development | 2009

Design and microarchitecture of the IBM system z10 microprocessor

Chung-Lung Kevin Shum; Fadi Y. Busaba; S. Dao-Trong; Guenter Gerwig; Christian Jacobi; Thomas Koehler; E. Pfeffer; Brian R. Prasky; J. G. Rell; Aaron Tsai

The IBM System z10™ microprocessor is currently the fastest running 64-bit CISC (complex instruction set computer) microprocessor. This microprocessor operates at 4.4 GHz and provides up to two times performance improvement compared with its predecessor, the System z9® microprocessor. In addition to its ultrahigh-frequency pipeline, the z10™ microprocessor offers such performance enhancements as a sophisticated branch-prediction structure, a large second-level private cache, a data-prefetch engine, and a hardwired decimal floating-point arithmetic unit. The z10 microprocessor also implements new architectural features that allow better software optimization across compiled applications. These features include new instructions that help shorten the code path lengths and new facilities for software-directed cache management and the use of 1-MB virtual pages. The innovative microarchitecture of the z10 microprocessor and notable differences from its predecessors and the IBM POWER6™ microprocessor are discussed.

Ibm Journal of Research and Development | 2004

The IBM eServer z990 floating-point unit

Guenter Gerwig; Holger Wetter; Eric M. Schwarz; Juergen Haess; Christopher A. Krygowski; Bruce M. Fleischer; Michael Kroener

The floating-point unit (FPU) of the IBM z990 eServerTM is the first one in an IBM mainframe with a fused multiply-add dataflow. It also represents the first time that an SRT divide algorithm (named after Sweeney, Robertson, and Tocher, who independently proposed the algorithm) was used in an IBM mainframe. The FPU supports dual architectures: the zSeries® hexadecimal floating-point architecture and the IEEE 754 binary floating-point architecture. Six floating-point formats-- including short, long, and extended operands-are supported in hardware. The throughput of this FPU is one multiply-add operation per cycle. The instructions are executed in five pipeline steps, and there are multiple provisions to avoid stalls in case of data dependencies. It is able to handle denormalized input operands and denormalized results without a stall (except for architectural program exceptions). It has a new extended-precision divide and square-root dataflow. This dataflow uses a radix-4 SRT algorithm (radix-2 for square root) and is able to handle divides and square-root operations in multiple floating-point and fixed-point formats. For fixed-point divisions, a new mechanism improves the performance by using an algorithm with which the number of divide iterations depends on the effective number of quotient bits.

symposium on computer arithmetic | 2003

High performance floating-point unit with 116 bit wide divider

Guenter Gerwig; Holger Wetter; Eric M. Schwarz; Juergen Haess

The next generation zSeries floating-point unit is unveiled which is the first IBM mainframe with a fused multiply-add dataflow. It supports both S/390 hexadecimal floating-point architecture and the IEEE 754 binary floating-point architecture which was first implemented in S/390 on the 1998 S/390 G5 floating-point unit. The new floating-point unit supports a total of 6 formats including single, double, and quadword formats implemented in hardware. The floating-point pipeline is 5 cycles with a throughput of 1 multiply-add per cycle. Both hexadecimal and binary floating-point instructions are capable of this performance due to a novel way of handling both formats. Other key developments include new methods for handling denormalized numbers and quad precision divide engine dataflow. This divider uses a radix-4 SRT algorithm and is able to handle quad precision divides in multiple floating-point and fixed-point formats. The number of iterations for fixed-point divisions depend on the effective number of quotient bits. It uses a reduced carry-save form for the partial remainder, with only 1 carry bit for every 4 sum bits, to save area and power.

IEEE Journal of Solid-state Circuits | 2012

Circuit and Physical Design Implementation of the Microprocessor Chip for the zEnterprise System

James D. Warnock; Yiu-Hing Chan; Sean M. Carey; Huajun Wen; Patrick J. Meaney; Guenter Gerwig; Howard H. Smith; Yuen H. Chan; John S. Davis; Paul A. Bunce; Antonio R. Pelella; Daniel Rodko; Pradip Patel; Thomas Strach; Doug Malone; Frank Malgioglio; José Luis Neves; David L. Rude; William V. Huott

This paper describes the circuit and physical design features of the z196 processor chip, implemented in a 45 nm SOI technology. The chip contains 4 super-scalar, out-of-order processor cores, running at 5.2 GHz, on a die with an area of 512 mm2 containing an estimated 1.4 billion transistors. The core and chip design methodology and specific design features are presented, focusing on techniques used to enable high-frequency operation. In addition, chip power, IR drop, and supply noise are discussed, being key design focus areas. The chips ground-breaking RAS features are also described, engineered for maximum reliability and system stability.

symposium on computer arithmetic | 1999