Juergen Haess | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Juergen Haess is active.

Explore More

Publication

Featured researches published by Juergen Haess.

Ibm Journal of Research and Development | 2004

The IBM eServer z990 floating-point unit

Guenter Gerwig; Holger Wetter; Eric M. Schwarz; Juergen Haess; Christopher A. Krygowski; Bruce M. Fleischer; Michael Kroener

The floating-point unit (FPU) of the IBM z990 eServerTM is the first one in an IBM mainframe with a fused multiply-add dataflow. It also represents the first time that an SRT divide algorithm (named after Sweeney, Robertson, and Tocher, who independently proposed the algorithm) was used in an IBM mainframe. The FPU supports dual architectures: the zSeries® hexadecimal floating-point architecture and the IEEE 754 binary floating-point architecture. Six floating-point formats-- including short, long, and extended operands-are supported in hardware. The throughput of this FPU is one multiply-add operation per cycle. The instructions are executed in five pipeline steps, and there are multiple provisions to avoid stalls in case of data dependencies. It is able to handle denormalized input operands and denormalized results without a stall (except for architectural program exceptions). It has a new extended-precision divide and square-root dataflow. This dataflow uses a radix-4 SRT algorithm (radix-2 for square root) and is able to handle divides and square-root operations in multiple floating-point and fixed-point formats. For fixed-point divisions, a new mechanism improves the performance by using an algorithm with which the number of divide iterations depends on the effective number of quotient bits.

symposium on computer arithmetic | 2003

High performance floating-point unit with 116 bit wide divider

Guenter Gerwig; Holger Wetter; Eric M. Schwarz; Juergen Haess

The next generation zSeries floating-point unit is unveiled which is the first IBM mainframe with a fused multiply-add dataflow. It supports both S/390 hexadecimal floating-point architecture and the IEEE 754 binary floating-point architecture which was first implemented in S/390 on the 1998 S/390 G5 floating-point unit. The new floating-point unit supports a total of 6 formats including single, double, and quadword formats implemented in hardware. The floating-point pipeline is 5 cycles with a throughput of 1 multiply-add per cycle. Both hexadecimal and binary floating-point instructions are capable of this performance due to a novel way of handling both formats. Other key developments include new methods for handling denormalized numbers and quad precision divide engine dataflow. This divider uses a radix-4 SRT algorithm and is able to handle quad precision divides in multiple floating-point and fixed-point formats. The number of iterations for fixed-point divisions depend on the effective number of quotient bits. It uses a reduced carry-save form for the partial remainder, with only 1 carry bit for every 4 sum bits, to save area and power.

Ibm Journal of Research and Development | 2004

The structure of chips and links comprising the IBM eServer z990 I/O subsystem

Edward W. Chencinski; Michael J. Becht; Tim E. Bubb; Carolynn G. Burwick; Juergen Haess; Markus M. Helms; Joseph M. Hoke; Thomas Schlipf; Jeffrey M. Turner; Hartmut Ulland; Manfred Walz; Carl H. Whitehead; Gerhard Zilles

The performance of large servers is to a high degree determined by their I/O subsystems. In the z990 server, nearly all of the components in the I/O path have been considerably improved in performance, capability, and cost. A 2-GB/s enhanced self-timed interface (eSTI) was introduced which is capable of absorbing the ever-increasing data rates of modern high-speed adapters. The I/O bandwidth available from a single node (three memory bus adapter, or MBA, chips, each with four eSTI ports) now equals 48 GB/s. As a consequence, both the MBA chip and the STI multiplexer switch (STI switch) chip had to be completely redesigned. In addition to these two chips, this paper describes the eSTI design itself and the Sweep chip, which integrates the function of four bidirectional adapter chips, one switch chip, and a clock chip.

Archive | 1993