Publication


Featured research published by Juergen Haess.


IBM Journal of Research and Development | 2004

The IBM eServer z990 floating-point unit

Guenter Gerwig; Holger Wetter; Eric M. Schwarz; Juergen Haess; Christopher A. Krygowski; Bruce M. Fleischer; Michael Kroener

The floating-point unit (FPU) of the IBM eServer™ z990 is the first one in an IBM mainframe with a fused multiply-add dataflow. It also represents the first time that an SRT divide algorithm (named after Sweeney, Robertson, and Tocher, who independently proposed the algorithm) was used in an IBM mainframe. The FPU supports dual architectures: the zSeries® hexadecimal floating-point architecture and the IEEE 754 binary floating-point architecture. Six floating-point formats, including short, long, and extended operands, are supported in hardware. The throughput of this FPU is one multiply-add operation per cycle. The instructions are executed in five pipeline steps, and there are multiple provisions to avoid stalls in case of data dependencies. The FPU is able to handle denormalized input operands and denormalized results without a stall (except for architectural program exceptions). It has a new extended-precision divide and square-root dataflow. This dataflow uses a radix-4 SRT algorithm (radix-2 for square root) and is able to handle divide and square-root operations in multiple floating-point and fixed-point formats. For fixed-point divisions, a new mechanism improves performance by using an algorithm in which the number of divide iterations depends on the effective number of quotient bits.
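
As a rough illustration of why a fused multiply-add dataflow matters numerically, the sketch below compares a separately rounded multiply and add with a single-rounding evaluation of a*b + c in IEEE 754 binary64. It is not taken from the paper: the helper name `fused_multiply_add` is hypothetical, it emulates the single rounding in software with exact rational arithmetic, and it assumes finite inputs.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: compute a*b + c exactly, then round once.

    Hypothetical helper for illustration only; assumes finite inputs.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no intermediate rounding
    return float(exact)  # single rounding to the nearest double


# Operands chosen so that the exact product 1 - 2**-60 is not representable.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

separate = a * b + c                 # product is rounded to 1.0 before the add
fused = fused_multiply_add(a, b, c)  # keeps the -2**-60 term

print(separate, fused)
```

With separate rounding the low-order part of the product is lost before the addition, while the single-rounding result retains it.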


Symposium on Computer Arithmetic | 2003

High performance floating-point unit with 116 bit wide divider

Guenter Gerwig; Holger Wetter; Eric M. Schwarz; Juergen Haess

The next-generation zSeries floating-point unit is unveiled; it is the first in an IBM mainframe with a fused multiply-add dataflow. It supports both the S/390 hexadecimal floating-point architecture and the IEEE 754 binary floating-point architecture, which was first implemented in S/390 on the 1998 S/390 G5 floating-point unit. The new floating-point unit supports a total of 6 formats, including single, double, and quadword formats, implemented in hardware. The floating-point pipeline is 5 cycles with a throughput of 1 multiply-add per cycle. Both hexadecimal and binary floating-point instructions are capable of this performance due to a novel way of handling both formats. Other key developments include new methods for handling denormalized numbers and a quad-precision divide engine dataflow. This divider uses a radix-4 SRT algorithm and is able to handle quad-precision divides in multiple floating-point and fixed-point formats. The number of iterations for fixed-point divisions depends on the effective number of quotient bits. The divider uses a reduced carry-save form for the partial remainder, with only 1 carry bit for every 4 sum bits, to save area and power.
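
The radix-4 digit recurrence behind an SRT divider can be sketched in a few lines. This is a simplified model and not the design described in the paper: it assumes the redundant digit set -2..2, selects each quotient digit by exact rational arithmetic (hardware would use a small lookup table on truncated estimates of the partial remainder and divisor), and keeps the partial remainder as a plain number rather than in carry-save form. The names `radix4_divide` and `fixed_point_iterations` are hypothetical, and the iteration-count rule is only one plausible reading of "effective number of quotient bits".

```python
from fractions import Fraction


def radix4_divide(x, d, iterations):
    """Approximate x / d with a radix-4 digit recurrence (digits in -2..2).

    Requires d > 0 and |x| <= (2/3) * d so the partial remainder stays bounded.
    Returns (quotient, final partial remainder, list of digits).
    """
    x, d = Fraction(x), Fraction(d)
    if d <= 0 or abs(x) > Fraction(2, 3) * d:
        raise ValueError("requires d > 0 and |x| <= (2/3) * d")
    remainder = x
    quotient = Fraction(0)
    digits = []
    for j in range(1, iterations + 1):
        shifted = 4 * remainder  # shift the partial remainder left by one radix-4 digit
        # Exact digit selection (assumption); keeps |remainder| <= (2/3) * d.
        digit = max(-2, min(2, round(shifted / d)))
        remainder = shifted - digit * d
        quotient += Fraction(digit, 4 ** j)
        digits.append(digit)
    return quotient, remainder, digits


def fixed_point_iterations(dividend: int, divisor: int) -> int:
    """Radix-4 iterations needed for the quotient bits that can be nonzero.

    Assumes positive integers; each iteration retires two quotient bits.
    """
    quotient_bits = max(0, dividend.bit_length() - divisor.bit_length() + 1)
    return (quotient_bits + 1) // 2


q, w, digits = radix4_divide(Fraction(1, 3), Fraction(3, 4), 10)
print(float(q), digits)
```

Each iteration produces two quotient bits, so after n iterations the error of the quotient is at most (2/3) · 4^-n under these assumptions. A small quotient needs correspondingly fewer iterations, which is the intuition behind making the fixed-point iteration count depend on the number of quotient bits.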


IBM Journal of Research and Development | 2004

The structure of chips and links comprising the IBM eServer z990 I/O subsystem

Edward W. Chencinski; Michael J. Becht; Tim E. Bubb; Carolynn G. Burwick; Juergen Haess; Markus M. Helms; Joseph M. Hoke; Thomas Schlipf; Jeffrey M. Turner; Hartmut Ulland; Manfred Walz; Carl H. Whitehead; Gerhard Zilles

The performance of large servers is to a high degree determined by their I/O subsystems. In the z990 server, nearly all of the components in the I/O path have been considerably improved in performance, capability, and cost. A 2-GB/s enhanced self-timed interface (eSTI) was introduced, which is capable of absorbing the ever-increasing data rates of modern high-speed adapters. The I/O bandwidth available from a single node (three memory bus adapter, or MBA, chips, each with four eSTI ports) now equals 24 GB/s. As a consequence, both the MBA chip and the STI multiplexer switch (STI switch) chip had to be completely redesigned. In addition to these two chips, this paper describes the eSTI design itself and the Sweep chip, which integrates the function of four bidirectional adapter chips, one switch chip, and a clock chip.


Archive | 1993

System for transferring data between asynchronous data buses with a data buffer interposed in between the buses for synchronization of devices timed by different clocks

Juergen Haess; Rolf Hilgendorf


Archive | 2005

Floating point unit with fused multiply add and method for calculating a result with a floating point unit

Son Dao Trong; Juergen Haess; Christian Jacobi; Klaus Michael Kroener; Silvia Melitta Mueller; Jochen Preiss


Archive | 2003

High-sticky calculation in pipelined fused multiply/add circuitry

Guenter Gerwig; Juergen Haess; Klaus Michael Kroener


Archive | 2005

System and method for a floating point unit with feedback prior to normalization and rounding

Bruce M. Fleischer; Juergen Haess; Michael Kroener; Martin S. Schmookler; Eric M. Schwarz; Son Dao-Trong


Archive | 1999

Examination of residues of data-conversions

Guenter Gerwig; Juergen Haess; Michael Kroener; Erwin Pfeffer


Archive | 2014

Residue-based error detection for a processor execution unit that supports vector operations

Maarten Jakob Boersma; Juergen Haess


Archive | 2012

Dynamic hardware trace supporting multiphase operations

Steven R. Carlough; Juergen Haess; Michael Kroener; Silvia Melitta Mueller
