Michael Kroener | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Michael Kroener is active.

Explore More

Publication

Featured researches published by Michael Kroener.

symposium on computer arithmetic | 2007

P6 Binary Floating-Point Unit

Son Dao Trong; Martin S. Schmookler; Eric M. Schwarz; Michael Kroener

The floating point unit of the next generation PowerPC is detailed. It has been tested at over 5 GHz. The design supports an extremely aggressive cycle time of 13 FO4 using a technology independent measure. For most dependent instructions, its fused multiply-add dataflow has only 6 effective pipeline stages. This is nearly equivalent to its predecessor, the Power 5, even though its technology independent frequency has increased over 70%. Overall the frequency has improved over 100%. It achieves this high performance through aggressive feedback paths, circuit design and layout. The pipeline has 7 stages but data may be fed back to dependent operations prior to rounding and complete normalization. Division and square root algorithms are also described which take advantage of high-precision linear approximation hardware for obtaining a reciprocal or reciprocal square root approximation.

Ibm Journal of Research and Development | 2004

The IBM eServer z990 floating-point unit

Guenter Gerwig; Holger Wetter; Eric M. Schwarz; Juergen Haess; Christopher A. Krygowski; Bruce M. Fleischer; Michael Kroener

The floating-point unit (FPU) of the IBM z990 eServerTM is the first one in an IBM mainframe with a fused multiply-add dataflow. It also represents the first time that an SRT divide algorithm (named after Sweeney, Robertson, and Tocher, who independently proposed the algorithm) was used in an IBM mainframe. The FPU supports dual architectures: the zSeries® hexadecimal floating-point architecture and the IEEE 754 binary floating-point architecture. Six floating-point formats-- including short, long, and extended operands-are supported in hardware. The throughput of this FPU is one multiply-add operation per cycle. The instructions are executed in five pipeline steps, and there are multiple provisions to avoid stalls in case of data dependencies. It is able to handle denormalized input operands and denormalized results without a stall (except for architectural program exceptions). It has a new extended-precision divide and square-root dataflow. This dataflow uses a radix-4 SRT algorithm (radix-2 for square root) and is able to handle divides and square-root operations in multiple floating-point and fixed-point formats. For fixed-point divisions, a new mechanism improves the performance by using an algorithm with which the number of divide iterations depends on the effective number of quotient bits.

symposium on computer arithmetic | 2011

The IBM zEnterprise-196 Decimal Floating-Point Accelerator

Steven R. Carlough; Adam B. Collura; Silvia Melitta Mueller; Michael Kroener

Decimal floating-point Arithmetic is widely used in commercial computing applications, such as financial transactions, where rounding errors prevent the use of binary floating-point operations. The revised IEEE Standard for Floating-Point Arithmetic (IEEE-754-2008) defined standardized decimal floating-point (DFP) formats. As more software applications adopt the IEEE decimal floating-point standard, hardware accelerators that support it are becoming more prevalent. This paper describes the second generation decimal floating-point accelerator implemented on the IBM zEnterprise-196 processor. The 4-cycle deep pipeline was designed to optimize the latency of fixed-point decimal operations while significantly improving the bandwidth of DFP operations. A detailed description of the unit and a comparison to previous implementations found in literature is provided.

symposium on computer arithmetic | 1999