Eric M. Schwarz | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Eric M. Schwarz is active.

Explore More

Publication

Featured researches published by Eric M. Schwarz.

international symposium on microarchitecture | 1999

IBM's S/390 G5 microprocessor design

Timothy J. Slegel; Robert M. Averill; Mark A. Check; Bruce C. Giamei; Barry Watson Krumm; Christopher A. Krygowski; Wen H. Li; John Stephen Liptay; John Macdougall; Thomas J. McPherson; Jennifer A. Navarro; Eric M. Schwarz; Kevin Shum; Charles F. Webb

The IBM S/390 G5 microprocessor in IBMs newest CMOS mainframe system provides more than twice the performance of the previous generation, the G4. The G5 system offers improved reliability and availability, along with new architectural features such as support for IEEE floating-point arithmetic and a redesigned L2 cache and processor interconnect. The G5 system implements the ESA/390 instruction-set architecture, which is based on and compatible with the original S/360 architecture. Therefore, it has no RISC (reduced-instruction-set computing) concepts and is one of the most complex of all CISC (complex-instruction-set computing) architectures. Designers had to meet a unique set of challenges to achieve the G5s level of performance-for example, achieving a very high frequency given the complexity of the architecture.

Ibm Journal of Research and Development | 2007

IBM POWER6 microarchitecture

Hung Q. Le; William J. Starke; J. S. Fields; F. P. O'Connell; D. Q. Nguyen; B. J. Ronchetti; Wolfram Sauer; Eric M. Schwarz; Michael Thomas Vaden

This paper describes the implementation of the IBM POWER6™ microprocessor, a two-way simultaneous multithreaded (SMT) dual-core chip whose key features include binary compatibility with IBM POWER5™ microprocessor-based systems; increased functional capabilities, such as decimal floating-point and vector multimedia extensions; significant reliability, availability, and serviceability enhancements; and robust scalability with up to 64 physical processors. Based on a new industry-leading high-frequency core architecture with enhanced SMT and driven by a high-throughput symmetric multiprocessing (SMP) cache and memory subsystem, the POWER6 chip achieves a significant performance boost compared with its predecessor, the POWER5 chip. Key extensions to the coherence protocol enable POWER6 microprocessor-based systems to achieve better SMP scalability while enabling reductions in system packaging complexity and cost.

symposium on computer arithmetic | 2005

Decimal multiplication with efficient partial product generation

Mark A. Erle; Eric M. Schwarz; Michael J. Schulte

Decimal multiplication is important in many commercial applications including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. This paper presents a novel design for fixed-point decimal multiplication that utilizes a simple recoding scheme to produce signed-magnitude representations of the operands thereby greatly simplifying the process of generating partial products for each multiplier digit. The partial products are generated using a digit-by-digit multiplier on a word-by-digit basis, first in a signed-digit form with two digits per position, and then combined via a combinational circuit. As the signed-digit partial products are developed one at a time while traversing the recoded multiplier operand from the least significant digit to the most significant digit, each partial product is added along with the accumulated sum of previous partial products via a signed-digit adder. This work is significantly different from other work employing digit-by-digit multipliers due to the efficiency gained by restricting the range of digits throughout the multiplication process.

Ibm Journal of Research and Development | 2007

IBM POWER6 accelerators: VMX and DFU

Lee Evan Eisen; J. W. Ward Iii; H.-W. Tast; N. Mäding; Jens Leenstra; Stefan Mueller; Christian Jacobi; J. Preiss; Eric M. Schwarz; S. R. Carlough

The IBM POWER6™ microprocessor core includes two accelerators for increasing performance of specific workloads. The vector multimedia extension (VMX) provides a vector acceleration of graphic and scientific workloads. It provides single instructions that work on multiple data elements. The instructions separate a 128-bit vector into different components that are operated on concurrently. The decimal floating-point unit (DFU) provides acceleration of commercial workloads, more specifically, financial transactions. It provides a new number system that performs implicit rounding to decimal radix points, a feature essential to monetary transactions. The IBM POWER™ processor instruction set is substantially expanded with the addition of these two accelerators. The VMX architecture contains 176 instructions, while the DFU architecture adds 54 instructions to the base architecture. The IEEE 754R Binary Floating-Point Arithmetic Standard defines decimal floating-point formats, and the POWER6 processor--on which a substantial amount of area has been devoted to increasing performance of both scientific and commercial workloads--is the first commercial hardware implementation of this format.

asilomar conference on signals, systems and computers | 2001

The IBM z900 decimal arithmetic unit

Fadi Y. Busaba; Christopher A. Krygowski; Wen He Li; Eric M. Schwarz; Steven R. Carlough

As the cost for adding functions to a processor continues to decline, processor designs are including many additional features. An example of this trend is the appearance of graphics engines and compression engines on midrange and even low end microprocessors. One area that has the potential to capture chip real estate is the decimal arithmetic engine because of its importance in financial and business applications. Studies show that 55% of the numeric data stored on commercial databases are in decimal format. Although decimal arithmetic is supported in many software languages it is not yet available on many microprocessors. This paper details the decimal arithmetic engine in the recently announced z900 microprocessor.

Ibm Journal of Research and Development | 2009

Decimal floating-point support on the IBM system z10 processor

Eric M. Schwarz; John Kapernick; Mike F. Cowlishaw

The latest IBM zSeries® processor, the IBM System z10™ processor, provides hardware support for the decimal floating-point (DFP) facility that was introduced on the IBM System z9® processor. The z9® processor implements the facility with a mixture of low-level software and hardware assists. Recently, the IBM POWER6™ processor-based System p™ 570 server introduced a hardware implementation of the DFP facility. The latest zSeries processor includes a decimal floating-point unit based on the POWER6 processor DFP unit that has been enhanced to also support the traditional zSeries decimal fixed-point instruction set. This paper explains the hardware implementation to support both decimal fixed point and DFP and the new software support for the DFP facility, including IBM z/OS®, Java™ JIT, and C/C++ compilers, as well as support in IBM DB2® and middleware.

symposium on computer arithmetic | 2007

P6 Binary Floating-Point Unit

Son Dao Trong; Martin S. Schmookler; Eric M. Schwarz; Michael Kroener

The floating point unit of the next generation PowerPC is detailed. It has been tested at over 5 GHz. The design supports an extremely aggressive cycle time of 13 FO4 using a technology independent measure. For most dependent instructions, its fused multiply-add dataflow has only 6 effective pipeline stages. This is nearly equivalent to its predecessor, the Power 5, even though its technology independent frequency has increased over 70%. Overall the frequency has improved over 100%. It achieves this high performance through aggressive feedback paths, circuit design and layout. The pipeline has 7 stages but data may be fed back to dependent operations prior to rounding and complete normalization. Division and square root algorithms are also described which take advantage of high-precision linear approximation hardware for obtaining a reciprocal or reciprocal square root approximation.

IEEE Transactions on Computers | 1989

A general proof for overlapped multiple-bit scanning multiplications

Stamatis Vassiliadis; Eric M. Schwarz; Don J. Hanrahan

Because of recent advances in technology, multibit scanning implementations can be considered that exceed three-bit and four-bit groupings. The generalized proof for the multibit overlapped scanning multiplication is introduced, and the multiplication process is discussed. The proofs are intended to establish the correctness of the decode and the actions taken to produce the multiplication of any valid scheme proposed in the past, and to dictate the correct decode and actions taken for any overlapped s-bit scanning algorithm such that s is a natural number greater than or equal to two. The multiplication is considered to be between two fractional numbers, which are represented in twos-complement form. >

IEEE Transactions on Computers | 1991

Hard-wired multipliers with encoded partial products

Stamatis Vassiliadis; Eric M. Schwarz; Baik Moon Sung

A multibit overlapped scanning multiplication algorithm for sign-magnitude and twos complement hard-wired multipliers is presented. The theorems necessary to construct the multiplication matrix for sign-magnitude representations are emphasized. Consequently, the algorithm for sign-magnitude multiplication and its variation to include twos complement numbers are presented. The proposed algorithm is compared to previous algorithms that generate a sign extended partial product matrix, with an implementation and with a study of the number of elements in the partial product matrix. The proposed algorithm is shown to yield significant savings over well known algorithms for the generation and the reduction of the partial product matrix of a multiplier designed with multibit overlapped scanning. >

IEEE Transactions on Computers | 2005

FPU implementations with denormalized numbers

Eric M. Schwarz; Martin S. Schmookler; Son Dao Trong

Denormalized numbers are the most difficult type of numbers to implement in floating-point units. They are so complex that certain designs have elected to handle them in software rather than in hardware. Traps to software can result in long execution times, which renders denormalized numbers useless to programmers. This does not have to happen. With a small amount of additional hardware, denormalized numbers and underflows can be handled close to the speed of normalized numbers. This paper summarizes the little known techniques for handling denormalized numbers. Most of the techniques described here only appear in filed or pending patent applications.

Explore More