Leon J. Sigal | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Leon J. Sigal is active.

Explore More

Publication

Featured researches published by Leon J. Sigal.

symposium on computer arithmetic | 1997

A radix-8 CMOS S/390 multiplier

Eric M. Schwarz; Robert M. Averill; Leon J. Sigal

The multiplier of a S/390 CMOS microprocessor is described. It is implemented in an aggressive static CMOS technology with a 0.20-/spl mu/m effective channel length. The multiplier has been demonstrated in a single-image shared-memory multiprocessor at frequencies up to 400 MHz. The multiplier requires three machine cycles for a total latency of 7.5 ns, though the design can support a latency of 4.0 ns if the latches are removed. The design goal was to implement a versatile S/980 multiplier with reasonable performance at a very aggressive cycle time. The multiplier implements a radix-8 Booth algorithm and is capable of supporting S/390 floating-point and fixed-point multiplications, and also divisions and square roots. Logic design and physical design issues are discussed relating to the Booth decoding and counter tree implementations.

international solid-state circuits conference | 2011

A 5.2GHz microprocessor chip for the IBM zEnterprise™ system

James D. Warnock; Yuen Chan; William V. Huott; Sean M. Carey; Michael Fee; Huajun Wen; M. J. Saccamango; Frank Malgioglio; Patrick J. Meaney; Donald W. Plass; Yuen H. Chan; Mark D. Mayo; Guenter Mayer; Leon J. Sigal; David L. Rude; Robert M. Averill; Michael H. Wood; Thomas Strach; Howard H. Smith; Brian W. Curran; Eric M. Schwarz; Lee Evan Eisen; Doug Malone; Steve Weitzel; Pak-Kin Mak; Thomas J. McPherson; Charles F. Webb

The microprocessor chip for the IBM zEnterprise 196 (z 196) system is a high-frequency, high-performance design that adds support for out-of-order instruction execution and increases operating frequency by almost 20% compared to the previous 65nm design, while still fitting within the same power envelope. Despite the many difficult engineering hurdles to be overcome, the design team was able to achieve a product frequency of 5.2GHz, providing a significant performance boost for the new system.

international solid-state circuits conference | 2010

POWER7 TM local clocking and clocked storage elements

James D. Warnock; Leon J. Sigal; Dieter Wendel; K. Paul Muller; Joshua Friedrich; Victor Zyuban; Ethan H. Cannon; Aj Kleinosowski

The design of the clocked storage elements (CSEs) and associated local clocking circuitry is a critical consideration for modern microprocessor projects[1], and the POWER7™ chip[2], designed in a 45nm silicon-on-insulator (SOI) technology, was no exception. The digital logic contained over 2M CSEs, and the design of these elements had a major impact not only on the area, power, and performance of the chip, but also on the reliability, testability, and the ability to debug and optimize the hardware. This paper will focus on the special features added to the CSE design with these considerations in mind.

Ibm Journal of Research and Development | 1997

Circuit design techniques for the high-performance CMOS IBM S/390 parallel enterprise server G4 microprocessor

Leon J. Sigal; James D. Warnock; Brian W. Curran; Yuen H. Chan; Peter J. Camporese; Mark D. Mayo; William V. Huott; Daniel R. Knebel; C.T. Chuang; James P. Eckhardt; Philip T. Wu

This paper describes the circuit design techniques used for the IBM S/390® Parallel Enterprise Server G4 microprocessor to achieve operation up to 400 MHz. A judicious choice of process technology and concurrent top-down and bottom-up design approaches reduced risk and shortened the design time. The use of timing-driven synthesis/placement methodologies improved design turnaround time and chip timing. The combined use of static, dynamic, and self-resetting CMOS (SRCMOS) circuits facilitated the balancing of design time and performance return. The use of robust PLL design, floorplanning, and clock distribution minimized clock skew. Innovative latch designs permitted performance optimization without adding risk. Microarchitecture optimization and circuit innovations improved the performance of timing-critical macros. Full custom array design with extensive use of SRCMOS circuit techniques resulted in an on-chip L1 cache having 2.0-ns cycle time.

international solid-state circuits conference | 2000

760 MHz G6 S/390 microprocessor exploiting multiple Vt and copper interconnects

Thomas J. McPherson; Robert M. Averill; D. Balazich; K. Barkley; Sean M. Carey; Yuen H. Chan; R. Crea; A. Dansky; R. Dwyer; A. Haen; D. Hoffman; A. Jatkowski; Mark D. Mayo; D. Merrill; T. McNamara; Gregory A. Northrop; J. Rawlins; Leon J. Sigal; T. Slegel; D. Webber; P. Williams; F. Yee

The G6 system is a sixth generation CMOS server for the S/390 line of products featuring a 12+2 SMP size and significant frequency improvements obtained through the use of low-Vt devices and copper interconnects. The microprocessor operates at 760 MHz at the fast end of the process distribution. The system ships at 637 MHz in a 12+2 chilled SMP configuration. Measured system performance on the 12 way is 1600 S/390 MIPs, providing over 50% more performance than the G5. This microprocessor uses CMOS7S technology, which has a 0.2 /spl mu/m process. The chip uses 6 levels of copper metal plus an additional layer of local interconnect on a 14.6/spl times/14.7 mm/sup 2/ die with 25M transistors (7M logic/18M array). The power supply is 1.9 V and the chip power is 33 W at 637 MHz.

international solid-state circuits conference | 2006

4GHz+ low-latency fixed-point and binary floating-point execution units for the POWER6 processor

Brian W. Curran; B. McCredie; Leon J. Sigal; Eric M. Schwarz; Bruce M. Fleischer; Yuen H. Chan; D. Webber; M. Vaden; A. Goyal

A 1-pipe stage, low-latency, 13 FO4, 64b fixed-point execution unit, implemented in a 65nm SOI CMOS process, allows back-to-back execution of data dependent adds, subtracts, compares, shifts, rotates, and logical operations. A 7-pipe stage, 91 FO4, double-precision floating-point unit allows forwarding of dependent results after 6 cycles in most cases

Ibm Journal of Research and Development | 1997

CMOS floating-point unit for the S/390 parallel enterprise server G4

Eric M. Schwarz; Leon J. Sigal; Thomas J. McPherson

The S/390® floating-point unit (FPU) on the fourth-generation (G4) CMOS microprocessor chip has been implemented in a CMOS technology with a 0.20-µm effective channel length and has been demonstrated at more than 400 MHz. The microprocessor chip is 17.35 by 17.30 mm in size, and one copy of the FPU including the dataflow and control flow but not including the FPR register file is 5.3 by 4.7 mm in size. There are two copies on the chip for error-detection purposes only; both copies execute the same instruction stream and are checked against each other. The high-performance implementation has a throughput of one instruction per cycle and an average latency of three execution cycles, yielding approximately 70 MFLOPS at 300 MHz on the Linpack benchmark. Currently, the G4 FPU is the highest-performance S/390 CMOS FPU with fault tolerance. It uses several innovative and high-performance algorithms not commonly found in S/390 FPUs or other FPUs, such as a radix-8 Booth multiplier, a Goldschmidt division and square-root algorithm, techniques for updating the exponent in parallel with normalization, and avoidance of the remainder comparison in quadratically converging division and square-root algorithms. Also demonstrated is a practical design technique for designing control flow into the dataflow and early floorplanning techniques.

IEEE Journal of Solid-state Circuits | 2014

Circuit and Physical Design of the zEnterprise™ EC12 Microprocessor Chips and Multi-Chip Module

James D. Warnock; Yuen H. Chan; Hubert Harrer; Sean M. Carey; Gerard M. Salem; Doug Malone; Ruchir Puri; Jeffrey A. Zitz; Adam R. Jatkowski; Gerald Strevig; Ayan Datta; Anne E. Gattiker; Aditya Bansal; Guenter Mayer; Yiu-Hing Chan; Mark D. Mayo; David L. Rude; Leon J. Sigal; Thomas Strach; Howard H. Smith; Huajun Wen; Pak-Kin Mak; Chung-Lung Kevin Shum; Donald W. Plass; Charles F. Webb

This work describes the circuit and physical design implementation of the processor chip (CP), level-4 cache chip (SC), and the multi-chip module at the heart of the EC12 system. The chips were implemented in IBMs high-performance 32nm high-k/metal-gate SOI technology. The CP chip contains 6 super-scalar, out-of-order processor cores, running at 5.5 GHz, while the SC chip contains 192 MB of eDRAM cache. Six CP chips and two SC chips are mounted on a high-performance glass-ceramic substrate, which provides high-bandwidth, low-latency interconnections. Various aspects of the design are explored in detail, with most of the focus on the CP chip, including the circuit design implementation, clocking, thermal modeling, reliability, frequency tuning, and comparison to the previous design in 45nm technology.

international solid-state circuits conference | 2015

4.1 22nm Next-generation IBM System z microprocessor

James D. Warnock; Brian W. Curran; John Badar; Gregory J. Fredeman; Donald W. Plass; Yuen H. Chan; Sean M. Carey; Gerard M. Salem; Friedrich Schroeder; Frank Malgioglio; Guenter Mayer; Christopher J. Berry; Michael H. Wood; Yiu-Hing Chan; Mark D. Mayo; John Mack Isakson; Charudhattan Nagarajan; Tobias Werner; Leon J. Sigal; Ricardo H. Nigaglioni; Mark Cichanowski; Jeffrey A. Zitz; Matthew M. Ziegler; Tim Bronson; Gerald Strevig; Daniel M. Dreps; Ruchir Puri; Douglas J. Malone; Dieter Wendel; Pak-Kin Mak

The next-generation System z design introduces a new microprocessor chip (CP) and a system controller chip (SC) aimed at providing a substantial boost to maximum system capacity and performance compared to the previous zEC12 design in 32nm [1,2]. As shown in the die photo, the CP chip includes 8 high-frequency processor cores, 64MB of eDRAM L3 cache, interface IOs (“XBUS”) to connect to two other processor chips and the L4 cache chip, along with memory interfaces, 2 PCIe Gen3 interfaces, and an I/O bus controller (GX). The design is implemented on a 678 mm2 die with 4.0 billion transistors and 17 levels of metal interconnect in IBMs high-performance 22nm high-x CMOS SOI technology [3]. The SC chip is also a 678 mm2 die, with 7.1 billion transistors, running at half the clock frequency of the CP chip, in the same 22nm technology, but with 15 levels of metal. It provides 480 MB of eDRAM L4 cache, an increase of more than 2× from zEC12 [1,2], and contains an 18 MB eDRAM L4 directory, along with multi-processor cache control/coherency logic to manage inter-processor and system-level communications. Both the CP and SC chips incorporate significant logical, physical, and electrical design innovations.

Ibm Journal of Research and Development | 2007

Power-constrained high-frequency circuits for the IBM POWER6 microprocessor

Brian W. Curran; Eric Fluhr; Jose Angel Paredes; Leon J. Sigal; Joshua Friedrich; Yiu-Hing Chan; Charlie Hwang

The IBM POWER6™ microprocessor is a high-frequency (>5-G Hz) microprocessor fabricated in the IBM 65-nm silicon-on-insulator (SOI) complementary metal-oxide semiconductor (CMOS) process technology. This paper describes the circuit, physical design, clocking, timing, power, and hardware characterization challenges faced in the pursuit of this industry-leading frequency. Traditional high-power, high-frequency techniques were abandoned in favor of more-power-efficient circuit design methodologies. The hardware frequency and power characterization are reviewed.

Explore More