Philip T. Wu | Researchain

Publication

Featured research published by Philip T. Wu.

IBM Journal of Research and Development | 2005

Design and exploitation of a high-performance SIMD floating-point unit for Blue Gene/L

Siddhartha Chatterjee; L. R. Bachega; Peter Bergner; K. A. Dockser; John A. Gunnels; Manish Gupta; Fred G. Gustavson; Christopher A. Lapkowski; G. K. Liu; Mark P. Mendell; Ravi Nair; C. D. Wait; T. J. C. Ward; Philip T. Wu

We describe the design of a dual-issue single-instruction, multiple-data-like (SIMD-like) extension of the IBM PowerPC® 440 floating-point unit (FPU) core and the compiler and algorithmic techniques to exploit it. This extended FPU is targeted at both the IBM massively parallel Blue Gene®/L machine and the more pervasive embedded platforms. We discuss the hardware and software codesign that was essential to fully realize the performance benefits of the FPU under the memory bandwidth limitations and high penalties for misaligned data access imposed by the memory hierarchy on a Blue Gene/L node. Using both hand-optimized and compiled code for key linear algebraic kernels, we validate the architectural design choices, evaluate the success of the compiler, and quantify the effectiveness of the novel algorithm design techniques. Our measurements show that the combination of algorithm, compiler, and hardware delivers a significant fraction of peak floating-point performance for compute-bound kernels, such as matrix multiplication, and a significant fraction of peak memory bandwidth for memory-bound kernels, such as DAXPY, while remaining largely insensitive to data alignment.

IBM Journal of Research and Development | 1997

Circuit design techniques for the high-performance CMOS IBM S/390 parallel enterprise server G4 microprocessor

Leon J. Sigal; James D. Warnock; Brian W. Curran; Yuen H. Chan; Peter J. Camporese; Mark D. Mayo; William V. Huott; Daniel R. Knebel; C.T. Chuang; James P. Eckhardt; Philip T. Wu

This paper describes the circuit design techniques used for the IBM S/390® Parallel Enterprise Server G4 microprocessor to achieve operation up to 400 MHz. A judicious choice of process technology and concurrent top-down and bottom-up design approaches reduced risk and shortened the design time. The use of timing-driven synthesis/placement methodologies improved design turnaround time and chip timing. The combined use of static, dynamic, and self-resetting CMOS (SRCMOS) circuits facilitated the balancing of design time and performance return. The use of robust PLL design, floorplanning, and clock distribution minimized clock skew. Innovative latch designs permitted performance optimization without adding risk. Microarchitecture optimization and circuit innovations improved the performance of timing-critical macros. Full custom array design with extensive use of SRCMOS circuit techniques resulted in an on-chip L1 cache having 2.0-ns cycle time.

IBM Journal of Research and Development | 2002

IBM eServer z900 high-frequency microprocessor technology, circuits, and design methodology

Brian W. Curran; Yuen H. Chan; Philip T. Wu; Peter J. Camporese; Gregory A. Northrop; Robert F. Hatch; Lisa B. Lacey; James P. Eckhardt; David T. Hui; Howard H. Smith

The IBM eServer z900 microprocessor is a seventh-generation zSeries™ (formerly S/390®) CMOS design which has achieved 1.3-GHz operation. This paper describes the 0.18-µm bulk CMOS, seven-level copper metal process and the high-frequency circuit, integration, and design methodologies developed to achieve this operation. The microprocessor was floorplanned to closely mimic the flow of the microarchitecture pipeline and reduce the communication delay overhead between units. Novel circuit techniques were used in the implementation of the arrays and cache hit detection logic to save power and reduce circuit complexity without sacrificing performance. A four-dimensional gate library and novel synthesis algorithms were developed to yield synthesized control implementations with the performance characteristics of a fully custom circuit design.

Microelectronics Reliability | 2005

Characterization of a 0.13 μm CMOS Link Chip using Time Resolved Emission (TRE)

Franco Stellari; Peilin Song; John Nicholas Hryckowian; Otto Torreiter; Steve Wilson; Philip T. Wu; Alberto Tosi

The Picosecond Imaging Circuit Analysis (PICA) technique using the Superconducting Single-Photon Detector (SSPD) allows the detailed characterization of pulse width variations along the delay chain of a high speed Self Timing Interface (STI). Pulses gradually shrink and finally disappear along the delay chain.

VLSI Test Symposium | 1993

Design SRAMs for burn-in

William Robert Reohr; Yuen H. Chan; Donald W. Plass; Antonio R. Pelella; Philip T. Wu

SRAM designers and product engineers must balance the diverse aspects involved in developing and manufacturing quality ICs. This paper describes low-cost, low-complexity design techniques to improve burn-in, noting their implications for performance, power, and density.

IBM Journal of Research and Development | 2007

Optimization of silicon technology for the IBM system z9

Daniel J. Poindexter; Scott Richard Stiffler; Philip T. Wu; Paul D. Agnello; Thomas H. Ivers; Shreesh Narasimha; Thomas B. Faure; Jed H. Rankin; David A. Grosch; Marc D. Knox; Daniel C. Edelstein; M. Khare; Gary B. Bronner; Hyunjang Nam; Shahid Butt

IBM 90-nm silicon-on-insulator (SOI) technology was used for the key chips in the System z9™ processor chipset. Together with system design, optimization of several critical features of this technology enabled the z9™ to achieve double the system performance of the previous generation. These technology improvements included logic and SRAM FET optimization, mask fabrication, lithography and wafer processing, and interconnect technology. Reliability improvements, such as SRAM optimization and a burn-in reliability screen, are also described.

Archive | 1990