Brad W. Michael | Researchain

Publication

Featured research published by Brad W. Michael.

international solid-state circuits conference | 2005

A streaming processing unit for a CELL processor

Brian Flachs; Shigehiro Asano; Sang Hoo Dhong; P. Hofstee; Gilles Gervais; Roy Moonseuk Kim; T. Le; Peichun Liu; Jens Leenstra; John Samuel Liberty; Brad W. Michael; H. Oh; Silvia Melitta Mueller; Osamu Takahashi; A. Hatakeyama; Yukio Watanabe; Naoka Yano

The design of a 4-way SIMD streaming data processor emphasizes achievable performance in area and power. Software controls data movement and instruction flow, and improves data bandwidth and pipeline utilization. The micro-architecture minimizes instruction latency and provides fine-grain clock control to reduce power.

IEEE Journal of Solid-state Circuits | 2006

The microarchitecture of the synergistic processor for a cell processor

Brian Flachs; Shigehiro Asano; Sang Hoo Dhong; Harm Peter Hofstee; Gilles Gervais; Roy Kim; T. Le; Peichun Liu; Jens Leenstra; John Samuel Liberty; Brad W. Michael; Hwa-Joon Oh; Silvia Melitta Mueller; Osamu Takahashi; A. Hatakeyama; Yukio Watanabe; Naoka Yano; Daniel Alan Brokenshire; Mohammad Peyravian; Vandung To; E. Iwata

This paper describes an 11 FO4 streaming data processor in the IBM 90-nm SOI-low-k process. The dual-issue, four-way SIMD processor emphasizes achievable performance per area and power. Software controls most aspects of data movement and instruction flow to improve memory system performance and core performance density. The design minimizes instruction latency while providing for fine grain clock control to reduce power.

IEEE Journal of Solid-state Circuits | 2006

A fully pipelined single-precision floating-point unit in the synergistic processor element of a CELL processor

Hwa Joon Oh; Silvia Melitta Mueller; Christian Jacobi; Kevin D. Tran; Scott R. Cottier; Brad W. Michael; Hiroo Nishikawa; Yonetaro Totsuka; Tatsuya Namatame; Naoka Yano; Takashi Machida; Sang Hoo Dhong

The floating-point unit (FPU) in the synergistic processor element (SPE) of a CELL processor is a fully pipelined 4-way single-instruction multiple-data (SIMD) unit designed to accelerate media and data streaming with 128-bit operands. It supports 32-bit single-precision floating-point and 16-bit integer operands with two different latencies, six-cycle and seven-cycle, with 11 FO4 delay per stage. The FPU optimizes the performance of critical single-precision multiply-add operations. Since exact rounding, exceptions, and denormalized number handling are not important to multimedia applications, IEEE correctness on the single-precision floating-point numbers is sacrificed for performance and simple design. It employs fine-grained clock gating for power saving. The design has 768K transistors in 1.3 mm², fabricated in 90-nm SOI technology. Correct operations have been observed up to 5.6 GHz with 1.4 V and 56°C, delivering 44.8 GFlops. Architecture, logic, circuits, and integration are codesigned to meet the performance, power, and area goals.
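The 44.8 GFlops figure is consistent with a simple peak-rate estimate. The abstract does not spell out the derivation, so the sketch below assumes that each of the four SIMD lanes completes one multiply-add per cycle and that a multiply-add counts as two floating-point operations. The function name `peak_gflops` is illustrative only.

```python
def peak_gflops(simd_lanes: int, flops_per_lane_per_cycle: int, clock_ghz: float) -> float:
    """Peak throughput in GFlops for a fully pipelined SIMD unit.

    Assumes one operation issues per lane per cycle with no stalls.
    """
    return simd_lanes * flops_per_lane_per_cycle * clock_ghz


# Assumed: 4 lanes, multiply-add counted as 2 flops, 5.6 GHz clock (from the abstract).
estimate = peak_gflops(simd_lanes=4, flops_per_lane_per_cycle=2, clock_ghz=5.6)
print(f"{estimate:.1f} GFlops")
```

Under these assumptions the estimate matches the reported peak; sustained throughput on real workloads would be lower.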

symposium on computer arithmetic | 2005

The vector floating-point unit in a synergistic processor element of a CELL processor

Silvia Melitta Mueller; Christian Jacobi; Hwa-Joon Oh; Kevin D. Tran; Scott R. Cottier; Brad W. Michael; Hiroo Nishikawa; Yonetaro Totsuka; Tatsuya Namatame; Naoka Yano; Takashi Machida; Sang Hoo Dhong

The floating-point unit (FPU) in the synergistic processor element of the first-generation multi-core CELL processor is described. The FPU supports 4-way SIMD single-precision and integer operations and 2-way SIMD double-precision operations. The design required high frequency, low latency, and power and area efficiency, with primary application to multimedia streaming workloads such as 3D graphics. The FPU has three different latencies, optimizing the performance-critical single-precision FMA operations, which are executed with a 6-cycle latency at an 11 FO4 cycle time. The latency includes the global forwarding of the result. These challenging performance, power, and area goals were achieved through the co-design of architecture and implementation with optimizations at all levels of the design. This paper focuses on the logical and algorithmic aspects of the FPU that we developed to achieve these goals.

symposium on vlsi circuits | 2005

A fully-pipelined single-precision floating point unit in the synergistic processor element of a CELL processor

The floating point unit in the synergistic processor element of a CELL processor is a fully-pipelined 4-way SIMD unit designed to accelerate media and data streaming. It supports 32-bit single-precision floating point and 16-bit integer operands with two different latencies, optimizing the performance of critical single-precision multiply-add operations. It employs fine-grained clock gating for power saving. Architecture, logic, circuits and integration are co-designed to meet the performance, power, and area goals.

IBM Journal of Research and Development | 2007

Microarchitecture and implementation of the synergistic processor in 65-nm and 90-nm SOI

Brian Flachs; S. Asano; Sang Hoo Dhong; Harm Peter Hofstee; Gilles Gervais; Roy Moonseuk Kim; T. N. Le; P. Liu; Jens Leenstra; John Samuel Liberty; Brad W. Michael; H.-J. Oh; Stefan Mueller; Osamu Takahashi; K. Hirairi; A. Kawasumi; H. Murakami; H. Noro; S. Onishi; J. Pille; J. Silberman; S. Yong; A. Hatakeyama; Y. Watanabe; Naoka Yano; Daniel Alan Brokenshire; Mohammad Peyravian; V. To; Eiji Iwata

This paper describes the architecture and implementation of the original gaming-oriented synergistic processor element (SPE) in both 90-nm and 65-nm silicon-on-insulator (SOI) technology and introduces a new SPE implementation targeted for the high-performance computing community. The Cell Broadband Engine™ processor contains eight SPEs. The dual-issue, four-way single-instruction multiple-data processor is designed to achieve high performance per area and power and is optimized to process streaming data, simulate physical phenomena, and render objects digitally. Most aspects of data movement and instruction flow are controlled by software to improve the performance of the memory system and the core performance density. The SPE was designed as an 11-FO4 (fan-out-of-4-inverter-delay) processor using 20.9 million transistors within 14.8 mm² using the IBM 90-nm SOI low-k process. CMOS (complementary metal-oxide semiconductor) static gates implement the majority of the logic. Dynamic circuits are used in critical areas and occupy 19% of the non-static random access memory (SRAM) area. Instruction set architecture, microarchitecture, and physical implementation are tightly coupled to achieve a compact and power-efficient design. Correct operation has been observed at up to 5.6 GHz and 7.3 GHz, respectively, in 90-nm and 65-nm SOI technology.

electrical performance of electronic packaging | 2006

A method to measure impedance of chip/package/board power supply system using pseudo-impulse current

Yaping Zhou; Sang Hoo Dhong; Brian Flachs; Paul M. Harvey; Brad W. Michael

A method to measure the impedance Z(f) of a chip/package/board power supply system using pseudo-impulse current is described. This method can be easily applied to digital systems with synchronous clocking. A PowerPC-based microprocessor power supply system is used as an example to show the effectiveness of the method.
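The abstract does not describe the measurement procedure itself. As a generic illustration of why an impulse-like current is convenient, the sketch below uses the standard frequency-domain relation Z(f) = V(f) / I(f): an impulse has a flat spectrum, so a single voltage response reveals the impedance at every frequency bin. The waveforms are synthetic, and the first-order decay used for the voltage is an assumption made for the example, not data from the paper.

```python
import cmath


def dft(samples):
    """Plain discrete Fourier transform (O(n^2), fine for short records)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def impedance(voltage, current):
    """Estimate Z at each frequency bin as V(f) / I(f).

    Requires I(f) to be non-zero in every bin, which holds for an impulse.
    """
    return [v / i for v, i in zip(dft(voltage), dft(current))]


# Synthetic data (hypothetical): unit current impulse and a decaying voltage response.
N = 64
current = [1.0] + [0.0] * (N - 1)
voltage = [0.1 * 0.5 ** t for t in range(N)]

z = impedance(voltage, current)
print(f"|Z| at DC: {abs(z[0]):.3f} ohm, at Nyquist: {abs(z[N // 2]):.3f} ohm")
```

On real hardware the current would only approximate an impulse, so the division by I(f) (rather than assuming a flat spectrum) is what keeps the estimate meaningful.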

asia and south pacific design automation conference | 2006

An SPU reference model for simulation, random test generation and verification

Yukio Watanabe; Balazs Sallay; Brad W. Michael; Daniel Alan Brokenshire; Gavin B. Meil; Hazim Shafi; Daisuke Hiraoka

An instruction-set-level reference model was created to support the development of the synergistic processing unit (SPU), which is one of the key components of the cell processor [Pham, 2005][Flachs, 2005]. This reference model was used in the simulators to define the instruction set architecture (ISA), in the random test case generator, as the reference in the verification environment, and for software development. Using the same reference model for multiple purposes made it easier to keep up with architecture changes at the early stage of the microprocessor development. In addition, including the reference model in the simulation environment increased the robustness of the random test executions and made it possible to find bugs that are usually difficult to catch.
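The general pattern of sharing one reference model between a random test generator and a checker can be sketched as follows. This is a toy illustration, not the SPU model: the three-operation instruction set, the register file size, and every function name here are hypothetical.

```python
import random

MASK = 0xFFFFFFFF  # toy 32-bit registers (assumption for this sketch)

OPS = {
    "add": lambda a, b: (a + b) & MASK,
    "sub": lambda a, b: (a - b) & MASK,
    "xor": lambda a, b: a ^ b,
}


def reference_step(regs, instr):
    """Reference model: apply one (op, rd, ra, rb) instruction, return new registers."""
    op, rd, ra, rb = instr
    out = list(regs)
    out[rd] = OPS[op](regs[ra], regs[rb])
    return out


def random_program(rng, length, num_regs=4):
    """Random test generator: draws only instructions the reference model defines."""
    return [
        (rng.choice(sorted(OPS)), rng.randrange(num_regs),
         rng.randrange(num_regs), rng.randrange(num_regs))
        for _ in range(length)
    ]


def first_mismatch(dut_step, program, regs):
    """Run the design under test in lockstep with the reference model.

    Returns the index of the first diverging instruction, or None if all match.
    """
    ref, dut = list(regs), list(regs)
    for pc, instr in enumerate(program):
        ref = reference_step(ref, instr)
        dut = dut_step(dut, instr)
        if ref != dut:
            return pc
    return None


def buggy_step(regs, instr):
    """Hypothetical faulty implementation: 'sub' forgets the 32-bit wraparound."""
    op, rd, ra, rb = instr
    out = list(regs)
    out[rd] = regs[ra] - regs[rb] if op == "sub" else OPS[op](regs[ra], regs[rb])
    return out


rng = random.Random(0)
program = random_program(rng, length=200)
print(first_mismatch(buggy_step, program, [0, 1, 2, 3]))
```

Because the generator and the checker read the same `OPS` table, a change to the instruction set is made in one place and both stay consistent, which mirrors the benefit the abstract describes.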

asian solid state circuits conference | 2005

The Power Conscious Synergistic Processor Element of a Cell Processor

Osamu Takahashi; Scott R. Cottier; Sang Hoo Dhong; Brian Flachs; Koji Hirairi; H. Peter Hofstee; Brad W. Michael; Hiromi Noro; Dieter Wendel; Michael Wayne White

A 4-way SIMD streaming processor of a cell processor is developed in a 90-nm SOI technology. CMOS static gates implement the majority of the logic. Dynamic circuits are used in critical areas, occupying 19% of the non-SRAM area. ISA, microarchitecture, and physical implementation are co-optimized to achieve a compact and power-efficient design.

Archive | 2001