Brian Flachs | Researchain

Publication

Featured research published by Brian Flachs.

IEEE Micro | 2006

Synergistic Processing in Cell's Multicore Architecture

Michael Karl Gschwind; Harm Peter Hofstee; Brian Flachs; M. Hopkin; Y. Watanabe; T. Yamazaki

Eight synergistic processor units enable the Cell Broadband Engine's breakthrough performance. The SPU architecture implements a novel, pervasively data-parallel architecture that combines scalar and SIMD processing on a wide data path. A large number of SPUs per chip provides high thread-level parallelism. The streamlined architecture provides an efficient multithreaded execution environment for both scalar and SIMD threads and reaffirms the RISC principle of combining leading-edge architecture with compiler optimizations. These design decisions have enabled the Cell BE to deliver unprecedented supercomputer-class compute power for consumer applications.
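
The idea of running scalar and SIMD work on one wide data path can be pictured with a small model. The Python sketch below treats a 128-bit register as four independent 32-bit lanes. The names `pack`, `unpack`, `vadd`, and `scalar_add`, and the choice to place lane 0 in the low-order bits, are illustrative assumptions; this is not the SPU instruction set or its actual register layout.

```python
LANES = 4          # four 32-bit lanes in one 128-bit register (modeling assumption)
LANE_BITS = 32
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack four 32-bit integers into one 128-bit Python int (lane 0 lowest)."""
    if len(values) != LANES:
        raise ValueError("expected exactly four lane values")
    reg = 0
    for i, v in enumerate(values):
        reg |= (v & LANE_MASK) << (i * LANE_BITS)
    return reg


def unpack(reg):
    """Split a 128-bit register back into its four 32-bit lanes."""
    return [(reg >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def vadd(a, b):
    """Lane-wise add: each lane wraps modulo 2**32 and no carry crosses lanes."""
    return pack([(x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])


def scalar_add(a, b):
    """Scalar add on the same wide data path: only lane 0 carries the result."""
    return vadd(a, b) & LANE_MASK
```

For example, `unpack(vadd(pack([1, 2, 3, 4]), pack([10, 20, 30, 40])))` gives `[11, 22, 33, 44]`, and a lane that overflows wraps to zero without disturbing its neighbor. Reusing `vadd` for the scalar case mirrors, loosely, the idea of not needing a separate scalar datapath.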

International Solid-State Circuits Conference | 2005

A streaming processing unit for a CELL processor

Brian Flachs; Shigehiro Asano; Sang Hoo Dhong; P. Hofstee; Gilles Gervais; Roy Moonseuk Kim; T. Le; Peichun Liu; Jens Leenstra; John Samuel Liberty; Brad W. Michael; H. Oh; Silvia Melitta Mueller; Osamu Takahashi; A. Hatakeyama; Yukio Watanabe; Naoka Yano

The design of a 4-way SIMD streaming data processor emphasizes achievable performance in area and power. Software controls data movement and instruction flow, and improves data bandwidth and pipeline utilization. The micro-architecture minimizes instruction latency and provides fine-grain clock control to reduce power.
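
Software-controlled data movement usually means the program overlaps transfers into local storage with computation, for example by double buffering. The Python sketch below is a simple timing model under stated assumptions: one transfer engine, one compute unit, two local buffers, and fixed per-chunk transfer and compute times. The function names and cost parameters are hypothetical and are not taken from the paper.

```python
def serial_time(n_chunks, t_transfer, t_compute):
    """Fetch a chunk, then process it, with no overlap."""
    return n_chunks * (t_transfer + t_compute)


def double_buffered_time(n_chunks, t_transfer, t_compute):
    """Two local buffers: the transfer of chunk k+1 overlaps compute on chunk k."""
    engine_free = 0          # time the single transfer engine becomes idle
    core_free = 0            # time the compute unit becomes idle
    buffer_free = [0, 0]     # time each local buffer may be overwritten
    for k in range(n_chunks):
        buf = k % 2
        # A transfer needs an idle engine and a buffer whose data has been consumed.
        start = max(engine_free, buffer_free[buf])
        arrived = start + t_transfer
        engine_free = arrived
        # Compute starts once the data has arrived and the core is idle.
        begin = max(arrived, core_free)
        core_free = begin + t_compute
        # The buffer becomes reusable only after compute on it finishes.
        buffer_free[buf] = core_free
    return core_free
```

With 4 chunks, a transfer time of 2, and a compute time of 3, the serial schedule takes 20 time units while the double-buffered one takes 14, because every transfer after the first hides behind computation.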

IEEE Journal of Solid-State Circuits | 2006

The microarchitecture of the synergistic processor for a cell processor

Brian Flachs; Shigehiro Asano; Sang Hoo Dhong; Harm Peter Hofstee; Gilles Gervais; Roy Kim; T. Le; Peichun Liu; Jens Leenstra; John Samuel Liberty; Brad W. Michael; Hwa-Joon Oh; Silvia Melitta Mueller; Osamu Takahashi; A. Hatakeyama; Yukio Watanabe; Naoka Yano; Daniel Alan Brokenshire; Mohammad Peyravian; Vandung To; E. Iwata

This paper describes an 11-FO4 (fan-out-of-4 inverter delay) streaming data processor in the IBM 90-nm SOI low-k process. The dual-issue, four-way SIMD processor emphasizes achievable performance per area and power. Software controls most aspects of data movement and instruction flow to improve memory system performance and core performance density. The design minimizes instruction latency while providing fine-grain clock control to reduce power.
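
Fine-grain clock control saves power because the switching power of a clock load scales with how often it is actually clocked, roughly fraction × C × V² × f. The Python sketch below applies that textbook switched-capacitance relation to a few hypothetical units; the unit names, capacitances, and activity fractions are invented for illustration and are not figures from the paper.

```python
def clock_power_w(cap_f, vdd_v, freq_hz, clocked_fraction=1.0):
    """Switching power (W) of a clock load charged and discharged once per
    clocked cycle: clocked_fraction * C * V^2 * f."""
    return clocked_fraction * cap_f * vdd_v ** 2 * freq_hz


# Hypothetical units: name -> (clock load capacitance in F, fraction of cycles clocked)
UNITS = {
    "unit_a": (200e-12, 0.6),
    "unit_b": (150e-12, 0.4),
    "unit_c": (300e-12, 0.3),
}


def total_clock_power_w(units, vdd_v, freq_hz, gated):
    """Sum clock power over units; without gating every unit is clocked every cycle."""
    return sum(
        clock_power_w(cap, vdd_v, freq_hz, frac if gated else 1.0)
        for cap, frac in units.values()
    )
```

At 1.0 V and 1 GHz these made-up numbers give 0.65 W ungated versus 0.27 W when each unit is clocked only when it has work, which is the kind of saving fine-grain clock control targets.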

International Solid-State Circuits Conference | 2008

A Resonant Global Clock Distribution for the Cell Broadband-Engine Processor

Steven Chan; Phillip J. Restle; Thomas J. Bucelot; Steve Weitzel; John M. Keaty; John Samuel Liberty; Brian Flachs; Richard P. Volant; Peter Kapusta; Jeffrey S. Zimmerman

Resonant clock distributions have the potential to save power by recycling energy from cycle to cycle while also improving performance by reducing clock distribution latency and filtering out non-periodic noise. Although these features have been demonstrated in several small-scale experiments, concerns remained about whether the techniques would scale to a product. By modifying the Cell Broadband Engine processor to incorporate a large resonant global clock network, power savings with full functionality are demonstrated over a 20% range of clock frequencies, with 6-8 W saved at 4 GHz. This was achieved by changing one wiring level and adding a thick copper level to create inductors and capacitors.

International Solid-State Circuits Conference | 2000

A 1 GHz single-issue 64 b PowerPC processor

Peter Hofstee; Naoaki Aoki; David William Boerstler; Paula Kristine Coulman; Sang Hoo Dhong; Brian Flachs; N. Kojima; O. Kwon; Kyung Tek Lee; David Meltzer; Kevin J. Nowka; J. Park; J. Peter; Stephen D. Posluszny; M. Shapiro; Joel Abraham Silberman; Osamu Takahashi; B. Weinberger

This 64-bit single-issue PowerPC processor contains 19M transistors and is fabricated in a 0.12 µm Leff CMOS process with six-layer copper interconnect. Nominal processor clock frequency is 1.0 GHz. At the fast end of the process distribution the processor reaches 1.15 GHz (1.87 V, 101 °C, 112 W). As in a previous design, nearly the entire processor is implemented using delayed-reset and self-resetting dynamic circuit macros. New contributions include: (1) a fully pipelined, four-execution-stage IEEE double-precision floating-point unit (FPU) with fused multiply-add; (2) sum-addressed memory management units (MMUs) and 64 kB 2-cycle caches; (3) support for the full 64-bit PowerPC instruction set; (4) dynamic PLA-based control; (5) a microarchitecture and floorplan that balance critical paths; (6) delayed-reset dynamic circuits that support stress testing (burn-in); and (7) improved clock generation and distribution.
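
Sum-addressed caches avoid waiting for a full base-plus-offset addition before decoding an index. One textbook way to see why this is possible: the test A + B = K can be checked bit by bit without propagating carries, because if the lower sum bits already match K, the carry into each bit is known locally. The Python sketch below shows that identity; the function name is hypothetical, and we do not claim it reproduces the decoder circuit used in this processor.

```python
def sum_equals(a, b, k, width=64):
    """Return True iff (a + b) mod 2**width == k, without a carry chain.

    Every bit position is evaluated independently, which is what lets a
    hardware decoder perform the comparison in parallel.
    """
    mask = (1 << width) - 1
    a, b, k = a & mask, b & mask, k & mask
    # Carry each bit would need to receive for its sum bit to equal k.
    needed_carry_in = a ^ b ^ k
    # Carry each bit would send upward, assuming its own sum bit equals k.
    carry_out = (a & b) | ((a ^ b) & ~k)
    # Bit 0 needs carry-in 0; bit i needs the carry-out of bit i-1.
    return needed_carry_in == ((carry_out << 1) & mask)
```

For example, `sum_equals(0x1000, 0x0234, 0x1234)` is `True`. In a sum-addressed decoder of the kind described in the literature, each row compares the unadded base and offset against its own constant index K, so no row waits for a carry to ripple.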

IEEE Journal of Solid-State Circuits | 2009

A Resonant Global Clock Distribution for the Cell Broadband Engine Processor

Steven C. Chan; Phillip J. Restle; Thomas J. Bucelot; John Samuel Liberty; Stephen Douglas Weitzel; John M. Keaty; Brian Flachs; Richard P. Volant; Peter Kapusta; Jeffrey S. Zimmerman

Resonant clocking techniques show promise in reducing global clock power and timing uncertainty (skew and jitter). By resonating the large global clock capacitance with an inductance, the energy used to charge the clock node each period can be recycled within the LC tank network, resulting in lower clock power. Additional power savings are realized by reducing the strength of clock drivers because only losses need to be overcome at resonance. Skew and jitter are improved due to the bandpass characteristic of the LC network and the use of fewer clock buffering stages. We describe how the Cell Broadband Engine (Cell BE) processor is experimentally transformed to have a resonant-load global clock distribution similar to the one in (Chan et al., 2004).
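
The LC tank described above resonates at f0 = 1 / (2π√(LC)), so the inductance needed to resonate a given clock capacitance at a target frequency follows directly. The Python sketch below uses a hypothetical 1 nF total global clock capacitance; that value is an assumption for illustration, not a figure from the paper.

```python
import math


def resonant_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal LC tank: 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def inductance_for_hz(freq_hz, capacitance_f):
    """Inductance that resonates capacitance_f at freq_hz: 1 / ((2*pi*f)^2 * C)."""
    omega = 2.0 * math.pi * freq_hz
    return 1.0 / (omega ** 2 * capacitance_f)


CLOCK_CAP_F = 1e-9                              # hypothetical clock capacitance
L_NEEDED_H = inductance_for_hz(4e9, CLOCK_CAP_F)  # about 1.58 pH combined
```

With these numbers the combined inductance comes out near 1.58 pH. That is the total seen by the clock node; N identical inductors in parallel would each need N times that value.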

Symposium on VLSI Circuits | 2005

The circuits and physical design of the synergistic processor element of a CELL processor

Osamu Takahashi; R. Cook; Scott R. Cottier; Sang Hoo Dhong; Brian Flachs; Koji Hirairi; Atsushi Kawasumi; H. Murakami; H. Noro; H. Oh; S. Onishi; Juergen Pille; Joel Abraham Silberman; S. Yong

A 32-bit, 4-way SIMD, dual-issue synergistic processor element of a CELL processor is developed with 20.9 million transistors in 14.8 mm² using a 90 nm SOI technology. CMOS static gates implement the majority of the logic. Dynamic circuits are used in critical areas, occupying 19% of the non-SRAM area. ISA, microarchitecture, and physical implementation are tightly coupled to achieve a compact and power-efficient design. Correct operation has been observed up to 5.6 GHz at a 1.4 V supply and 56 °C.

International Symposium on Microarchitecture | 2005

Power-conscious design of the Cell processor's synergistic processor element

Osamu Takahashi; Scott R. Cottier; Sang Hoo Dhong; Brian Flachs; Joel Abraham Silberman

The authors describe the low-power design of the synergistic processor element (SPE) of the cell processor developed by Sony, Toshiba and IBM. CMOS static gates implement most of the logic, and dynamic circuits are used in critical areas. Tight coupling of the instruction set architecture, microarchitecture, and physical implementation achieves a compact, power-efficient design.

IBM Journal of Research and Development | 2007

Microarchitecture and implementation of the synergistic processor in 65-nm and 90-nm SOI

Brian Flachs; S. Asano; Sang Hoo Dhong; Harm Peter Hofstee; Gilles Gervais; Roy Moonseuk Kim; T. N. Le; P. Liu; Jens Leenstra; John Samuel Liberty; Brad W. Michael; H.-J. Oh; Stefan Mueller; Osamu Takahashi; K. Hirairi; A. Kawasumi; H. Murakami; H. Noro; S. Onishi; J. Pille; J. Silberman; S. Yong; A. Hatakeyama; Y. Watanabe; Naoka Yano; Daniel Alan Brokenshire; Mohammad Peyravian; V. To; Eiji Iwata

This paper describes the architecture and implementation of the original gaming-oriented synergistic processor element (SPE) in both 90-nm and 65-nm silicon-on-insulator (SOI) technology and introduces a new SPE implementation targeted at the high-performance computing community. The Cell Broadband Engine™ processor contains eight SPEs. The dual-issue, four-way single-instruction multiple-data (SIMD) processor is designed to achieve high performance per area and power and is optimized to process streaming data, simulate physical phenomena, and render objects digitally. Most aspects of data movement and instruction flow are controlled by software to improve the performance of the memory system and the core performance density. The SPE was designed as an 11-FO4 (fan-out-of-4 inverter delay) processor using 20.9 million transistors within 14.8 mm² in the IBM 90-nm SOI low-k process. CMOS (complementary metal-oxide semiconductor) static gates implement the majority of the logic. Dynamic circuits are used in critical areas and occupy 19% of the non-SRAM (static random-access memory) area. Instruction set architecture, microarchitecture, and physical implementation are tightly coupled to achieve a compact and power-efficient design. Correct operation has been observed at up to 5.6 GHz in 90-nm and up to 7.3 GHz in 65-nm SOI technology.
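
As a rough cross-check, treating one clock period as exactly 11 FO4 delays lets us back out the FO4 delay implied by the reported frequencies. This is simple arithmetic under two assumptions: that the full period equals the stated FO4 count, and that the 65-nm version keeps the same 11-FO4 design. The results are estimates, not values reported in the paper.

```python
def implied_fo4_ps(freq_ghz, fo4_per_cycle=11):
    """FO4 delay (ps) implied if one clock period equals fo4_per_cycle FO4 delays."""
    period_ps = 1000.0 / freq_ghz
    return period_ps / fo4_per_cycle


for node, ghz in (("90 nm", 5.6), ("65 nm", 7.3)):
    print(f"{node}: {implied_fo4_ps(ghz):.1f} ps per FO4")
# prints:
# 90 nm: 16.2 ps per FO4
# 65 nm: 12.5 ps per FO4
```

The 5.6 GHz figure was measured at 1.4 V, so the implied FO4 delay reflects that operating point rather than nominal conditions.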

International Conference on Computer-Aided Design | 2005

The circuit design of the synergistic processor element of a CELL processor

Osamu Takahashi; Russ Cook; Scott R. Cottier; Sang Hoo Dhong; Brian Flachs; Koji Hirairi; Atsushi Kawasumi; Hiroaki Murakami; Hiromi Noro; Hwa-Joon Oh; S. Onishi; Juergen Pille; Joel Abraham Silberman

A 32-bit, 4-way SIMD, dual-issue synergistic processor element of a CELL processor is developed with 20.9 million transistors in 14.8 mm² using a 90 nm SOI technology. CMOS static gates implement the majority of the logic. Dynamic circuits are used in critical areas, occupying 19% of the non-SRAM area. ISA, microarchitecture, and physical implementation are tightly coupled to achieve a compact and power-efficient design. Correct operation has been observed up to 5.6 GHz at a 1.4 V supply and 56 °C.
