Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Scott R. Cottier is active.

Publication


Featured researches published by Scott R. Cottier.


IEEE Journal of Solid-state Circuits | 2006

A fully pipelined single-precision floating-point unit in the synergistic processor element of a CELL processor

Hwa Joon Oh; Silvia Melitta Mueller; Christian Jacobi; Kevin D. Tran; Scott R. Cottier; Brad W. Michael; Hiroo Nishikawa; Yonetaro Totsuka; Tatsuya Namatame; Naoka Yano; Takashi Machida; Sang Hoo Dhong

The floating-point unit (FPU) in the synergistic processor element (SPE) of a CELL processor is a fully pipelined 4-way single-instruction multiple-data (SIMD) unit designed to accelerate media and data streaming with 128-bit operands. It supports 32-bit single-precision floating-point and 16-bit integer operands with two different latencies, six-cycle and seven-cycle, with 11 FO4 delay per stage. The FPU optimizes the performance of critical single-precision multiply-add operations. Since exact rounding, exceptions, and de-norm number handling are not important to multimedia applications, IEEE correctness on the single-precision floating-point numbers is sacrificed for performance and simple design. It employs fine-grained clock gating for power saving. The design has 768K transistors in 1.3 mm/sup 2/, fabricated SOI in 90-nm technology. Correct operations have been observed up to 5.6 GHz with 1.4 V and 56/spl deg/C, delivering 44.8 GFlops. Architecture, logic, circuits, and integration are codesigned to meet the performance, power, and area goals.


symposium on computer arithmetic | 2005

The vector floating-point unit in a synergistic processor element of a CELL processor

Silvia Melitta Mueller; Christian Jacobi; Hwa-Joon Oh; Kevin D. Tran; Scott R. Cottier; Brad W. Michael; Hiroo Nishikawa; Yonetaro Totsuka; Tatsuya Namatame; Naoka Yano; Takashi Machida; Sang Hoo Dhong

The floating-point unit in the synergistic processor element of the 1st generation multi-core CELL processor is described. The FPU supports 4-way SIMD single precision and integer operations and 2-way SIMD double precision operations. The design required a high-frequency, low latency, power and area efficiency with primary application to the multimedia streaming workloads, such as 3D graphics. The FPU has 3 different latencies, optimizing the performance critical single precision FMA operations, which are executed with a 6-cycle latency at an 11FO4 cycle time. The latency includes the global forwarding of the result. These challenging performance, power, and area goals were achieved through the co-design of architecture and implementation with optimizations at all levels of the design. This paper focuses on the logical and algorithmic aspects of the FPU we developed, to achieve these goals.


international solid-state circuits conference | 2007

Implementation of the CELL Broadband Engine in a 65nm SOI Technology Featuring Dual-Supply SRAM Arrays Supporting 6GHz at 1.3V

Jürgen Pille; Chad Adams; T. Christensen; Scott R. Cottier; Sebastian Ehrenreich; T. Kono; D. Nelson; Osamu Takahashi; Shunsako Tokito; Otto Torreiter; Otto Wagner; Dieter Wendel

The 65nm CELL Broadband Enginetrade design features a dual power supply, which enhances SRAM stability and performance using an elevated array-specific power supply, while reducing the logic power consumption. Hardware measurements demonstrate low-voltage operation and reduced scatter of the minimum operating voltage. The chip operates at 6GHz at 1.3V and is fabricated in a 65nm CMOS SOI technology.


symposium on vlsi circuits | 2005

A fully-pipelined single-precision floating point unit in the synergistic processor element of a CELL processor

Hwa Joon Oh; Silvia Melitta Mueller; Christian Jacobi; Kevin D. Tran; Scott R. Cottier; Brad W. Michael; Hiroo Nishikawa; Yonetaro Totsuka; Tatsuya Namatame; Naoka Yano; Takashi Machida; Sang Hoo Dhong

The floating point unit in the synergistic processor element of a CELL processor is a fully-pipelined 4-way SIMD unit designed to accelerate media and data streaming. It supports 32-bit single-precision floating point and 16-bit integer operands with two different latencies, optimizing the performance of critical single-precision multiply-add operations. It employs fine-grained clock gating for power saving. Architecture, logic, circuits and integration are co-designed to meet the performance, power, and area goals.


international solid-state circuits conference | 2008

Migration of Cell Broadband Engine from 65nm SOI to 45nm SOI

Osamu Takahashi; Chad Adams; D. Ault; Erwin Behnen; O. Chiang; Scott R. Cottier; Paula Kristine Coulman; James A. Culp; Gilles Gervais; Michael S. Gray; Y. Itaka; C. J. Johnson; Fumihiro Kono; L. Maurice; Kevin W. McCullen; Lam M. Nguyen; Yoichi Nishino; Hiromi Noro; Jürgen Pille; Mack W. Riley; M. Shen; Chiaki Takano; Shunsako Tokito; Tina Wagner; Hiroshi Yoshihara

This paper describe the challenges of migrating the Cell Broadband Engine (Cell BE) design from a 65 nm SOI to a 45 nm twin-well CMOS technology on SOI with low-k dielectrics and copper metal layers using a mostly automated approach. A die micrograph of the 45 nm Cell BE is described here. The cycle-by-cycle machine behavior is preserved. The focuses are automated migration, power reduction, area reduction, and DFM improvements. The chip power is reduced by roughly 40% and the chip area is reduced by 34%.


symposium on vlsi circuits | 2005

The circuits and physical design of the synergistic processor element of a CELL processor

Osamu Takahashi; R. Cook; Scott R. Cottier; Sang Hoo Dhong; Brian Flachs; Koji Hirairi; Atsushi Kawasumi; H. Murakami; H. Noro; H. Oh; S. Onishi; Juergen Pille; Joel Abraham Silberman; S. Yong

A 32b 4-way SIMD dual-issue synergistic processor element of a CELL processor is developed with 20.9 million transistors in 14.8mm/sub 2/ using a 90nm SOI technology. CMOS static gates implement the majority of the logic. Dynamic circuits are used in critical areas, occupying 19% of the non-SRAM area. ISA, microarchitecture, and physical implementation are tightly coupled to achieve a compact and power efficient design. Correct operation has been observed up to 5.6GHz at 1.4V supply and 56/spl deg/C.


international symposium on microarchitecture | 2005

Power-conscious design of the Cell processor's synergistic processor element

Osamu Takahashi; Scott R. Cottier; Sang Hoo Dhong; Brian Flachs; Joel Abraham Silberman

The authors describe the low-power design of the synergistic processor element (SPE) of the cell processor developed by Sony, Toshiba and IBM. CMOS static gates implement most of the logic, and dynamic circuits are used in critical areas. Tight coupling of the instruction set architecture, microarchitecture, and physical implementation achieves a compact, power-efficient design.


IEEE Journal of Solid-state Circuits | 2008

Implementation of the Cell Broadband Engine™ in 65 nm SOI Technology Featuring Dual Power Supply SRAM Arrays Supporting 6 GHz at 1.3 V

Juergen Pille; Chad Adams; Todd Alan Christensen; Scott R. Cottier; Sebastian Ehrenreich; Fumihiro Kono; Daniel Mark Nelson; Osamu Takahashi; Shunsako Tokito; Otto Torreiter; Otto Wagner; Dieter Wendel

The 65 nm cell broadband enginetrade (cell BE) is a multi-core SoC, implemented in a high performance SOI technology featuring a separate dual power supply for SRAM arrays to improve stability and performance using an elevated voltage. A new method is shown to analyze the SRAM cell under application conditions which was used to tune the cell for stability, write-ability and performance. An improved write scheme is shown which widens the overall functional window and allows setting the power/performance point of the arrays independently of the surrounding logic. Hardware measurements demonstrate the advantages of the dual power supply under different aspects.


international symposium on microarchitecture | 2005

Low-power design approach of 11FO4 256-Kbyte embedded SRAM for the synergistic processor element of a Cell processor

Toru Asano; Joel Abraham Silberman; Sang Hoo Dhong; Osamu Takahashi; Michael Wayne White; Scott R. Cottier; Takaaki Nakazato; Atsushi Kawasumi; Hiroshi Yoshihara

The synergistic processor element is a new architecture oriented for multimedia and streaming processing. In this architecture, the memory is not a cache but a private or scratch pad memory. Such a memory is simple and needs to be high-frequency and large space in low-power. This design uses an 11 fan-out of four (11FO4), six-cycle, fully pipelined, embedded 256-Kbyte SRAM for this purpose. The designs memory is not one hard macro, but a group of custom macros physically distributed to optimize the pipeline.


international conference on computer aided design | 2005

The circuit design of the synergistic processor element of a CELL processor

Osamu Takahashi; Russ Cook; Scott R. Cottier; Sang Hoo Dhong; Brian Flachs; Koji Hirairi; Atsushi Kawasumi; Hiroaki Murakami; Hiromi Noro; Hwa-Joon Oh; S. Onish; Juergen Pille; Joel Abraham Silberman

A 32b 4-way SIMD dual-issue synergistic processor element of a CELL processor is developed with 20.9 million transistors in 14.8mm/sup 2/ using a 90nm SOI technology. CMOS static gates implement the majority of the logic. Dynamic circuits are used in critical areas, occupying 19% of the nonSRAM area. ISA, microarchitecture and physical implementation are tightly coupled to achieve a compact and power efficient design. Correct operation has been observed up to 5.6GHz at 1.4V supply and 56/spl deg/C.

Researchain Logo
Decentralizing Knowledge