Scott R. Cottier
IBM
Publication
Featured research published by Scott R. Cottier.
IEEE Journal of Solid-state Circuits | 2006
Hwa Joon Oh; Silvia Melitta Mueller; Christian Jacobi; Kevin D. Tran; Scott R. Cottier; Brad W. Michael; Hiroo Nishikawa; Yonetaro Totsuka; Tatsuya Namatame; Naoka Yano; Takashi Machida; Sang Hoo Dhong
The floating-point unit (FPU) in the synergistic processor element (SPE) of a CELL processor is a fully pipelined 4-way single-instruction multiple-data (SIMD) unit designed to accelerate media and data streaming with 128-bit operands. It supports 32-bit single-precision floating-point and 16-bit integer operands with two different latencies, six-cycle and seven-cycle, with 11 FO4 delay per stage. The FPU optimizes the performance of critical single-precision multiply-add operations. Since exact rounding, exceptions, and denormalized-number handling are not important to multimedia applications, IEEE correctness on the single-precision floating-point numbers is sacrificed for performance and simple design. It employs fine-grained clock gating for power saving. The design has 768K transistors in 1.3 mm², fabricated in 90-nm SOI technology. Correct operations have been observed up to 5.6 GHz with 1.4 V and 56 °C, delivering 44.8 GFlops. Architecture, logic, circuits, and integration are codesigned to meet the performance, power, and area goals.
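The throughput figure in the abstract above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes that each SIMD lane completes one multiply-add per cycle and that a multiply-add counts as two floating-point operations, neither of which the abstract states explicitly. The function names are hypothetical, and the per-FO4 delay is a rough estimate that assumes the 11 FO4 stage budget fills the whole cycle at 5.6 GHz.

```python
# Figures quoted in the abstract.
CLOCK_GHZ = 5.6        # highest frequency at which correct operation was observed
SIMD_LANES = 4         # 4-way single-precision SIMD
FO4_PER_STAGE = 11     # pipeline stage depth in FO4 delays

# Assumption (not stated in the abstract): one multiply-add counts as 2 flops.
FLOPS_PER_FMA = 2


def peak_gflops(clock_ghz, lanes, flops_per_op):
    """Peak throughput if every lane completes one operation per cycle."""
    return clock_ghz * lanes * flops_per_op


def cycle_time_ps(clock_ghz):
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / clock_ghz


def fo4_delay_ps(clock_ghz, fo4_per_stage):
    """Implied delay of one FO4 inverter, assuming the stage fills the cycle."""
    return cycle_time_ps(clock_ghz) / fo4_per_stage


if __name__ == "__main__":
    print(f"peak throughput: {peak_gflops(CLOCK_GHZ, SIMD_LANES, FLOPS_PER_FMA):.1f} GFlops")
    print(f"cycle time:      {cycle_time_ps(CLOCK_GHZ):.1f} ps")
    print(f"implied FO4:     {fo4_delay_ps(CLOCK_GHZ, FO4_PER_STAGE):.1f} ps")
```

Under these assumptions the computed peak matches the 44.8 GFlops quoted above, and the cycle time at 5.6 GHz comes out at roughly 179 ps.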
symposium on computer arithmetic | 2005
Silvia Melitta Mueller; Christian Jacobi; Hwa-Joon Oh; Kevin D. Tran; Scott R. Cottier; Brad W. Michael; Hiroo Nishikawa; Yonetaro Totsuka; Tatsuya Namatame; Naoka Yano; Takashi Machida; Sang Hoo Dhong
The floating-point unit in the synergistic processor element of the first-generation multi-core CELL processor is described. The FPU supports 4-way SIMD single-precision and integer operations and 2-way SIMD double-precision operations. The design required high frequency, low latency, and power and area efficiency, with primary application to multimedia streaming workloads such as 3D graphics. The FPU has 3 different latencies, optimizing the performance-critical single-precision FMA operations, which are executed with a 6-cycle latency at an 11FO4 cycle time. The latency includes the global forwarding of the result. These challenging performance, power, and area goals were achieved through the co-design of architecture and implementation with optimizations at all levels of the design. This paper focuses on the logical and algorithmic aspects of the FPU we developed to achieve these goals.
international solid-state circuits conference | 2007
Jürgen Pille; Chad Adams; T. Christensen; Scott R. Cottier; Sebastian Ehrenreich; T. Kono; D. Nelson; Osamu Takahashi; Shunsako Tokito; Otto Torreiter; Otto Wagner; Dieter Wendel
The 65nm CELL Broadband Engine™ design features a dual power supply, which enhances SRAM stability and performance using an elevated array-specific power supply, while reducing the logic power consumption. Hardware measurements demonstrate low-voltage operation and reduced scatter of the minimum operating voltage. The chip operates at 6GHz at 1.3V and is fabricated in a 65nm CMOS SOI technology.
symposium on vlsi circuits | 2005
Hwa Joon Oh; Silvia Melitta Mueller; Christian Jacobi; Kevin D. Tran; Scott R. Cottier; Brad W. Michael; Hiroo Nishikawa; Yonetaro Totsuka; Tatsuya Namatame; Naoka Yano; Takashi Machida; Sang Hoo Dhong
The floating point unit in the synergistic processor element of a CELL processor is a fully-pipelined 4-way SIMD unit designed to accelerate media and data streaming. It supports 32-bit single-precision floating point and 16-bit integer operands with two different latencies, optimizing the performance of critical single-precision multiply-add operations. It employs fine-grained clock gating for power saving. Architecture, logic, circuits and integration are co-designed to meet the performance, power, and area goals.
international solid-state circuits conference | 2008
Osamu Takahashi; Chad Adams; D. Ault; Erwin Behnen; O. Chiang; Scott R. Cottier; Paula Kristine Coulman; James A. Culp; Gilles Gervais; Michael S. Gray; Y. Itaka; C. J. Johnson; Fumihiro Kono; L. Maurice; Kevin W. McCullen; Lam M. Nguyen; Yoichi Nishino; Hiromi Noro; Jürgen Pille; Mack W. Riley; M. Shen; Chiaki Takano; Shunsako Tokito; Tina Wagner; Hiroshi Yoshihara
This paper describes the challenges of migrating the Cell Broadband Engine (Cell BE) design from a 65 nm SOI to a 45 nm twin-well CMOS technology on SOI with low-k dielectrics and copper metal layers using a mostly automated approach. A die micrograph of the 45 nm Cell BE is described here. The cycle-by-cycle machine behavior is preserved. The focus areas are automated migration, power reduction, area reduction, and DFM improvements. The chip power is reduced by roughly 40% and the chip area is reduced by 34%.
symposium on vlsi circuits | 2005
Osamu Takahashi; R. Cook; Scott R. Cottier; Sang Hoo Dhong; Brian Flachs; Koji Hirairi; Atsushi Kawasumi; H. Murakami; H. Noro; H. Oh; S. Onishi; Juergen Pille; Joel Abraham Silberman; S. Yong
A 32b 4-way SIMD dual-issue synergistic processor element of a CELL processor is developed with 20.9 million transistors in 14.8 mm² using a 90nm SOI technology. CMOS static gates implement the majority of the logic. Dynamic circuits are used in critical areas, occupying 19% of the non-SRAM area. ISA, microarchitecture, and physical implementation are tightly coupled to achieve a compact and power efficient design. Correct operation has been observed up to 5.6GHz at 1.4V supply and 56 °C.
international symposium on microarchitecture | 2005
Osamu Takahashi; Scott R. Cottier; Sang Hoo Dhong; Brian Flachs; Joel Abraham Silberman
The authors describe the low-power design of the synergistic processor element (SPE) of the Cell processor developed by Sony, Toshiba and IBM. CMOS static gates implement most of the logic, and dynamic circuits are used in critical areas. Tight coupling of the instruction set architecture, microarchitecture, and physical implementation achieves a compact, power-efficient design.
IEEE Journal of Solid-state Circuits | 2008
Juergen Pille; Chad Adams; Todd Alan Christensen; Scott R. Cottier; Sebastian Ehrenreich; Fumihiro Kono; Daniel Mark Nelson; Osamu Takahashi; Shunsako Tokito; Otto Torreiter; Otto Wagner; Dieter Wendel
The 65 nm Cell Broadband Engine™ (Cell BE) is a multi-core SoC, implemented in a high performance SOI technology featuring a separate dual power supply for SRAM arrays to improve stability and performance using an elevated voltage. A new method is shown to analyze the SRAM cell under application conditions which was used to tune the cell for stability, write-ability and performance. An improved write scheme is shown which widens the overall functional window and allows setting the power/performance point of the arrays independently of the surrounding logic. Hardware measurements demonstrate the advantages of the dual power supply under different aspects.
international symposium on microarchitecture | 2005
Toru Asano; Joel Abraham Silberman; Sang Hoo Dhong; Osamu Takahashi; Michael Wayne White; Scott R. Cottier; Takaaki Nakazato; Atsushi Kawasumi; Hiroshi Yoshihara
The synergistic processor element is a new architecture oriented toward multimedia and streaming processing. In this architecture, the memory is not a cache but a private or scratch pad memory. Such a memory is simple, but it must combine high frequency and large capacity with low power. This design uses an 11 fan-out of four (11FO4), six-cycle, fully pipelined, embedded 256-Kbyte SRAM for this purpose. The design's memory is not one hard macro, but a group of custom macros physically distributed to optimize the pipeline.
international conference on computer aided design | 2005
Osamu Takahashi; Russ Cook; Scott R. Cottier; Sang Hoo Dhong; Brian Flachs; Koji Hirairi; Atsushi Kawasumi; Hiroaki Murakami; Hiromi Noro; Hwa-Joon Oh; S. Onishi; Juergen Pille; Joel Abraham Silberman
A 32b 4-way SIMD dual-issue synergistic processor element of a CELL processor is developed with 20.9 million transistors in 14.8 mm² using a 90nm SOI technology. CMOS static gates implement the majority of the logic. Dynamic circuits are used in critical areas, occupying 19% of the non-SRAM area. ISA, microarchitecture and physical implementation are tightly coupled to achieve a compact and power efficient design. Correct operation has been observed up to 5.6GHz at 1.4V supply and 56 °C.