Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Dieter Wendel is active.

Publication


Featured researches published by Dieter Wendel.


international solid-state circuits conference | 2005

The design and implementation of a first-generation CELL processor

D. Pham; S. Asano; Mark Bolliger; M.N. Day; H.P. Hofstee; Charles Ray Johns; James Allan Kahle; Atsushi Kameyama; John M. Keaty; Y. Masubuchi; Mack W. Riley; D. Shippy; Daniel Lawrence Stasiak; Masakazu Suzuoki; M. Wang; James D. Warnock; Steve Weitzel; Dieter Wendel; T. Yamazaki; Kazuaki Yazawa

A CELL processor is a multi-core chip consisting of a 64b power architecture processor, multiple streaming processors, a flexible IO interface, and a memory interface controller. This SoC is implemented in 90nm SOI technology. The chip is designed with a high degree of modularity and reuse to maximize the custom circuit content and achieve a high-frequency clock-rate.


international solid-state circuits conference | 2010

The implementation of POWER7 TM : A highly parallel and scalable multi-core high-end server processor

Dieter Wendel; Ronald Nick Kalla; Robert Cargoni; Joachim Clables; Joshua Friedrich; Roland Frech; James Allan Kahle; Balaram Sinharoy; William J. Starke; Scott A. Taylor; Steve Weitzel; Sam Gat-Shang Chu; Saiful Islam; Victor Zyuban

The next processor of the POWER ™ family, called POWER7™ is introduced. Eight quad-threaded cores are integrated together with two memory controllers and high-speed system links on a 567mm2 die, employing 1.2B transistors in 45nm CMOS SOI technology [4]. High on-chip performance and therefore bandwidth is achieved using 11 layers of low-к copper wiring and devices with enhanced dual-stress liners. The technology features deep trench [DT] capacitors that are used to build the 32MB embedded DRAM L3 based on a 0.067µm2 DRAM cell. DT capacitors are used also to reduce on-chip voltage-island supply noise. Focusing on speed, the dual-supply ripple-domino SRAM concepts follows the schemes described elsewhere.


international solid-state circuits conference | 2007

Implementation of the CELL Broadband Engine in a 65nm SOI Technology Featuring Dual-Supply SRAM Arrays Supporting 6GHz at 1.3V

Jürgen Pille; Chad Adams; T. Christensen; Scott R. Cottier; Sebastian Ehrenreich; T. Kono; D. Nelson; Osamu Takahashi; Shunsako Tokito; Otto Torreiter; Otto Wagner; Dieter Wendel

The 65nm CELL Broadband Enginetrade design features a dual power supply, which enhances SRAM stability and performance using an elevated array-specific power supply, while reducing the logic power consumption. Hardware measurements demonstrate low-voltage operation and reduced scatter of the minimum operating voltage. The chip operates at 6GHz at 1.3V and is fabricated in a 65nm CMOS SOI technology.


IEEE Journal of Solid-state Circuits | 2011

POWER7™, a Highly Parallel, Scalable Multi-Core High End Server Processor

Dieter Wendel; R Kalla; James D. Warnock; R. Cargnoni; S G Chu; J G Clabes; Daniel M. Dreps; D. Hrusecky; Joshua Friedrich; Saiful Islam; J Kahle; Jens Leenstra; Gaurav Mittal; Jose Angel Paredes; Jürgen Pille; Phillip J. Restle; Balaram Sinharoy; G Smith; W J Starke; S Taylor; J. A. Van Norstrand; Stephen Douglas Weitzel; P G Williams; Victor Zyuban

This paper gives an overview of the latest member of the POWER™ processor family, POWER7™. Eight quad-threaded cores, operating at frequencies up to 4.14 GHz, are integrated together with two memory controllers and high speed system links on a 567 mm die, employing 1.2B transistors in a 45 nm CMOS SOI technology with 11 layers of low-k copper wiring. The technology features deep trench capacitors which are used to build a 32 MB embedded DRAM L3 based on a 0.067 m DRAM cell. The functionally equivalent chip transistor count would have been over 2.7B if the L3 had been implemented with a conventional 6 transistor SRAM cell. (A detailed paper about the eDRAM implementation will be given in a separate paper of this Journal). Deep trench capacitors are also used to reduce on-chip voltage island supply noise. This paper describes the organization of the design and the features of the processor core, before moving on to discuss the circuits used for analog elements, clock generation and distribution, and I/O designs. The final section describes the details of the clocked storage elements, including special features for test, debug, and chip frequency tuning.


Ibm Journal of Research and Development | 2000

Custom circuit design as a driver of microprocessor performance

D. H. Allen; Sang Hoo Dhong; Harm Peter Hofstee; Jens Leenstra; Kevin J. Nowka; D. L. Stasiak; Dieter Wendel

This paper presents a survey of some of the most aggressive custom designs for CMOS processor products and prototypes in IBM. We argue that microprocessor performance growth, which has traditionally been driven primarily by CMOS technology and microarchitectural improvements, can receive a substantial contribution from improvements in circuit design and physical organization. We predict that in future microprocessor designs the floorplan and wire plan will be as important as the microarchitecture, more control logic will be structured and become indistinguishable from dataflow elements, and more circuits will be designed and analyzed at the level of single transistors and wires.


international solid-state circuits conference | 2010

POWER7 TM local clocking and clocked storage elements

James D. Warnock; Leon J. Sigal; Dieter Wendel; K. Paul Muller; Joshua Friedrich; Victor Zyuban; Ethan H. Cannon; Aj Kleinosowski

The design of the clocked storage elements (CSEs) and associated local clocking circuitry is a critical consideration for modern microprocessor projects[1], and the POWER7™ chip[2], designed in a 45nm silicon-on-insulator (SOI) technology, was no exception. The digital logic contained over 2M CSEs, and the design of these elements had a major impact not only on the area, power, and performance of the chip, but also on the reliability, testability, and the ability to debug and optimize the hardware. This paper will focus on the special features added to the CSE design with these considerations in mind.


international symposium on microarchitecture | 2005

Cell processor low-power design methodology

Daniel Lawrence Stasiak; Rajat Chaudhry; Dennis Thomas Cox; Stephen D. Posluszny; James D. Warnock; Steve Weitzel; Dieter Wendel; Michael Wang

Power consumption is a major challenge in VLSI design. Power-constrained designs must attack power reduction with many techniques and require tools to accurately predict the power consumption. These tools give designers feedback on the efficiency of the power management logic. We present the basic methodology behind cycle-accurate power estimation. This forms a basis for explaining the techniques used to reduce power in the first-generation Cell processor, along with data that correlates our hardware measurements against power estimates.


asia and south pacific design automation conference | 2006

Key features of the design methodology enabling a multi-core SoC implementation of a first-generation CELL processor

D. Pham; Hans-Werner Anderson; Erwin Behnen; Mark Bolliger; Sanjay Gupta; H. Peter Hofstee; Paul Harvey; Charles Ray Johns; James Allan Kahle; Atsushi Kameyama; John M. Keaty; Bob Le; Sang Lee; Tuyen V. Nguyen; John George Petrovick; Mydung Pham; Juergen Pille; Stephen D. Posluszny; Mack W. Riley; Joseph Roland Verock; James D. Warnock; Steve Weitzel; Dieter Wendel

This paper reviews the design challenges that current and future processors must face, with stringent power limits and high frequency targets, and the design methods required to overcome the above challenges and address the continuing Giga-scale system integration trend. This paper then describes the details behind the design methodology that was used to successfully implement a first-generation CELL processor - a multi-core SoC. Key features of this methodology are broad optimization with fast rule-based analysis engines using macro-level abstraction for constraints propagation up/down the design hierarchy, coupled with accurate transistor level simulation for detailed analysis. The methodology fostered the modular design concept that is inherent to the CELL architecture, enabling a high frequency design by maximizing custom circuit content through re-use, and balanced power, frequency, and die size targets through global convergence capabilities. The design has roughly 241 million transistors implemented in 90 nm SOI technology with 8 levels of copper interconnects and one local interconnect layer. The chip has been tested at various temperatures, voltages, and frequencies. Correct operation has been observed in the lab on first pass silicon at frequencies well over 4GHz.


custom integrated circuits conference | 2005

The design methodology and implementation of a first-generation CELL processor: a multi-core SoC

D. Pham; Erwin Behnen; Mark Bolliger; H.P. Hofstee; Charles Ray Johns; James Allan Kahle; Atsushi Kameyama; John M. Keaty; B. Le; Y. Masubuchi; Stephen D. Posluszny; Mack W. Riley; M. Suzuoki; M. Wang; James D. Warnock; Steve Weitzel; Dieter Wendel; K. Yazawa

This paper reviews the design challenges that current and future processors must face with stringent power limits and high frequency targets, and the design methods required to address the continuing system integration trends. This paper then describes the implementation of a first-generation CELL processor and the design methods used to overcome the above challenges. A CELL processor consists of a 64 bit power architecture processor coupled with multiple synergistic processors, a flexible IO interface, and a memory interface controller that supports multiple operating systems including Linux. This multicore SoC, implemented in 90nm SOI technology, achieved a high clock rate by maximizing custom circuit design while maintaining reasonable complexity through design modularity and reuse.


international solid-state circuits conference | 2011

A 4R2W register file for a 2.3GHz wire-speed POWER™ processor with double-pumped write operation

Gary S. Ditlow; Robert K. Montoye; Salvatore N. Storino; Sherman M. Dance; Sebastian Ehrenreich; Bruce M. Fleischer; Thomas W. Fox; Kyle M. Holmes; Junichi Mihara; Yutaka Nakamura; Shohji Onishi; Robert Shearer; Dieter Wendel; Leland Chang

In multi-ported register files, memory cell size grows quadratically with the total number of ports due to wordline and bitline wiring. Reducing the number of physical access ports in a memory cell can thus lead to significant area and power savings as well as latency improvement. Double-pumped register files operate access ports twice in a single clock period to reduce area by halving the number of physical ports in the memory cell — a technique often confined to low-frequency applications. Replication of a memory cell in separate arrays halves the number of physical read ports in each copy. In this work, double-pumped write ports and replicated read ports are applied to a 4R2W register file in a highperformance microprocessor product [1]. This paper describes detailed implementation and measured hardware characteristics of this array and demonstrates a fast error correction scheme. The techniques used balance high efficiency and low latency and thus differ from previous work, in which double-pumped ports perform a write followed by a read in a very large register file [2] or where write ports are double-pumped without cell-level read port reduction [3].

Researchain Logo
Decentralizing Knowledge