Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Juergen Pille is active.

Publication


Featured researches published by Juergen Pille.


IEEE Journal of Solid-state Circuits | 2015

The 12-Core POWER8™ Processor With 7.6 Tb/s IO Bandwidth, Integrated Voltage Regulation, and Resonant Clocking

Eric Fluhr; Steve Baumgartner; David William Boerstler; John F. Bulzacchelli; Timothy Diemoz; Daniel M. Dreps; George English; Joshua Friedrich; Anne E. Gattiker; Tilman Gloekler; Christopher J. Gonzalez; Jason D. Hibbeler; Keith A. Jenkins; Yong Kim; Paul Muench; Ryan Nett; Jose Angel Paredes; Juergen Pille; Donald W. Plass; Phillip J. Restle; Raphael Robertazzi; David Shan; David W. Siljenberg; Michael A. Sperling; Kevin Stawiasz; Gregory Scott Still; Zeynep Toprak-Deniz; James D. Warnock; Glen A. Wiedemeier; Victor Zyuban

POWER8™ is a 12-core processor fabricated in IBMs 22 nm SOI technology with core and cache improvements driven by big data applications, providing 2.5× socket performance over POWER7+™. Core throughput is supported by 7.6 Tb/s of off-chip I/O bandwidth which is provided by three primary interfaces, including two new variants of Elastic Interface as well as embedded PCI Gen-3. Power efficiency is improved with several techniques. An on-chip controller based on an embedded PowerPC™ 405 processor applies per-core DVFS by adjusting DPLLs and fully integrated voltage regulators. Each voltage regulator is a highly distributed system of digitally controlled microregulators, which achieves a peak power efficiency of 90.5%. A wide frequency range resonant clock design is used in 13 clock meshes and demonstrates a minimum power savings of 4%. Power and delay efficiency is achieved through the use of pulsed-clock latches, which require statistical validation to ensure robust yield.


asia and south pacific design automation conference | 2006

Key features of the design methodology enabling a multi-core SoC implementation of a first-generation CELL processor

D. Pham; Hans-Werner Anderson; Erwin Behnen; Mark Bolliger; Sanjay Gupta; H. Peter Hofstee; Paul Harvey; Charles Ray Johns; James Allan Kahle; Atsushi Kameyama; John M. Keaty; Bob Le; Sang Lee; Tuyen V. Nguyen; John George Petrovick; Mydung Pham; Juergen Pille; Stephen D. Posluszny; Mack W. Riley; Joseph Roland Verock; James D. Warnock; Steve Weitzel; Dieter Wendel

This paper reviews the design challenges that current and future processors must face, with stringent power limits and high frequency targets, and the design methods required to overcome the above challenges and address the continuing Giga-scale system integration trend. This paper then describes the details behind the design methodology that was used to successfully implement a first-generation CELL processor - a multi-core SoC. Key features of this methodology are broad optimization with fast rule-based analysis engines using macro-level abstraction for constraints propagation up/down the design hierarchy, coupled with accurate transistor level simulation for detailed analysis. The methodology fostered the modular design concept that is inherent to the CELL architecture, enabling a high frequency design by maximizing custom circuit content through re-use, and balanced power, frequency, and die size targets through global convergence capabilities. The design has roughly 241 million transistors implemented in 90 nm SOI technology with 8 levels of copper interconnects and one local interconnect layer. The chip has been tested at various temperatures, voltages, and frequencies. Correct operation has been observed in the lab on first pass silicon at frequencies well over 4GHz.


symposium on vlsi circuits | 2005

The circuits and physical design of the synergistic processor element of a CELL processor

Osamu Takahashi; R. Cook; Scott R. Cottier; Sang Hoo Dhong; Brian Flachs; Koji Hirairi; Atsushi Kawasumi; H. Murakami; H. Noro; H. Oh; S. Onishi; Juergen Pille; Joel Abraham Silberman; S. Yong

A 32b 4-way SIMD dual-issue synergistic processor element of a CELL processor is developed with 20.9 million transistors in 14.8mm/sub 2/ using a 90nm SOI technology. CMOS static gates implement the majority of the logic. Dynamic circuits are used in critical areas, occupying 19% of the non-SRAM area. ISA, microarchitecture, and physical implementation are tightly coupled to achieve a compact and power efficient design. Correct operation has been observed up to 5.6GHz at 1.4V supply and 56/spl deg/C.


IEEE Journal of Solid-state Circuits | 2008

Implementation of the Cell Broadband Engine™ in 65 nm SOI Technology Featuring Dual Power Supply SRAM Arrays Supporting 6 GHz at 1.3 V

Juergen Pille; Chad Adams; Todd Alan Christensen; Scott R. Cottier; Sebastian Ehrenreich; Fumihiro Kono; Daniel Mark Nelson; Osamu Takahashi; Shunsako Tokito; Otto Torreiter; Otto Wagner; Dieter Wendel

The 65 nm cell broadband enginetrade (cell BE) is a multi-core SoC, implemented in a high performance SOI technology featuring a separate dual power supply for SRAM arrays to improve stability and performance using an elevated voltage. A new method is shown to analyze the SRAM cell under application conditions which was used to tune the cell for stability, write-ability and performance. An improved write scheme is shown which widens the overall functional window and allows setting the power/performance point of the arrays independently of the surrounding logic. Hardware measurements demonstrate the advantages of the dual power supply under different aspects.


international conference on computer aided design | 2005

The circuit design of the synergistic processor element of a CELL processor

Osamu Takahashi; Russ Cook; Scott R. Cottier; Sang Hoo Dhong; Brian Flachs; Koji Hirairi; Atsushi Kawasumi; Hiroaki Murakami; Hiromi Noro; Hwa-Joon Oh; S. Onish; Juergen Pille; Joel Abraham Silberman

A 32b 4-way SIMD dual-issue synergistic processor element of a CELL processor is developed with 20.9 million transistors in 14.8mm/sup 2/ using a 90nm SOI technology. CMOS static gates implement the majority of the logic. Dynamic circuits are used in critical areas, occupying 19% of the nonSRAM area. ISA, microarchitecture and physical implementation are tightly coupled to achieve a compact and power efficient design. Correct operation has been observed up to 5.6GHz at 1.4V supply and 56/spl deg/C.


international solid-state circuits conference | 2014

5.1 POWER8 TM : A 12-core server-class processor in 22nm SOI with 7.6Tb/s off-chip bandwidth

Eric Fluhr; Joshua Friedrich; Daniel M. Dreps; Victor Zyuban; Gregory Scott Still; Christopher J. Gonzalez; Allen Hall; David Hogenmiller; Frank Malgioglio; Ryan Nett; Jose Angel Paredes; Juergen Pille; Donald W. Plass; Ruchir Puri; Phillip J. Restle; David Shan; Kevin Stawiasz; Zeynep Toprak Deniz; Dieter Wendel; Matt Ziegler

The 12-core 649mm2 POWER8™ leverages IBMs 22nm eDRAM SOI technology [1], and microarchitectural enhancements to deliver up to 2.5× the socket performance [2] of its 32nm predecessor, POWER7+™ [3]. POWER8 contains 4.2B transistors and 31.5μF of deep-trench decoupling capacitance. Three thin-oxide transistor Vts are used for power/performance tuning, and thick-oxide transistors enable high-voltage I/O and analog designs. The 15-layer BEOL contains 5-80nm, 2-144nm, 3-288nm, and 3-640nm pitch layers for low-latency communication as well as 2-2400nm ultra-thick-metal (UTM) pitch layers for low-resistance distribution of power and clocks.


international solid-state circuits conference | 2010

A 32kB 2R/1W L1 data cache in 45nm SOI technology for the POWER7TM processor

Juergen Pille; Dieter Wendel; Otto Wagner; Rolf Sautter; Wolfgang Penth; Thomas Froehnel; Stefan Buettner; Otto Torreiter; Martin Eckert; Jose Angel Paredes; David A. Hrusecky; David Scott Ray; Miles G. Canada

Increasing demand for parallelism due to out-of-order and multi-threading computation requires fast and dense arrays with multi-port capabilities. The load-store-unit (LSU) of the POWER7™ microprocessor core has a 32kB L1 data cache composed of four 8kB blocks. In a two-cycle back-to-back operation it supports concurrently two independent read and one write operations. Organized in banks of 16 cells each, the two reads operate independently in any of these banks, including two reads within the same bank, even the same cell. A bank selected for write is blocked for any read operation. If read and write collide within the same bank, collision-control circuitry provides write-over-read priority. Each read port provides 4B from 1 of 256 locations, whereas the double-bandwidth write operation provides individual control of 8B to 128 locations.


international solid-state circuits conference | 2017

3.1 POWER9™: A processor family optimized for cognitive computing with 25Gb/s accelerator links and 16Gb/s PCIe Gen4

Christopher J. Gonzalez; Eric Fluhr; Daniel M. Dreps; David Hogenmiller; Rahul M. Rao; Jose Angel Paredes; Michael Stephen Floyd; Michael A. Sperling; Ryan Kruse; Vinod Ramadurai; Ryan Nett; Saiful Islam; Juergen Pille; Donald W. Plass

Cognitive computing and cloud infrastructure require flexible, connectable, and scalable processors with extreme IO bandwidth. With 4 distinct chip configurations, the POWER9 family of chips delivers multiple options for memory ports, core thread counts, and accelerator options to address this need. The 24-core scale-out processor is implemented in 14nm SOI FinFET technology [1] and contains 8.0B transistors. The 695mm2 chip uses 17 levels of copper interconnect: 3–64nm, 2–80nm, 4–128nm, 2–256nm, 4–360nm pitch wiring for signals and 2– 2400nm pitch wiring levels for power and global clock distribution. Digital logic uses three thin-oxide transistor Vts to balance power and performance requirements, while analog and high-voltage circuits eliminated thick-oxide devices providing process simplification and cost reduction. By leveraging the FinFETs increased current per area, the base standard cell image shrunk from 18 tracks per bit in planar 22nm to 10 tracks per bit in 14nm providing additional area scaling.


custom integrated circuits conference | 2010

A 32nm 0.5V-supply dual-read 6T SRAM

Jente B. Kuang; Jeremy D. Schaub; Fadi H. Gebara; Dieter Wendel; Sudesh Saroop; Tuyet Nguyen; Thomas Fröhnel; Antje Müller; Christopher M. Durham; Rolf Sautter; Bryan J. Lloyd; Bryan J. Robbins; Juergen Pille; Sani R. Nassif; Kevin J. Nowka

Dual read port SRAMs play a critical role in high performance cache designs, but stability and sensing challenges typically limit the low voltage operation. We report a high-performance dual read port 8-way set associative 6T SRAM with a one clock cycle access latency, in a 32nm metal-gate partially depleted (PD) SOI technology, for low-voltage applications. Hardware exhibits robust operation at 348MHz and 0.5V with a read and write power of 3.33 and 1.97mW, respectively, per 4.5KB active array with both read ports accessed at the highest activity data pattern. At a 0.6V supply, an access speed of 1.2GHz is observed.


european solid state circuits conference | 2015

A 4GHz, low latency TCAM in 14nm SOI FinFET technology using a high performance current sense amplifier for AC current surge reduction

Alexander Fritsch; Michael Kugel; Rolf Sautter; Dieter Wendel; Juergen Pille; Otto Torreiter; Shankar Kalyanasundaram; Daniel Dobson

A 4GHz, low latency TCAM in 14nm SOI FinFET technology, using a matchline current sensing scheme with an energy consumption of 0.63 fJ/bit/search at 0.9V and a peak current reduction of 50% compared to voltage sensing implementations. A by entry adjustable search depth allows to reduce power consumption for variable size translation tables. The implemented sandwich floorplan enables an area efficient integration of high performance 0.286μm2 16T-TCAM and 0.143μm2 8T-SRAM cells.

Researchain Logo
Decentralizing Knowledge