Publication


Featured research published by Michael A. Sperling.


international solid-state circuits conference | 2014

5.2 Distributed system of digitally controlled microregulators enabling per-core DVFS for the POWER8™ microprocessor

Zeynep Toprak-Deniz; Michael A. Sperling; John F. Bulzacchelli; Gregory Scott Still; Ryan Kruse; Seongwon Kim; David William Boerstler; Tilman Gloekler; Raphael Robertazzi; Kevin Stawiasz; Timothy Diemoz; George English; David T. Hui; Paul Muench; Joshua Friedrich

Integrated voltage regulator modules (iVRMs) [1] provide a cost-effective path to realizing per-core dynamic voltage and frequency scaling (DVFS), which can be used to optimize the performance of a power-constrained multi-core processor. This paper presents an iVRM system developed for the POWER8™ microprocessor, which functions as a very fast, accurate low-dropout regulator (LDO), with 90.5% peak power efficiency (only 3.1% worse than an ideal LDO). At low output voltages, efficiency is reduced but still sufficient to realize beneficial energy savings with DVFS. Each iVRM features a bypass mode so that some of the cores can be operated at maximum performance with no regulator loss. With the iVRM area including the input decoupling capacitance (DCAP) (but not the output DCAP inherent to the cores), the iVRMs achieve a power density of 34.5 W/mm², which exceeds that of inductor-based or SC converters by at least 3.4× [2].
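The remark that efficiency drops at low output voltages follows from the textbook ceiling on any linear regulator, which is the ratio of output to input voltage. The sketch below illustrates that relationship only; the input rail and output setpoints are hypothetical values chosen for illustration and do not come from the paper.

```python
def ideal_ldo_efficiency(v_out: float, v_in: float) -> float:
    """Efficiency ceiling of a linear regulator: V_out / V_in.

    All load current passes from input to output, so the dropout
    voltage (v_in - v_out) is dissipated as heat.
    """
    if not 0 < v_out <= v_in:
        raise ValueError("expected 0 < v_out <= v_in")
    return v_out / v_in


# Hypothetical input rail and output setpoints (assumptions, not paper data).
V_IN = 1.10
for v_out in (1.05, 0.95, 0.85, 0.75):
    eff = ideal_ldo_efficiency(v_out, V_IN)
    print(f"V_out = {v_out:.2f} V -> ideal efficiency {eff:.1%}")
```

A real regulator falls below this ceiling because of its own control and quiescent current, which is the gap the abstract quantifies against an ideal LDO.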


IEEE Journal of Solid-state Circuits | 2011

A 45 nm SOI Embedded DRAM Macro for the POWER™ Processor 32 MByte On-Chip L3 Cache

John E. Barth; Don Plass; Erik A. Nelson; Charlie Hwang; Gregory J. Fredeman; Michael A. Sperling; Abraham Mathews; Toshiaki Kirihata; William Robert Reohr; Kavita Nair; Nianzheng Cao

A 1.35 ns random-access and 1.7 ns random-cycle SOI embedded-DRAM macro has been developed for the POWER7™ high-performance microprocessor. The macro employs a 6-transistor micro sense-amplifier architecture with an extended precharge scheme to enhance the sensing margin for product quality. The detailed study shows a 67% bit-line power reduction with only 1.7% area overhead, while improving the read-zero margin by more than 500 ps. The array voltage window is improved by the programmable BL voltage generator, allowing the embedded DRAM to operate reliably without constraining the microprocessor voltage supply windows. The 2.5 nm gate oxide transistor cell with deep-trench capacitor is accessed by the 1.7 V wordline high voltage (VPP) together with a wordline low voltage (VWL), and both are generated internally within the microprocessor. This results in a 32 MB on-chip L3 cache for 8 cores in a 567 mm² POWER7™ die.


IEEE Journal of Solid-state Circuits | 2015

The 12-Core POWER8™ Processor With 7.6 Tb/s IO Bandwidth, Integrated Voltage Regulation, and Resonant Clocking

Eric Fluhr; Steve Baumgartner; David William Boerstler; John F. Bulzacchelli; Timothy Diemoz; Daniel M. Dreps; George English; Joshua Friedrich; Anne E. Gattiker; Tilman Gloekler; Christopher J. Gonzalez; Jason D. Hibbeler; Keith A. Jenkins; Yong Kim; Paul Muench; Ryan Nett; Jose Angel Paredes; Juergen Pille; Donald W. Plass; Phillip J. Restle; Raphael Robertazzi; David Shan; David W. Siljenberg; Michael A. Sperling; Kevin Stawiasz; Gregory Scott Still; Zeynep Toprak-Deniz; James D. Warnock; Glen A. Wiedemeier; Victor Zyuban

POWER8™ is a 12-core processor fabricated in IBM's 22 nm SOI technology with core and cache improvements driven by big data applications, providing 2.5× socket performance over POWER7+™. Core throughput is supported by 7.6 Tb/s of off-chip I/O bandwidth, which is provided by three primary interfaces, including two new variants of Elastic Interface as well as embedded PCI Gen-3. Power efficiency is improved with several techniques. An on-chip controller based on an embedded PowerPC™ 405 processor applies per-core DVFS by adjusting DPLLs and fully integrated voltage regulators. Each voltage regulator is a highly distributed system of digitally controlled microregulators, which achieves a peak power efficiency of 90.5%. A wide frequency range resonant clock design is used in 13 clock meshes and demonstrates a minimum power savings of 4%. Power and delay efficiency is achieved through the use of pulsed-clock latches, which require statistical validation to ensure robust yield.


symposium on vlsi circuits | 2010

A DPLL-based per core variable frequency clock generator for an eight-core POWER7™ microprocessor

Jose A. Tierno; Alexander V. Rylyakov; Daniel J. Friedman; Ann Chen; Anthony E. Ciesla; Timothy Diemoz; George English; David T. Hui; Keith A. Jenkins; Paul Muench; Gaurav Rao; George William Smith; Michael A. Sperling; Kevin Stawiasz

A per-core clock generator for the eight-core POWER7™ processor is implemented with a digital PLL. This frequency generator is capable of smooth, controlled frequency slewing, minimizing the impact of di/dt. Frequency can be dynamically adjusted while the clock is running, and without skipping any cycles, thus enabling aggressive power management techniques.
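To make the idea of smooth, controlled frequency slewing concrete, here is a minimal sketch that moves from one frequency to another in bounded steps rather than in a single jump. It is not the paper's control algorithm; the function name, the step limit, and the example frequencies are all hypothetical.

```python
def slew_schedule(f_start_mhz: float, f_target_mhz: float,
                  max_step_mhz: float) -> list[float]:
    """Return frequency setpoints from start to target, with each step
    limited to max_step_mhz so the load current changes gradually."""
    if max_step_mhz <= 0:
        raise ValueError("max_step_mhz must be positive")
    setpoints = [f_start_mhz]
    f = f_start_mhz
    while f != f_target_mhz:
        delta = f_target_mhz - f
        if abs(delta) <= max_step_mhz:
            f = f_target_mhz          # final step lands exactly on target
        else:
            f += max_step_mhz if delta > 0 else -max_step_mhz
        setpoints.append(f)
    return setpoints


# Hypothetical example: drop from 4000 MHz to 3000 MHz in 250 MHz steps.
print(slew_schedule(4000, 3000, 250))
```

Bounding the step size is one simple way to limit how quickly the current demand changes, which is the di/dt concern the abstract mentions.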


international solid-state circuits conference | 2010

A 45nm SOI embedded DRAM macro for POWER7™ 32MB on-chip L3 cache

John E. Barth; Don Plass; Erik A. Nelson; Charlie Hwang; Gregory J. Fredeman; Michael A. Sperling; Abraham Mathews; William Robert Reohr; Kavita Nair; Nianzheng Cao

Logic-based embedded DRAM has matured into a wide range of ASIC applications, SRAM replacements [1] and off-chip caches for microprocessors [2]. While embedded DRAM has been leveraged in supercomputers such as IBM's BlueGene/L [3], its use has been limited to moderate performance bulk logic technologies. Although prototypes have been demonstrated [4], DRAM has yet to be embedded on a high performance microprocessor. This paper discloses an SOI DRAM macro implemented on-chip with the IBM POWER7™ high performance microprocessor [5], and introduces enhancements to the micro sense amp (µSA) architecture [6]. This high performance DRAM macro is used to construct a large 32MB L3 cache on-chip, eliminating delay, area and power from the off-chip interface, simultaneously improving system performance, reducing cost, power and soft error vulnerability. Figure 19.1.1a shows an SEM of the 45nm SOI DRAM Device and Deep Trench (DT) capacitor [7]. DT offers 25x more capacitance than planar structures and was also utilized to reduce on-chip voltage island supply noise.


custom integrated circuits conference | 2008

A wide tuning range (1 GHz-to-15 GHz) fractional-N all-digital PLL in 45nm SOI

Alexander V. Rylyakov; Jose A. Tierno; George English; Michael A. Sperling; Daniel J. Friedman

An all static CMOS (45 nm SOI) all-digital fractional-N PLL has a wide tuning range (from 0.84 GHz to 13.3 GHz, at 1.0 V, 65 °C) and supports a broad range of multiplication factors (up to 1,000×) and reference clock speeds (from 2 MHz to 1 GHz). At 125 °C the period jitter of the 4.12 GHz clock (206 MHz reference) is 1.1 ps rms (11.4 ps pp) at 1.3 V (52.4 mW), and 2.2 ps rms (22.7 ps pp) at 0.7 V (9.7 mW). The area of the PLL is 175 μm × 160 μm.
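A few figures follow directly from the numbers quoted in this abstract. The short calculation below derives them; the derived quantities (multiplication factor at the measured operating point, jitter as a fraction of the clock period, energy per cycle, and area in mm²) are arithmetic on the abstract's values, not results reported by the authors.

```python
F_OUT_MHZ = 4120      # 4.12 GHz output clock
F_REF_MHZ = 206       # 206 MHz reference clock


def derived_figures() -> dict[str, float]:
    """Arithmetic on the abstract's numbers; nothing here is measured data."""
    period_ps = 1e6 / F_OUT_MHZ                   # clock period in ps
    return {
        "multiplication_factor": F_OUT_MHZ / F_REF_MHZ,
        "period_ps": period_ps,
        "rms_jitter_fraction_1v3": 1.1 / period_ps,
        "rms_jitter_fraction_0v7": 2.2 / period_ps,
        # mW divided by GHz gives pJ per clock cycle.
        "energy_per_cycle_pj_1v3": 52.4 / (F_OUT_MHZ / 1000),
        "energy_per_cycle_pj_0v7": 9.7 / (F_OUT_MHZ / 1000),
        "area_mm2": (175 * 160) / 1e6,            # um^2 -> mm^2
    }


for name, value in derived_figures().items():
    print(f"{name}: {value:.4g}")
```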


great lakes symposium on vlsi | 2006

Synthesis of a wideband low noise amplifier

Abhishek Jajoo; Michael A. Sperling; Tamal Mukherjee

Two generations of a wideband low noise amplifier (LNA) employing the noise-canceling principle have been synthesized. The first generation design was fabricated in a 0.35 μm SiGe BiCMOS process. It has a measured peak S21 of 17 dB and noise figure less than 3 dB over a bandwidth of 2.6 GHz while consuming 32.5 mW of power from a 2.5 V supply. The calculated figure of merit (FOM) is better than many reported wideband LNAs, including a few from even more advanced processes. The synthesis design constraints were improved based on the analysis of the first generation design. A second generation design was synthesized with the updated constraints. Its simulation results show that its FOM is better than its predecessor.


IEEE Journal of Solid-state Circuits | 2016

A 14 nm 1.1 Mb Embedded DRAM Macro With 1 ns Access

Gregory J. Fredeman; Donald W. Plass; Abraham Mathews; Janakiraman Viraraghavan; Kenneth J. Reyer; Thomas J. Knips; Thomas R. Miller; Elizabeth L. Gerhard; Dinesh Kannambadi; Chris Paone; Dongho Lee; Daniel Rainey; Michael A. Sperling; Michael Whalen; Steven Burns; Rajesh Reddy Tummuru; Herbert L. Ho; Alberto Cestero; Norbert Arnold; Babar A. Khan; Toshiaki Kirihata; Subramanian S. Iyer

A 1.1 Mb embedded DRAM macro (eDRAM), for next-generation IBM SOI processors, employs 14 nm FinFET logic technology with a 0.0174 μm² deep-trench capacitor cell. A gated-feedback sense amplifier enables a high voltage gain of a power-gated inverter at mid-level input voltage, while supporting 66 cells per local bit-line. A dynamic-and-gate-thin-oxide word-line driver that tracks standard logic process variation improves the eDRAM array performance with reduced area. The 1.1 Mb macro, composed of 8 × 2 72 Kb subarrays, is organized with a center interface block architecture, allowing 1 ns access latency and 1 ns bank interleaving operation using two banks, each having a 2 ns random access cycle. 5 GHz operation has been demonstrated in a system prototype, which includes 6 instances of 1.1 Mb eDRAM macros, integrated with an array-built-in-self-test engine, phase-locked loop (PLL), and word-line high and word-line low voltage generators. The advantage of the 14 nm FinFET array over the 22 nm array was confirmed using direct tester control of the 1.1 Mb eDRAM macros integrated in a 16 Mb inline monitor.
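The macro capacity and the interleaved access rate can be checked with simple arithmetic on the figures in this abstract. The sketch below assumes 1 Mb = 1024 Kb and assumes that accesses alternate evenly between the two banks; both are assumptions made for the calculation rather than statements from the paper.

```python
SUBARRAY_GRID = (8, 2)       # 8 x 2 subarrays per macro
SUBARRAY_KB = 72             # capacity of each subarray in Kb
BANKS = 2
BANK_CYCLE_NS = 2.0          # random access cycle of one bank
MACROS_IN_PROTOTYPE = 6


def macro_capacity_mb() -> float:
    """Total macro capacity in Mb, assuming 1 Mb = 1024 Kb."""
    subarrays = SUBARRAY_GRID[0] * SUBARRAY_GRID[1]
    return subarrays * SUBARRAY_KB / 1024


def interleaved_interval_ns() -> float:
    """Time between accesses when requests alternate across the banks."""
    return BANK_CYCLE_NS / BANKS


print(macro_capacity_mb())                         # 1.125, quoted as 1.1 Mb
print(interleaved_interval_ns())                   # 1.0
print(MACROS_IN_PROTOTYPE * macro_capacity_mb())   # 6.75
```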


international solid-state circuits conference | 2017

3.1 POWER9™: A processor family optimized for cognitive computing with 25Gb/s accelerator links and 16Gb/s PCIe Gen4

Christopher J. Gonzalez; Eric Fluhr; Daniel M. Dreps; David Hogenmiller; Rahul M. Rao; Jose Angel Paredes; Michael Stephen Floyd; Michael A. Sperling; Ryan Kruse; Vinod Ramadurai; Ryan Nett; Saiful Islam; Juergen Pille; Donald W. Plass

Cognitive computing and cloud infrastructure require flexible, connectable, and scalable processors with extreme IO bandwidth. With 4 distinct chip configurations, the POWER9 family of chips delivers multiple options for memory ports, core thread counts, and accelerator options to address this need. The 24-core scale-out processor is implemented in 14nm SOI FinFET technology [1] and contains 8.0B transistors. The 695 mm² chip uses 17 levels of copper interconnect: 3–64nm, 2–80nm, 4–128nm, 2–256nm, 4–360nm pitch wiring for signals and 2–2400nm pitch wiring levels for power and global clock distribution. Digital logic uses three thin-oxide transistor Vts to balance power and performance requirements, while analog and high-voltage circuits eliminated thick-oxide devices providing process simplification and cost reduction. By leveraging the FinFET's increased current per area, the base standard cell image shrunk from 18 tracks per bit in planar 22nm to 10 tracks per bit in 14nm providing additional area scaling.
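The interconnect stack listed in this abstract can be tallied to confirm the quoted total of 17 levels, and the track counts give the relative shrink of the standard cell image. The snippet below is only a consistency check on the abstract's own numbers.

```python
# (number of levels, pitch in nm) as listed in the abstract
METAL_STACK = [(3, 64), (2, 80), (4, 128), (2, 256), (4, 360), (2, 2400)]

total_levels = sum(count for count, _pitch in METAL_STACK)
signal_levels = sum(count for count, pitch in METAL_STACK if pitch < 2400)

# Standard cell image: 18 tracks per bit (planar 22nm) to 10 tracks (14nm).
track_reduction = 1 - 10 / 18

print(total_levels)               # 17
print(signal_levels)              # 15
print(f"{track_reduction:.1%}")   # 44.4%
```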


international solid-state circuits conference | 2017

26.5 Adaptive clocking in the POWER9™ processor for voltage droop protection

Michael Stephen Floyd; Phillip J. Restle; Michael A. Sperling; Pawel Owczarczyk; Eric Fluhr; Joshua Friedrich; Paul Muench; Timothy Diemoz; Pierce Chuang; Christos Vezyrtzis

Increasing transistor counts in modern processors can create instantaneous changes in current, driving nanosecond-speed supply voltage (VDD) droops that require extra guardband for correct product operation. The POWER9 processor uses an adaptive clock strategy to reduce timing margin needed during power supply droop events by embedding analog voltage-droop monitors (VDMs) that direct a digital phase-locked loop (DPLL) to immediately reduce clock frequency in response.
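The control idea in this abstract, a droop monitor that tells the DPLL to lower the clock frequency, can be sketched as a simple per-sample rule. This is an illustrative model only: the threshold, the size of the frequency cut, and the gradual recovery once the supply returns to normal are all hypothetical and are not taken from the paper.

```python
def next_frequency(vdd_mv: float, f_now_mhz: float, *,
                   droop_threshold_mv: float = 960.0,   # hypothetical
                   f_nominal_mhz: float = 4000.0,       # hypothetical
                   f_min_mhz: float = 3000.0,           # hypothetical
                   droop_cut_mhz: float = 200.0,        # hypothetical
                   recover_step_mhz: float = 100.0) -> float:
    """One control step: cut frequency during a droop, else recover slowly."""
    if vdd_mv < droop_threshold_mv:
        # Droop detected: reduce frequency at once to restore timing margin.
        return max(f_now_mhz - droop_cut_mhz, f_min_mhz)
    # Supply is healthy: creep back toward the nominal frequency.
    return min(f_now_mhz + recover_step_mhz, f_nominal_mhz)


def run_trace(vdd_trace_mv: list[float], f_start_mhz: float = 4000.0) -> list[float]:
    """Apply the rule to a sequence of supply voltage samples."""
    freqs, f = [], f_start_mhz
    for vdd in vdd_trace_mv:
        f = next_frequency(vdd, f)
        freqs.append(f)
    return freqs


# Hypothetical supply trace with a two-sample droop.
print(run_trace([1000, 1000, 940, 950, 1000, 1000, 1000]))
```

Reacting with a frequency reduction instead of carrying permanent voltage guardband is the trade the abstract describes; the asymmetric fast cut and slow recovery here is one plausible design choice, not a claim about the POWER9 implementation.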
