Publication


Featured research published by David Hogenmiller.


international conference on ic design and technology | 2014

The POWER8™ processor: Designed for big data, analytics, and cloud environments

Joshua Friedrich; Hung Q. Le; William J. Starke; Jeff Stuechli; Balaram Sinharoy; Eric Fluhr; Daniel M. Dreps; Victor Zyuban; Gregory Scott Still; Christopher J. Gonzalez; David Hogenmiller; Frank Malgioglio; Ryan Nett; Ruchir Puri; Phillip J. Restle; David Shan; Zeynep Toprak Deniz; Dieter Wendel; Matthew M. Ziegler; Dave Victor

POWER8™ delivers a data-optimized design suited for analytics, cognitive workloads, and today's exploding data sizes. The design point results in a 2.5x performance gain over its predecessor, POWER7+™, for many workloads. In addition, POWER8 delivers the efficiency demanded by cloud computing models and represents a first step toward creating an open ecosystem for server innovation.


international solid-state circuits conference | 2014

5.1 POWER8™: A 12-core server-class processor in 22nm SOI with 7.6Tb/s off-chip bandwidth

Eric Fluhr; Joshua Friedrich; Daniel M. Dreps; Victor Zyuban; Gregory Scott Still; Christopher J. Gonzalez; Allen Hall; David Hogenmiller; Frank Malgioglio; Ryan Nett; Jose Angel Paredes; Juergen Pille; Donald W. Plass; Ruchir Puri; Phillip J. Restle; David Shan; Kevin Stawiasz; Zeynep Toprak Deniz; Dieter Wendel; Matt Ziegler

The 12-core 649mm² POWER8™ leverages IBM's 22nm eDRAM SOI technology [1] and microarchitectural enhancements to deliver up to 2.5× the socket performance [2] of its 32nm predecessor, POWER7+™ [3]. POWER8 contains 4.2B transistors and 31.5μF of deep-trench decoupling capacitance. Three thin-oxide transistor Vts are used for power/performance tuning, and thick-oxide transistors enable high-voltage I/O and analog designs. The 15-layer BEOL contains 5-80nm, 2-144nm, 3-288nm, and 3-640nm pitch layers for low-latency communication as well as 2-2400nm ultra-thick-metal (UTM) pitch layers for low-resistance distribution of power and clocks.
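The BEOL stack description can be checked with simple arithmetic. The sketch below assumes each "N-Pnm" entry means N layers at a pitch of P nm, which is an interpretation of the abstract's shorthand rather than something it states outright; the variable names are illustrative.

```python
# Layer counts keyed by wiring pitch in nm, read from the "N-Pnm" entries
# in the abstract (assumed interpretation: N layers at pitch P).
beol_layers_by_pitch_nm = {80: 5, 144: 2, 288: 3, 640: 3, 2400: 2}

total_layers = sum(beol_layers_by_pitch_nm.values())
utm_layers = beol_layers_by_pitch_nm[2400]   # ultra-thick-metal layers
signal_layers = total_layers - utm_layers    # layers described as low-latency wiring

print(total_layers, signal_layers, utm_layers)  # 15 13 2
```

Under that reading, the per-pitch counts add up to the stated 15-layer total.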


symposium on vlsi circuits | 2015

Resonant clock mega-mesh for the IBM z13™

David Shan; Phillip J. Restle; Doug Malone; Robert A. Groves; Eric Lai; Michael Koch; Jason D. Hibbeler; Yong Kim; Christos Vezyrtzis; Jan Feder; David Hogenmiller; Thomas J. Bucelot

The IBM z13™ microprocessor uses a large resonant “mega-mesh” global clock distribution, saving 50% of the final-stage clock mesh power and 8% of the total chip power in the desired frequency range of 4.5 to 5.5 GHz compared to a simulated, non-resonant baseline design. The mega-mesh is driven by pulsed buffers. Measurement of the mega-mesh's robustness is enabled by skew gradients created by programmable delays. The design is implemented in IBM's high-performance 22nm high-k CMOS SOI technology with 17 metal layers [1].


international solid-state circuits conference | 2014

5.3 Wide-frequency-range resonant clock with on-the-fly mode changing for the POWER8™ microprocessor

Phillip J. Restle; David Shan; David Hogenmiller; Yong Kim; Alan J. Drake; Jason D. Hibbeler; Thomas J. Bucelot; Gregory Scott Still; Keith A. Jenkins; Joshua Friedrich

A resonant-clock design for the IBM POWER8 processor core was implemented with 2 resonant modes (and a non-resonant mode), saving clock power over a wide frequency range from 2.5GHz to more than 5GHz. The POWER8 microprocessor is composed of 12 chiplets, each containing a single resonant clock grid for one core and its L2 cache, and a half-frequency, non-resonant clock grid for the L3 cache. The clock grids drive the local clock buffers (LCBs) that in turn drive the latches. The LCBs are gated off to measure the global clock power from the PLL to the LCBs. The resonant core communicates synchronously with the L3, requiring low skew between the domains. The chip was designed in a 22nm SOI process, including two ultra-thick-metal (UTM) layers (3 microns thick) for power distribution, I/O, all long global clock wires, and the resonant clock inductors. The UTM technology reduces wire resistance and simplifies inductor design, but requires accurate transmission line modeling and special routing.
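As a rough illustration of why a resonant clock can have more than one mode, the textbook LC-tank relation f0 = 1/(2π√(LC)) can be evaluated for two effective inductances. The abstract gives no component values and does not describe the mode-switching mechanism, so the capacitance, the inductances, and the idea of changing the effective inductance between modes are all assumptions made for this sketch.

```python
import math


def resonant_frequency_hz(inductance_h: float, capacitance_f: float) -> float:
    """Natural frequency of an ideal LC tank: f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


# Hypothetical values, chosen only so the results fall in the GHz range.
GRID_CAPACITANCE_F = 100e-12
MODE_INDUCTANCE_H = {"low-frequency mode": 25e-12, "high-frequency mode": 12.5e-12}

mode_frequency_hz = {
    mode: resonant_frequency_hz(inductance, GRID_CAPACITANCE_F)
    for mode, inductance in MODE_INDUCTANCE_H.items()
}

for mode, frequency in mode_frequency_hz.items():
    print(f"{mode}: {frequency / 1e9:.2f} GHz")
```

With these made-up values, halving the effective inductance raises the resonant frequency by a factor of √2, from about 3.18 GHz to about 4.50 GHz.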


international solid-state circuits conference | 2017

3.1 POWER9™: A processor family optimized for cognitive computing with 25Gb/s accelerator links and 16Gb/s PCIe Gen4

Christopher J. Gonzalez; Eric Fluhr; Daniel M. Dreps; David Hogenmiller; Rahul M. Rao; Jose Angel Paredes; Michael Stephen Floyd; Michael A. Sperling; Ryan Kruse; Vinod Ramadurai; Ryan Nett; Saiful Islam; Juergen Pille; Donald W. Plass

Cognitive computing and cloud infrastructure require flexible, connectable, and scalable processors with extreme IO bandwidth. With 4 distinct chip configurations, the POWER9 family of chips delivers multiple options for memory ports, core thread counts, and accelerator options to address this need. The 24-core scale-out processor is implemented in 14nm SOI FinFET technology [1] and contains 8.0B transistors. The 695mm² chip uses 17 levels of copper interconnect: 3–64nm, 2–80nm, 4–128nm, 2–256nm, 4–360nm pitch wiring for signals and 2–2400nm pitch wiring levels for power and global clock distribution. Digital logic uses three thin-oxide transistor Vts to balance power and performance requirements, while analog and high-voltage circuits eliminated thick-oxide devices, providing process simplification and cost reduction. By leveraging the FinFET's increased current per area, the base standard cell image shrank from 18 tracks per bit in planar 22nm to 10 tracks per bit in 14nm, providing additional area scaling.
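The interconnect and standard-cell figures can be tallied directly. The sketch assumes each "N–Pnm" entry means N levels at a pitch of P nm (an interpretation of the shorthand), and the variable names are illustrative. The track ratio compares track counts only; because track pitch differs between the two technologies, it is not a physical area ratio.

```python
# Wiring levels keyed by pitch in nm (assumed reading: N levels at pitch P).
power9_levels_by_pitch_nm = {64: 3, 80: 2, 128: 4, 256: 2, 360: 4, 2400: 2}
total_levels = sum(power9_levels_by_pitch_nm.values())

# Standard-cell image height in tracks per bit, as quoted in the abstract.
tracks_22nm, tracks_14nm = 18, 10
track_ratio = tracks_14nm / tracks_22nm   # fraction of the 22nm track count
track_reduction = 1.0 - track_ratio       # relative reduction in track count

print(total_levels)                                   # 17
print(f"{track_ratio:.3f} {track_reduction:.1%}")     # 0.556 44.4%
```

The per-pitch counts match the stated 17 levels, and the cell image uses roughly 44% fewer tracks per bit.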


vlsi test symposium | 2013

On-chip circuit for measuring multi-GHz clock signal waveforms

Keith A. Jenkins; Phillip J. Restle; P. Z. Wang; David Hogenmiller; David William Boerstler; Thomas J. Bucelot

An on-chip circuit to measure full analog waveforms of internal signals is described. It can measure signals up to a repetition rate of at least 7 GHz, a bandwidth of at least 12 GHz, with accuracy required to detect subtle differences in signals, and it can measure overshoot above the rail voltage. It has been demonstrated on an experimental clock grid with optional resonant operation.


Archive | 1999

Circuit Design Margin and Design Variability

Kerry Bernstein; Keith M. Carrig; Christopher M. Durham; Patrick R. Hansen; David Hogenmiller; Edward J. Nowak; Norman J. Rohrer

In the preceding chapters, process variations and circuit styles were discussed. Each circuit style has its own reaction to variations of the process. Each variation must be accounted for to maintain the functionality and desired speed of the circuit across these distributions. All process parameter distributions are a function of the range over which the parameter is critical, both spatially and temporally. This chapter will investigate the effect of process variation on static CMOS logic, dynamic domino, pass gate, and DCVS logic.


Archive | 1999

Slack Borrowing and Time Stealing

Kerry Bernstein; Keith M. Carrig; Christopher M. Durham; Patrick R. Hansen; David Hogenmiller; Edward J. Nowak; Norman J. Rohrer

With any choice of circuit, clocking, and latching, the question of how to fit more logic into a path between latches than the available time allows always becomes an issue. That is, inevitably a logical pipeline partition will require more time than is available, for example, more than a full-cycle time in a master-slave system or a half-cycle in a two-phase separated-latch system. Depending on the circuit style, the latching structure, and the clocking strategy, obtaining this time can be classified into one of two categories: slack borrowing and time stealing (also commonly referred to as cycle stealing).
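A minimal sketch of slack borrowing, under a common textbook reading of the term: in a two-phase design with transparent latches, a stage that overruns its half-cycle budget passes its lateness through the open latch, and the following stages absorb it. The model below ignores setup time, latch delay, and clock skew, and the function name, picosecond units, and delay values are hypothetical. It does not model time stealing, which is usually described as deliberately shifting clock edges.

```python
def borrowed_time_ps(stage_delays_ps, cycle_ps):
    """Return the time each stage borrows from the stage that follows it.

    Each stage has a nominal budget of half a cycle. Lateness may not exceed
    the latch's transparent window, taken here as half a cycle.
    """
    budget_ps = cycle_ps // 2
    window_ps = cycle_ps // 2
    borrowed = 0
    result = []
    for index, delay in enumerate(stage_delays_ps):
        # Lateness carried in, plus this stage's delay, minus its budget.
        borrowed = max(0, borrowed + delay - budget_ps)
        if borrowed > window_ps:
            raise ValueError(f"stage {index} misses the transparent window")
        result.append(borrowed)
    return result


print(borrowed_time_ps([600, 400, 500, 450], cycle_ps=1000))  # [100, 0, 0, 0]
```

In this example the first stage overruns its 500 ps budget by 100 ps, and the second stage, which needs only 400 ps, absorbs that lateness so no later stage is affected.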


Archive | 1999

Non-Clocked Logic Styles

Kerry Bernstein; Keith M. Carrig; Christopher M. Durham; Patrick R. Hansen; David Hogenmiller; Edward J. Nowak; Norman J. Rohrer

Non-clocked logic is ubiquitous in electronic design, due to a number of considerations including: low power consumption; straightforward delay rule timing; inherent reliability and noise immunity; process variation and defect tolerance; migratability into successive technology generations; and deterministic diagnostic capability.


Archive | 1999

Clocked Logic Styles

Kerry Bernstein; Keith M. Carrig; Christopher M. Durham; Patrick R. Hansen; David Hogenmiller; Edward J. Nowak; Norman J. Rohrer

In the preceding chapter, nonclocked circuit topologies were shown generally to be versatile, reliable, and relatively low in power consumption. Clocked logic, on the other hand, is recognized for its performance advantages, which may be attributed to the following:
1. In static CMOS, logic must be built redundantly; circuit operations must be realized in both NFET and PFET device structures to accommodate both up and down logic transitions. This reduces performance by adding gate fan-out load and interconnect RC. Higher device counts lead to longer interconnects, higher power consumption, and bigger die.
2. In static CMOS, even when the redundant structure is off, the added diffusion and overlap capacitive loads increase power and delay.
3. In static CMOS, PFET devices must drive the same loads as NFET devices, at half the transconductance. This drives PFET devices to generally be 2X the width of NFET devices for balanced transitions. The impact is of particular concern in structures such as PFET devices 5 and 6 in Figure 2.2a.
