Publication


Featured research published by Daniel M. Dreps.


IEEE Journal of Solid-state Circuits | 2011

POWER7™, a Highly Parallel, Scalable Multi-Core High End Server Processor

Dieter Wendel; R Kalla; James D. Warnock; R. Cargnoni; S G Chu; J G Clabes; Daniel M. Dreps; D. Hrusecky; Joshua Friedrich; Saiful Islam; J Kahle; Jens Leenstra; Gaurav Mittal; Jose Angel Paredes; Jürgen Pille; Phillip J. Restle; Balaram Sinharoy; G Smith; W J Starke; S Taylor; J. A. Van Norstrand; Stephen Douglas Weitzel; P G Williams; Victor Zyuban

This paper gives an overview of the latest member of the POWER™ processor family, POWER7™. Eight quad-threaded cores, operating at frequencies up to 4.14 GHz, are integrated together with two memory controllers and high-speed system links on a 567 mm² die, employing 1.2B transistors in a 45 nm CMOS SOI technology with 11 layers of low-k copper wiring. The technology features deep trench capacitors, which are used to build a 32 MB embedded DRAM L3 based on a 0.067 μm² DRAM cell. The functionally equivalent chip transistor count would have been over 2.7B if the L3 had been implemented with a conventional 6-transistor SRAM cell. (The eDRAM implementation is detailed in a separate paper in this Journal.) Deep trench capacitors are also used to reduce on-chip voltage island supply noise. This paper describes the organization of the design and the features of the processor core, before moving on to discuss the circuits used for analog elements, clock generation and distribution, and I/O designs. The final section describes the details of the clocked storage elements, including special features for test, debug, and chip frequency tuning.
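
The SRAM-equivalent transistor count quoted in the abstract can be sanity-checked with a rough estimate. The sketch below is not from the paper: it assumes one transistor per eDRAM bit (a 1T1C cell), six per SRAM bit, a binary 32 MB, and it counts data bits only. Because ECC, directory, and redundancy bits are ignored, the result should be read as a lower bound, which is consistent with the abstract's "over 2.7B".

```python
# Rough lower bound on the SRAM-equivalent transistor count.
# Figures taken from the abstract: 1.2B transistors, 32 MB L3.
# Assumptions (not from the paper): 1 transistor per eDRAM bit,
# 6 per SRAM bit, binary megabytes, data bits only.

CHIP_TRANSISTORS = 1.2e9
L3_BYTES = 32 * 2**20
EDRAM_T_PER_BIT = 1   # assumed 1T1C cell
SRAM_T_PER_BIT = 6    # conventional 6T cell


def sram_equivalent_transistors(chip_transistors, cache_bytes):
    """Swap every eDRAM data bit for a 6T SRAM bit and recount."""
    bits = cache_bytes * 8
    extra = bits * (SRAM_T_PER_BIT - EDRAM_T_PER_BIT)
    return chip_transistors + extra


estimate = sram_equivalent_transistors(CHIP_TRANSISTORS, L3_BYTES)
print(f"{estimate / 1e9:.2f}B")  # 2.54B
```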


international solid state circuits conference | 2010

A 4.5 mW/Gb/s 6.4 Gb/s 22+1-Lane Source Synchronous Receiver Core With Optional Cleanup PLL in 65 nm CMOS

Robert Reutemann; Michael Ruegg; Fran Keyser; John J. Bergkvist; Daniel M. Dreps; Thomas Toifl; Martin L. Schmatz

This paper describes the design of a product-level low-power source-synchronous link receiver macro for data rates of 3.2-6.4 Gb/s. The receiver macro consists of 22 data channels plus one forwarded-clock channel, and supports both differential and ground termination. A pulsed CDR with programmable bandwidth is implemented to save power in the CDR. Time dithering is applied to the CDR to avoid notches in the jitter tolerance curve. The receiver clock path incorporates both a clean-up PLL and a polyphase filter for RX clock generation, from which one can be chosen to generate the receive clock. It is shown how jitter in a source-synchronous link is related to skew between clock and data, as well as cross-talk from the data to the clock wires. The jitter performance of the RX using either the polyphase filter or the PLL for clock generation is compared for different loop bandwidths. The RX core was implemented in a 65 nm Bulk CMOS technology. Total power consumption for the 22+1 lane RX PHY core running at 6.4 Gbps with the polyphase filter and in pulsed CDR mode is 635 mW or 4.5 mW/Gbps.
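
The efficiency figure in the last sentence can be reproduced with simple arithmetic. The sketch below assumes the power is normalized to the 22 data lanes only, since the forwarded-clock lane carries no payload. That normalization is an assumption, but it matches the quoted 4.5 mW/Gbps, whereas dividing by 23 lanes would give about 4.3.

```python
# Power efficiency of the receiver core, from the abstract's figures.
# Assumption: only the 22 data lanes count toward aggregate bandwidth.

def power_efficiency_mw_per_gbps(total_power_mw, data_lanes, rate_gbps):
    """Total power divided by aggregate payload bandwidth."""
    aggregate_gbps = data_lanes * rate_gbps
    return total_power_mw / aggregate_gbps


eff = power_efficiency_mw_per_gbps(total_power_mw=635.0,
                                   data_lanes=22,
                                   rate_gbps=6.4)
print(round(eff, 2))  # 4.51
```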


IEEE Journal of Solid-state Circuits | 2015

The 12-Core POWER8™ Processor With 7.6 Tb/s IO Bandwidth, Integrated Voltage Regulation, and Resonant Clocking

Eric Fluhr; Steve Baumgartner; David William Boerstler; John F. Bulzacchelli; Timothy Diemoz; Daniel M. Dreps; George English; Joshua Friedrich; Anne E. Gattiker; Tilman Gloekler; Christopher J. Gonzalez; Jason D. Hibbeler; Keith A. Jenkins; Yong Kim; Paul Muench; Ryan Nett; Jose Angel Paredes; Juergen Pille; Donald W. Plass; Phillip J. Restle; Raphael Robertazzi; David Shan; David W. Siljenberg; Michael A. Sperling; Kevin Stawiasz; Gregory Scott Still; Zeynep Toprak-Deniz; James D. Warnock; Glen A. Wiedemeier; Victor Zyuban

POWER8™ is a 12-core processor fabricated in IBM's 22 nm SOI technology with core and cache improvements driven by big data applications, providing 2.5× socket performance over POWER7+™. Core throughput is supported by 7.6 Tb/s of off-chip I/O bandwidth, which is provided by three primary interfaces, including two new variants of Elastic Interface as well as embedded PCI Gen-3. Power efficiency is improved with several techniques. An on-chip controller based on an embedded PowerPC™ 405 processor applies per-core DVFS by adjusting DPLLs and fully integrated voltage regulators. Each voltage regulator is a highly distributed system of digitally controlled microregulators, which achieves a peak power efficiency of 90.5%. A wide frequency range resonant clock design is used in 13 clock meshes and demonstrates a minimum power savings of 4%. Power and delay efficiency is achieved through the use of pulsed-clock latches, which require statistical validation to ensure robust yield.


international solid-state circuits conference | 2008

A 2.6mW 370MHz-to-2.5GHz Open-Loop Quadrature Clock Generator

Kyu-hyoun Kim; Paul W. Coteus; Daniel M. Dreps; Seongwon Kim; Sergey V. Rylov; Daniel J. Friedman

This paper presents a wide-frequency-range open-loop quadrature generator that is compact enough for many stages to be cascaded affordably. The generator is built from cascaded quad corrector stages, each of which can be understood as a modification of a common interpolating 4-stage ring oscillator. In the circuit, the delay of each stage is a linear superposition of the delays Φ of the associated inner and outer loop elements. If the outer loop element inputs are made independent, a driven oscillator results. Provided the input drive is sufficient, the frequency of the driven oscillator is that of the driving input, and the phase of each internal node is an interpolation of the phase of its input drive and the phase of the preceding stage. This interpolation acts to average offsets from quadrature in the incoming phases. If the input drive is insufficient, the oscillator will run near its natural or unloaded frequency, ω₀ = 2πf₀.
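
The averaging effect of phase interpolation can be illustrated with a toy phase-domain model. This is not the paper's circuit: it assumes each node's phase error is a fixed weighted mix of its own input error and the preceding node's output error around a 4-node ring. The weight `w` and the function names are hypothetical.

```python
# Toy model of quadrature correction by phase interpolation.
# Assumption (not from the paper): output error E[k] satisfies
#   E[k] = w * e[k] + (1 - w) * E[k-1]   (indices wrap around the ring)
# where e[k] is the input phase error of node k, in degrees.

def corrector_stage(errors_deg, w=0.5, iters=200):
    """Solve the cyclic interpolation relation by fixed-point iteration."""
    n = len(errors_deg)
    out = list(errors_deg)
    for _ in range(iters):
        # out[k - 1] wraps to the last node when k == 0
        out = [w * errors_deg[k] + (1 - w) * out[k - 1] for k in range(n)]
    return out


def cascade(errors_deg, stages, w=0.5):
    """Feed the output of each corrector stage into the next one."""
    for _ in range(stages):
        errors_deg = corrector_stage(errors_deg, w)
    return errors_deg


def quadrature_spread(errors_deg):
    """Largest deviation from the common (mean) phase offset."""
    mean = sum(errors_deg) / len(errors_deg)
    return max(abs(e - mean) for e in errors_deg)


skewed = [0.0, 5.0, 0.0, -3.0]   # hypothetical input errors in degrees
corrected = cascade(skewed, stages=3)
print(quadrature_spread(skewed), quadrature_spread(corrected))
```

In this model the common phase offset passes through unchanged while the node-to-node deviations shrink with every stage, which is why cascading several compact stages is attractive.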


international symposium on quality electronic design | 2009

Design methodology of high performance on-chip global interconnect using terminated transmission-line

Yulei Zhang; Ling Zhang; Alina Deutsch; George A. Katopis; Daniel M. Dreps; James F. Buckwalter; Ernest S. Kuh; Chung-Kuan Cheng

We explore two schemes that use transmission lines (T-lines) to achieve high-performance global interconnects on VLSI chips. For both schemes, we select wire dimensions to ensure that T-line effects are present and employ inverter chains as drivers and receivers. To achieve high throughput and alleviate inter-symbol interference (ISI), high termination resistance is used in the second scheme. For the two schemes, we discuss how to optimize the wire dimensions and the effects of driver impedance and termination resistance on the wire bandwidth. Second, a design methodology is proposed to determine the optimal design variables for three objectives. We adopt the proposed methodology and compare the performance metrics with repeated RC wires. Simulation results show that the proposed T-line schemes reduce the delay by as much as 82% and improve the throughput by as much as 63% for the min-ddp (delay²-power product) objective.
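
The min-ddp objective can be sketched as a simple selection over candidate designs, reading "ddp" as delay squared times power. The candidate names and numbers below are hypothetical placeholders, not results from the paper, and the paper's actual optimization over wire dimensions and termination is not modeled here.

```python
# Minimal sketch of a min-ddp (delay^2 * power) selection.
# All candidate values are hypothetical, for illustration only.

def delay2_power_product(delay_ps, power_mw):
    """Figure of merit: squared delay weighted by power (lower is better)."""
    return delay_ps ** 2 * power_mw


def pick_min_ddp(candidates):
    """Return the candidate with the smallest delay^2-power product."""
    return min(
        candidates,
        key=lambda c: delay2_power_product(c["delay_ps"], c["power_mw"]),
    )


candidates = [
    {"name": "repeated_rc", "delay_ps": 500.0, "power_mw": 2.0},
    {"name": "tline_a", "delay_ps": 120.0, "power_mw": 3.0},
    {"name": "tline_b", "delay_ps": 90.0, "power_mw": 6.0},
]

best = pick_min_ddp(candidates)
print(best["name"])  # tline_a
```

Squaring the delay makes the metric favor latency over power: in this example `tline_b` is the fastest option, yet its doubled power costs more than its delay advantage gains.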


international conference on ic design and technology | 2014

The POWER8™ processor: Designed for big data, analytics, and cloud environments

Joshua Friedrich; Hung Q. Le; William J. Starke; Jeff Stuechli; Balaram Sinharoy; Eric Fluhr; Daniel M. Dreps; Victor Zyuban; Gregory Scott Still; Christopher J. Gonzalez; David Hogenmiller; Frank Malgioglio; Ryan Nett; Ruchir Puri; Phillip J. Restle; David Shan; Zeynep Toprak Deniz; Dieter Wendel; Matthew M. Ziegler; Dave Victor

POWER8™ delivers a data-optimized design suited for analytics, cognitive workloads, and today's exploding data sizes. The design point results in a 2.5× performance gain over its predecessor, POWER7+™, for many workloads. In addition, POWER8™ delivers the efficiency demanded by cloud computing models and also represents a first step toward creating an open ecosystem for server innovation.


international solid-state circuits conference | 2015

4.1 22nm Next-generation IBM System z microprocessor

James D. Warnock; Brian W. Curran; John Badar; Gregory J. Fredeman; Donald W. Plass; Yuen H. Chan; Sean M. Carey; Gerard M. Salem; Friedrich Schroeder; Frank Malgioglio; Guenter Mayer; Christopher J. Berry; Michael H. Wood; Yiu-Hing Chan; Mark D. Mayo; John Mack Isakson; Charudhattan Nagarajan; Tobias Werner; Leon J. Sigal; Ricardo H. Nigaglioni; Mark Cichanowski; Jeffrey A. Zitz; Matthew M. Ziegler; Tim Bronson; Gerald Strevig; Daniel M. Dreps; Ruchir Puri; Douglas J. Malone; Dieter Wendel; Pak-Kin Mak

The next-generation System z design introduces a new microprocessor chip (CP) and a system controller chip (SC) aimed at providing a substantial boost to maximum system capacity and performance compared to the previous zEC12 design in 32nm [1,2]. As shown in the die photo, the CP chip includes 8 high-frequency processor cores, 64MB of eDRAM L3 cache, interface IOs ("XBUS") to connect to two other processor chips and the L4 cache chip, along with memory interfaces, 2 PCIe Gen3 interfaces, and an I/O bus controller (GX). The design is implemented on a 678 mm² die with 4.0 billion transistors and 17 levels of metal interconnect in IBM's high-performance 22nm high-κ CMOS SOI technology [3]. The SC chip is also a 678 mm² die, with 7.1 billion transistors, running at half the clock frequency of the CP chip, in the same 22nm technology, but with 15 levels of metal. It provides 480 MB of eDRAM L4 cache, an increase of more than 2× from zEC12 [1,2], and contains an 18 MB eDRAM L4 directory, along with multi-processor cache control/coherency logic to manage inter-processor and system-level communications. Both the CP and SC chips incorporate significant logical, physical, and electrical design innovations.


high performance interconnects | 2008

Low Power Passive Equalizer Design for Computer Memory Links

Ling Zhang; Wenjian Yu; Yulei Zhang; Renshen Wang; Alina Deutsch; George A. Katopis; Daniel M. Dreps; James F. Buckwalter; Ernest S. Kuh; Chung-Kuan Cheng

Several types of low-power passive equalizers are proposed and optimized in this work. The equalizer topologies include T-junction, parallel R-C, and series R-L structures. These structures can be inserted at the driver and/or receiver side at either the chip or package level, and the communication bandwidth can be improved with little overhead on power consumption. Using the area of the eye as the objective function to be maximized, we optimized these equalizers for the CPU-memory interconnection of an IBM POWER6™ system with and without practical constraints on the RLCG parameter values. Our experimental results show that without any equalizers, the data eye is closed at a bit rate of 6.4 Gbps. We tried twelve different equalizer schemes and found that they produce very different eye diagrams. The scheme yielding the maximum eye improves the height of the eye to more than 300 mV at a total power cost of 7.2 mW, while the scheme yielding the minimum jitter limits the jitter magnitude to 10 ps at a total power cost of 9.5 mW. We also show that the solution resulting from the proposed optimization approach has very small sensitivity to the tolerance of the R, L, C values and the magnitude of the coupled noise.


IBM Journal of Research and Development | 2015

IBM POWER8 circuit design and energy optimization

Victor Zyuban; Joshua Friedrich; Daniel M. Dreps; Jürgen Pille; Donald W. Plass; Phillip J. Restle; Z. T. Deniz; M. M. Ziegler; S. Chu; Saiful Islam; James D. Warnock; R. Philhower; R. M. Rao; Gregory Scott Still; D. W. Shan; Eric Fluhr; Jose Angel Paredes; Dieter Wendel; Christopher J. Gonzalez; D. Hogenmiller; Ruchir Puri; S. A. Taylor; S. D. Posluszny

The IBM POWER8™ processor is a 649-mm


international solid-state circuits conference | 2014

5.1 POWER8™: A 12-core server-class processor in 22nm SOI with 7.6Tb/s off-chip bandwidth

Eric Fluhr; Joshua Friedrich; Daniel M. Dreps; Victor Zyuban; Gregory Scott Still; Christopher J. Gonzalez; Allen Hall; David Hogenmiller; Frank Malgioglio; Ryan Nett; Jose Angel Paredes; Juergen Pille; Donald W. Plass; Ruchir Puri; Phillip J. Restle; David Shan; Kevin Stawiasz; Zeynep Toprak Deniz; Dieter Wendel; Matt Ziegler

