Publication


Featured research published by Samuel Naffziger.


IEEE Journal of Solid-State Circuits | 2006

Power and temperature control on a 90-nm Itanium family processor

Rich Mcgowen; Christopher A. Poirier; Chris Bostak; Jim Ignowski; Mark Millican; Warren H. Parks; Samuel Naffziger

This paper describes the embedded feedback and control system on a 90-nm Itanium family processor, code-named Montecito, that maximizes performance while staying within a target power and temperature (PT) envelope. This system, referred to as Foxton Technology (FT), utilizes on-chip sensors and an embedded microcontroller to measure PT and modulate both voltage and frequency (VF) to optimize performance while meeting PT constraints. Changing both V and F takes advantage of the cubic relationship P ∝ CV²F. We present measured results that show a 31% reduction in power for only a 10% drop in frequency. Montecito is able to implement FT using only 0.5% of the die area and 0.5% of the die power.
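As a rough check on the cubic relationship, the sketch below assumes that supply voltage scales linearly with frequency and that capacitance stays constant. Neither assumption is stated in the abstract, and the function name is purely illustrative; this is an idealized model, not the Foxton controller's actual policy.

```python
from typing import Optional


def relative_dynamic_power(freq_ratio: float,
                           volt_ratio: Optional[float] = None) -> float:
    """Return P_new / P_old under P ∝ C * V^2 * F with C held constant.

    If volt_ratio is omitted, voltage is ASSUMED to scale linearly with
    frequency (volt_ratio == freq_ratio), which gives the cubic dependence.
    """
    if volt_ratio is None:
        volt_ratio = freq_ratio  # assumption: V tracks F one-to-one
    return volt_ratio ** 2 * freq_ratio


# A 10% frequency drop with voltage scaled by the same factor.
ratio = relative_dynamic_power(0.90)
print(f"Relative dynamic power: {ratio:.3f}")
print(f"Power reduction: {(1 - ratio) * 100:.1f}%")
```

Under this idealized model a 10% frequency drop yields roughly a 27% reduction in dynamic power. The abstract reports a measured 31%; the simple model does not attempt to account for the difference.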


International Electron Devices Meeting | 2005

Scaling, power, and the future of CMOS

Mark Horowitz; Elad Alon; Dinesh Patil; Samuel Naffziger; Rajesh Kumar; Kerry Bernstein

This paper briefly reviews the forces that caused the power problem, the solutions that were applied, and what the solutions tell us about the problem. As systems became more power constrained, optimizing the power became more critical; viewing power reduction from an optimization perspective provides valuable insights. Section III describes these insights in more detail, including why Vdd and Vth have stopped scaling. Section IV describes some of the low-power techniques that have been used in the past in the context of the optimization framework. This framework also makes it easy to see the impact of variability, which is discussed in more detail in Section V along with the adaptive mechanisms that have been proposed and deployed to minimize the energy cost. Section VI describes possible strategies for dealing with the slowdown in gate energy scaling, and the final section concludes by discussing the implications of these strategies for device designers.


IEEE Journal of Solid-State Circuits | 2006

The implementation of a 2-core, multi-threaded Itanium family processor

Samuel Naffziger; Blaine Stackhouse; Tom Grutkowski; Doug Josephson; Jayen Desai; Elad Alon; Mark Horowitz

The design of the high-end server processor code-named Montecito incorporated several ambitious goals requiring innovation. The most obvious was the incorporation of two legacy cores on-die while at the same time reducing power by 23%. This is an effective 325% increase in MIPS per watt, which necessitated a holistic focus on power reduction and management. The next challenge in the implementation was to ensure robust, high-frequency circuit operation in the 90-nm process generation, which brings with it higher leakage and greater variability. Achieving this goal required new methodologies for design, a greatly improved and tunable clock system, and a better understanding of our power grid behavior, all of which required new circuits and capabilities. The final aspect of circuit design improvement involved the I/O design for our legacy multi-drop system bus. To properly feed the two high-frequency cores with memory bandwidth, we needed to ensure frequency headroom in the operation of the bus. This was achieved through several innovations in controllability and tuning of the I/O buffers, which are discussed as well.


IEEE Journal of Solid-State Circuits | 2006

A 90-nm variable frequency clock system for a power-managed Itanium architecture processor

Tim Fischer; Jayen Desai; Bruce Longmont Doyle; Samuel Naffziger; Ben Patella

An Itanium Architecture microprocessor in 90-nm CMOS with 1.7B transistors implements a dynamically-variable-frequency clock system. Variable frequency clocks support a power management scheme which maximizes processor performance within a configured power envelope. Core supply voltage and clock frequency are modulated dynamically in order to remain within the power envelope. The Foxton controller and dynamically-variable clock system reside on die while the variable voltage regulator and power measurement resistors reside off chip. In addition, high-bandwidth frequency adjustment allows the clock period to adapt during on-die supply transients, allowing higher frequency processor operation during transients than possible with a single-frequency clock system.


International Solid-State Circuits Conference | 2010

A 32nm fully integrated reconfigurable switched-capacitor DC-DC converter delivering 0.55W/mm² at 81% efficiency

Hanh-Phuc Le; Michael D. Seeman; Seth R. Sanders; Visvesh S. Sathe; Samuel Naffziger; Elad Alon

With the rising integration levels used to increase digital processing performance, there is a clear need for multiple independent on-chip supplies in order to support per-IP or per-block power management. Simply adding multiple off-chip DC-DC converters is not only difficult due to supply impedance concerns, but also adds cost to the platform by increasing motherboard size and package complexity. There is therefore a strong motivation to integrate voltage conversion blocks on the silicon chip.


International Solid-State Circuits Conference | 2005

Clock distribution on a dual-core, multi-threaded Itanium®-family processor

P. Mahoney; Eric S. Fetzer; B. Doyle; Samuel Naffziger

Clock distribution on the 90 nm Itanium® processor, code-named Montecito, is detailed. A region-based active de-skew system reduces the PVT sources of skew across the entire die during normal operation. Clock vernier devices inserted at each local clock buffer allow up to a 10% clock-cycle adjustment via firmware or scan. The system supports a constantly varying frequency and consumes <25 W from the PLL to latch while providing <10 ps of skew across PVT.


International Solid-State Circuits Conference | 2005

A 90nm variable-frequency clock system for a power-managed Itanium®-family processor

Tim Fischer; Ferd Anderson; Ben Patella; Samuel Naffziger

A clock-generation system delivers fixed- and variable-frequency clocks for adaptive power control on a 1.7 B-transistor dual-core CPU. Frequency synthesizers digitally divide a fixed-frequency PLL clock in 1/64th cycle steps using programmable voltage-frequency-converter loops. 1-cycle loop response tracks supply transients with adaptive modulation, improving CPU performance by over 10% compared to a fixed-frequency design.


International Solid-State Circuits Conference | 2012

Resonant clock design for a power-efficient high-volume x86-64 microprocessor

Visvesh S. Sathe; Srikanth Arekapudi; Alexander T. Ishii; Charles Ouyang; Marios C. Papaefthymiou; Samuel Naffziger

AMD's 32-nm x86-64 core code-named “Piledriver” features a resonant global clock distribution to reduce clock distribution power while maintaining a low clock skew. To support the wide range of operating frequencies expected of the core, the global clock system operates in two modes: a resonant-clock (rclk) mode for energy-efficient operation over a desired frequency range and a conventional, direct-drive mode (cclk) to support low-frequency operation. This dual-mode feature was implemented with minimal area impact to achieve both reduced average power dissipation and improved power-constrained performance. In Piledriver, resonant clocking achieves a peak 25% global clock power reduction at 75 °C, which translates to a 4.5% reduction in average application core power.
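The two percentages quoted above can be related with simple arithmetic. The sketch below is a back-of-the-envelope estimate, not a figure from the paper: it assumes both reductions refer to the same operating conditions and that nothing else in the core changes, which the abstract does not confirm (one figure is a peak value, the other an application average). The function name is hypothetical.

```python
def implied_power_share(component_reduction: float,
                        total_reduction: float) -> float:
    """Estimate what fraction of total power a component accounts for.

    If cutting the component's power by `component_reduction` (a fraction)
    cuts total power by `total_reduction` (a fraction), and nothing else
    changes (ASSUMPTION), the component's share of the total is their ratio.
    """
    if not 0.0 < component_reduction <= 1.0:
        raise ValueError("component_reduction must be in (0, 1]")
    return total_reduction / component_reduction


# 25% global clock power reduction -> 4.5% core power reduction.
share = implied_power_share(0.25, 0.045)
print(f"Implied global clock share of core power: {share:.0%}")
```

Read this way, the global clock distribution would account for roughly 18% of core power, which is only as reliable as the same-conditions assumption.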


International Solid-State Circuits Conference | 2014

5.6 Adaptive clocking system for improved power efficiency in a 28nm x86-64 microprocessor

Aaron Grenat; Sanjay Pant; Ravinder Rachala; Samuel Naffziger

In high-performance microprocessor cores, the on-die supply voltage seen by the transistors is non-ideal and exhibits significant fluctuations. These supply fluctuations are caused by sudden changes in the current consumed by the microprocessor in response to variations in workloads. This non-ideal supply can cause performance degradation or functional failures. Therefore, a significant amount of margin (10-15%) needs to be added to the ideal voltage (if there were no AC voltage variations) to ensure that the processor always executes correctly at the committed voltage-frequency points. This excess voltage wastes power proportional to the square of the voltage increase.
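To put the quoted margin in perspective, the sketch below estimates the extra dynamic power at fixed frequency, assuming dynamic power scales with the square of supply voltage. Leakage power is ignored (an assumption of this sketch), and the function name is illustrative only.

```python
def margin_power_overhead(margin: float) -> float:
    """Fractional extra dynamic power from raising V by `margin`.

    Uses P ∝ V^2 at fixed frequency and capacitance; leakage is
    ignored (ASSUMPTION), so real overhead may differ.
    """
    return (1.0 + margin) ** 2 - 1.0


# The abstract quotes a 10-15% voltage margin.
for margin in (0.10, 0.15):
    overhead = margin_power_overhead(margin)
    print(f"{margin:.0%} voltage margin -> {overhead:.1%} extra dynamic power")
```

Under these assumptions, a 10% margin costs about 21% more dynamic power and a 15% margin about 32% more, which illustrates why recovering margin through adaptive clocking is attractive.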


International Solid-State Circuits Conference | 2010

An x86-64 core implemented in 32nm SOI CMOS

Ravi Jotwani; Sriram Sundaram; Stephen Kosonocky; Alex Schaefer; Victor F. Andrade; Greg Constant; Amy Novak; Samuel Naffziger

The 32nm implementation of an AMD x86-64 core [1,2,5] occupies 9.69mm², contains more than 35 million transistors (excluding L2 cache), and operates at frequencies in excess of 3GHz. The core incorporates numerous design and power improvements to enable an operating range of 2.5 to 25W and a near-zero-power gated state, which makes the core well suited to a broad range of mobile and desktop products.
