Samuel Naffziger
Advanced Micro Devices
Publication
Featured research published by Samuel Naffziger.
IEEE Journal of Solid-state Circuits | 2006
Rich Mcgowen; Christopher A. Poirier; Chris Bostak; Jim Ignowski; Mark Millican; Warren H. Parks; Samuel Naffziger
This paper describes the embedded feedback and control system on a 90-nm Itanium family processor, code-named Montecito, that maximizes performance while staying within a target power and temperature (PT) envelope. This system, referred to as Foxton Technology (FT), utilizes on-chip sensors and an embedded microcontroller to measure PT and modulate both voltage and frequency (VF) to optimize performance while meeting PT constraints. Changing both VF takes advantage of the cubic relationship of P ∝ CV²F. We present measured results that show a 31% reduction in power for only a 10% drop in frequency. Montecito is able to implement FT using only 0.5% of the die area and 0.5% of the die power.
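As a rough illustration of the cubic relationship mentioned in the abstract above, the sketch below assumes that supply voltage scales in direct proportion to frequency and that capacitance stays fixed, so dynamic power scales as the cube of the frequency ratio. The function name and the proportional-voltage assumption are illustrative and are not taken from the paper.

```python
from typing import Optional


def relative_dynamic_power(freq_ratio: float, volt_ratio: Optional[float] = None) -> float:
    """Dynamic power relative to a baseline, using P proportional to C * V^2 * F.

    Capacitance C is treated as constant. If volt_ratio is omitted, voltage is
    ASSUMED to scale linearly with frequency (an idealization, not a claim
    about the Montecito design).
    """
    if volt_ratio is None:
        volt_ratio = freq_ratio
    return volt_ratio ** 2 * freq_ratio


# A 10% frequency drop with proportionally reduced voltage.
ratio = relative_dynamic_power(0.90)
print(f"relative power: {ratio:.3f}")
print(f"power reduction: {(1 - ratio) * 100:.1f}%")
```

Under this idealized model a 10% frequency drop gives a power reduction of about 27%, the same order as the 31% measured result quoted above. The model ignores leakage and any non-proportional voltage scaling, so it should not be read as an explanation of the measured figure.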
International Electron Devices Meeting | 2005
Mark Horowitz; Elad Alon; Dinesh Patil; Samuel Naffziger; Rajesh Kumar; Kerry Bernstein
This paper briefly reviews the forces that caused the power problem, the solutions that were applied, and what the solutions tell us about the problem. As systems became more power constrained, optimizing the power became more critical; viewing power reduction from an optimization perspective provides valuable insights. Section III describes these insights in more detail, including why Vdd and Vth have stopped scaling. Section IV describes some of the low power techniques that have been used in the past in the context of the optimization framework. This framework also makes it easy to see the impact of variability, which is discussed in more detail in section V along with the adaptive mechanisms that have been proposed and deployed to minimize the energy cost. Section VI describes possible strategies for dealing with the slowdown in gate energy scaling, and the final section concludes by discussing the implications of these strategies for device designers.
IEEE Journal of Solid-state Circuits | 2006
Samuel Naffziger; Blaine Stackhouse; Tom Grutkowski; Doug Josephson; Jayen Desai; Elad Alon; Mark Horowitz
The design of the high-end server processor code-named Montecito incorporated several ambitious goals requiring innovation. The most obvious was the incorporation of two legacy cores on-die while at the same time reducing power by 23%. This is an effective 325% increase in MIPS per watt, which necessitated a holistic focus on power reduction and management. The next challenge in the implementation was to ensure robust, high-frequency circuit operation in the 90-nm process generation, which brings with it higher leakage and greater variability. Achieving this goal required new methodologies for design, a greatly improved and tunable clock system, and a better understanding of our power grid behavior, all of which required new circuits and capabilities. The final aspect of circuit design improvement involved the I/O design for our legacy multi-drop system bus. To properly feed the two high-frequency cores with memory bandwidth, we needed to ensure frequency headroom in the operation of the bus. This was achieved through several innovations in controllability and tuning of the I/O buffers, which are discussed as well.
IEEE Journal of Solid-state Circuits | 2006
Tim Fischer; Jayen Desai; Bruce Longmont Doyle; Samuel Naffziger; Ben Patella
An Itanium Architecture microprocessor in 90-nm CMOS with 1.7B transistors implements a dynamically-variable-frequency clock system. Variable frequency clocks support a power management scheme which maximizes processor performance within a configured power envelope. Core supply voltage and clock frequency are modulated dynamically in order to remain within the power envelope. The Foxton controller and dynamically-variable clock system reside on die while the variable voltage regulator and power measurement resistors reside off chip. In addition, high-bandwidth frequency adjustment allows the clock period to adapt during on-die supply transients, allowing higher frequency processor operation during transients than possible with a single-frequency clock system.
International Solid-State Circuits Conference | 2010
Hanh-Phuc Le; Michael D. Seeman; Seth R. Sanders; Visvesh S. Sathe; Samuel Naffziger; Elad Alon
With the rising integration levels used to increase digital processing performance, there is a clear need for multiple independent on-chip supplies in order to support per-IP or block power management. Simply adding multiple off-chip DC-DC converters is not only difficult due to supply impedance concerns, but also adds cost to the platform by increasing motherboard size and package complexity. There is therefore a strong motivation to integrate voltage conversion blocks on the silicon chip.
International Solid-State Circuits Conference | 2005
P. Mahoney; Eric S. Fetzer; B. Doyle; Samuel Naffziger
Clock distribution on the 90 nm Itanium® processor, code-named Montecito, is detailed. A region-based active de-skew system reduces the PVT sources of skew across the entire die during normal operation. Clock vernier devices inserted at each local clock buffer allow up to a 10% clock-cycle adjustment via firmware or scan. The system supports a constantly varying frequency and consumes <25 W from the PLL to latch while providing <10 ps of skew across PVT.
International Solid-State Circuits Conference | 2005
Tim Fischer; Ferd Anderson; Ben Patella; Samuel Naffziger
A clock-generation system delivers fixed- and variable-frequency clocks for adaptive power control on a 1.7 B-transistor dual-core CPU. Frequency synthesizers digitally divide a fixed-frequency PLL clock in 1/64th cycle steps using programmable voltage-frequency-converter loops. 1-cycle loop response tracks supply transients with adaptive modulation, improving CPU performance by over 10% compared to a fixed-frequency design.
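To give a feel for the granularity of 1/64th-cycle division described above, the sketch below assumes that the divide ratio can be set in increments of 1/64 of a PLL cycle, so the output frequency is the PLL frequency divided by (whole cycles + steps/64). This interpretation, the function name, and the 2000 MHz PLL frequency are hypothetical and chosen only for illustration; the abstract does not give them.

```python
from fractions import Fraction

STEPS_PER_CYCLE = 64  # from the abstract: division in 1/64th-cycle steps


def divided_frequency_mhz(pll_mhz: float, whole_cycles: int, extra_steps: int) -> float:
    """Output frequency for a divide ratio of whole_cycles + extra_steps/64.

    ASSUMPTION: the synthesizer behaves like an ideal fractional divider.
    """
    if whole_cycles < 1 or not 0 <= extra_steps < STEPS_PER_CYCLE:
        raise ValueError("need whole_cycles >= 1 and 0 <= extra_steps < 64")
    ratio = Fraction(whole_cycles * STEPS_PER_CYCLE + extra_steps, STEPS_PER_CYCLE)
    return pll_mhz / float(ratio)


PLL_MHZ = 2000.0  # hypothetical PLL frequency, not from the paper

# Walk the first few steps above a divide ratio of 1.
for steps in range(4):
    freq = divided_frequency_mhz(PLL_MHZ, 1, steps)
    print(f"ratio 1 + {steps}/64 -> {freq:.1f} MHz")
```

With these assumed numbers, each step near a divide ratio of 1 changes the output frequency by roughly 1.5%, which suggests why fine-grained division is useful for tracking supply transients.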
International Solid-State Circuits Conference | 2012
Visvesh S. Sathe; Srikanth Arekapudi; Alexander T. Ishii; Charles Ouyang; Marios C. Papaefthymiou; Samuel Naffziger
AMD's 32-nm x86-64 core code-named “Piledriver” features a resonant global clock distribution to reduce clock distribution power while maintaining a low clock skew. To support a wide range of operating frequencies expected of the core, the global clock system operates in two modes: a resonant-clock (rclk) mode for energy-efficient operation over a desired frequency range and a conventional, direct-drive mode (cclk) to support low-frequency operation. This dual-mode feature was implemented with minimal area impact to achieve both reduced average power dissipation and improved power-constrained performance. In Piledriver, resonant clocking achieves a peak 25% global clock power reduction at 75 °C, which translates to a 4.5% reduction in average application core power.
International Solid-State Circuits Conference | 2014
Aaron Grenat; Sanjay Pant; Ravinder Rachala; Samuel Naffziger
In high-performance microprocessor cores, the on-die supply voltage seen by the transistors is non-ideal and exhibits significant fluctuations. These supply fluctuations are caused by sudden changes in the current consumed by the microprocessor in response to variations in workloads. This non-ideal supply can cause performance degradation or functional failures. Therefore, a significant amount of margin (10-15%) needs to be added to the ideal voltage (if there were no AC voltage variations) to ensure that the processor always executes correctly at the committed voltage-frequency points. This excess voltage wastes power proportional to the square of the voltage increase.
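The cost of the voltage margin described above can be estimated with a short calculation. The sketch below assumes dynamic power scales with the square of supply voltage at a fixed frequency and ignores leakage; the function name is illustrative and does not come from the paper.

```python
def excess_dynamic_power(margin: float) -> float:
    """Fractional increase in dynamic power caused by a fractional voltage margin.

    ASSUMPTION: power is proportional to V^2 at fixed frequency and capacitance,
    so raising V by `margin` multiplies power by (1 + margin)^2.
    """
    return (1.0 + margin) ** 2 - 1.0


# The 10-15% margin range quoted in the abstract.
for margin in (0.10, 0.15):
    extra = excess_dynamic_power(margin)
    print(f"{margin:.0%} voltage margin -> about {extra * 100:.0f}% extra dynamic power")
```

Under this simple model, a 10% to 15% voltage margin corresponds to roughly 21% to 32% additional dynamic power.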
International Solid-State Circuits Conference | 2010
Ravi Jotwani; Sriram Sundaram; Stephen Kosonocky; Alex Schaefer; Victor F. Andrade; Greg Constant; Amy Novak; Samuel Naffziger
The 32 nm implementation of an AMD x86-64 core [1,2,5] occupies 9.69 mm², contains more than 35 million transistors (excluding L2 cache), and operates at frequencies in excess of 3 GHz. The core incorporates numerous design and power improvements to enable an operating range of 2.5 to 25 W and a near zero-power gated state, which makes the core well-suited to a broad range of mobile and desktop products.