Srikanth Arekapudi
Advanced Micro Devices
Publication
Featured research published by Srikanth Arekapudi.
international solid-state circuits conference | 2012
Visvesh S. Sathe; Srikanth Arekapudi; Alexander T. Ishii; Charles Ouyang; Marios C. Papaefthymiou; Samuel Naffziger
AMD's 32-nm x86-64 core code-named “Piledriver” features a resonant global clock distribution to reduce clock distribution power while maintaining a low clock skew. To support the wide range of operating frequencies expected of the core, the global clock system operates in two modes: a resonant-clock (rclk) mode for energy-efficient operation over a desired frequency range and a conventional, direct-drive mode (cclk) to support low-frequency operation. This dual-mode feature was implemented with minimal area impact to achieve both reduced average power dissipation and improved power-constrained performance. In Piledriver, resonant clocking achieves a peak 25% global clock power reduction at 75 °C, which translates to a 4.5% reduction in average application core power.
international solid-state circuits conference | 2011
Timothy Charles Fischer; Srikanth Arekapudi; Eric Busta; Carl D. Dietz; Michael Golden; Scott Hilker; Aaron K. Horiuchi; Kevin A. Hurd; Dave Johnson; Hugh McIntyre; Samuel Naffziger; James Vinh; Jonathan White; Kathryn Wilcox
AMD's 2-core “Bulldozer” module contains 213 million transistors in an 11-metal-layer 32nm HKMG SOI CMOS process and is designed to operate from 0.8 to 1.3V. This new micro-architecture [1] improves performance and frequency while reducing area and power compared to a previous AMD x86-64 CPU in the same process [2]. To achieve these goals, the design reduced the number of FO4 inverter delays/cycle by more than 20%, achieving higher frequencies in the same power envelope even with increased core counts. The 2-core CPU module area (including 2MB L2 cache) is 30.9 mm² (Fig. 4.5.7).
IEEE Journal of Solid-state Circuits | 2012
Hugh McIntyre; Srikanth Arekapudi; Eric Busta; Timothy Charles Fischer; Michael Golden; Aaron K. Horiuchi; Tom Meneghini; Samuel Naffziger; James Vinh
This paper describes key circuit innovations in a new x86-64 micro-architecture that AMD code-named “Bulldozer”. It is implemented in 32 nm high-K metal gate SOI CMOS. It occupies 30.9 mm², contains 213 million transistors, reduces the number of FO4 gates per cycle by more than 20% compared to a previous processor in the same technology, and demonstrates superior frequency scaling across voltage. The module includes two independent integer cores but shares the fetch, decode, floating-point, and L2 cache units to maximize single-threaded performance and multi-threaded throughput while significantly improving power and area efficiency compared to fully replicated CPU cores. The design includes a new soft-edged flop (SEF) family to enable high frequency and low power. Achieving power efficiency in combination with high-frequency design is a particular challenge, and this paper describes several of the unique approaches to power optimization that have been employed in the design. The gate-count reduction and power optimization enable faster frequencies in the same power envelope compared to previous designs.
international solid-state circuits conference | 2006
Michael Golden; Srikanth Arekapudi; G. Dabney; M. Haertel; S. Hale; L. Herlinger; Yongg Kim; K. McGrath; V. Palisetti; M. Singh
A microprocessor featuring 2 Hammer cores and an on-chip DDR2 memory controller implements Pacifica architectural support for virtualization. It is fabricated in a 90nm triple-Vt partially-depleted SOI process with 9 layers of copper interconnect. The chip achieves a clock frequency of 2.6GHz at 1.35V while dissipating 95W.
international solid-state circuits conference | 2011
Michael Golden; Srikanth Arekapudi; James Vinh
AMD's two-core Bulldozer module implements the AMD x86-64 microarchitecture in an 11-layer 32-nm SOI HKMG technology. The 40-instruction out-of-order unified integer scheduler issues up to four operations per cycle and supports single-cycle wake-up of dependent operations. The 2.37 mm² integer execution unit supports single-cycle data bypass among four independent functional units. Compared to previous AMD x86-64 cores [3–6], the project goals were to reduce the number of FO4 inverter delays per cycle by more than 20%, while maintaining constant IPC, to achieve higher frequency and performance in the same power envelope, even with increased core counts.
IEEE Journal of Solid-state Circuits | 2016
Benjamin Munger; David Akeson; Srikanth Arekapudi; Tom Burd; Harry R. Fair; Jim Farrell; Dave Johnson; Guhan Krishnan; Hugh McIntyre; Edward J. McLellan; Samuel Naffziger; Russell Schreiber; Sriram Sundaram; Jonathan White; Kathryn Wilcox
AMD's 6th generation “Carrizo” APU, targeted at 12-35 W mobile computing form factors, contains 3.1 billion transistors, occupies 250.04 mm², and is implemented in a 28 nm HKMG planar dual-oxide FET technology with 12 metal layers. The design achieves a 29% improvement in transistor density compared to the 5th generation “Kaveri” APU, also a 28 nm design, and implements several power management features resulting in area and power improvements similar to a technology shrink. Increased power density makes meeting the thermal limits required for reliability and power distribution to the APU's processors substantial design challenges. Pre-silicon thermal analysis is used to understand and take advantage of thermal gradients. Adaptive voltage-frequency scaling in the processor core as well as wordline and bitline assist techniques in the L2 cache enable lower minimum voltage requirements.
Archive | 2011
Ganesh Venkataramanan; Srikanth Arekapudi; James Vinh; Michael G. Butler
Archive | 2013
James Vinh; Srikanth Arekapudi; Kyle S. Viau
Archive | 2013
Visvesh S. Sathe; Srikanth Arekapudi; Charles Ouyang; Kyle S. Viau
Archive | 2012
Visvesh S. Sathe; Samuel Naffziger; Srikanth Arekapudi