Publication


Featured research published by Srikanth Arekapudi.


International Solid-State Circuits Conference | 2012

Resonant clock design for a power-efficient high-volume x86-64 microprocessor

Visvesh S. Sathe; Srikanth Arekapudi; Alexander T. Ishii; Charles Ouyang; Marios C. Papaefthymiou; Samuel Naffziger

AMD's 32-nm x86-64 core, code-named “Piledriver”, features a resonant global clock distribution to reduce clock distribution power while maintaining low clock skew. To support the wide range of operating frequencies expected of the core, the global clock system operates in two modes: a resonant-clock (rclk) mode for energy-efficient operation over a desired frequency range, and a conventional direct-drive mode (cclk) to support low-frequency operation. This dual-mode feature was implemented with minimal area impact to achieve both reduced average power dissipation and improved power-constrained performance. In Piledriver, resonant clocking achieves a peak 25% global clock power reduction at 75 °C, which translates to a 4.5% reduction in average application core power.


International Solid-State Circuits Conference | 2011

Design solutions for the Bulldozer 32nm SOI 2-core processor module in an 8-core CPU

Timothy Charles Fischer; Srikanth Arekapudi; Eric Busta; Carl D. Dietz; Michael Golden; Scott Hilker; Aaron K. Horiuchi; Kevin A. Hurd; Dave Johnson; Hugh McIntyre; Samuel Naffziger; James Vinh; Jonathan White; Kathryn Wilcox

AMD's 2-core “Bulldozer” module contains 213 million transistors in an 11-metal-layer 32 nm HKMG SOI CMOS process and is designed to operate from 0.8 to 1.3 V. This new micro-architecture [1] improves performance and frequency while reducing area and power compared to a previous AMD x86-64 CPU in the same process [2]. To achieve these goals, the design reduced the number of FO4 inverter delays per cycle by more than 20%, achieving higher frequencies in the same power envelope even with increased core counts. The 2-core CPU module area (including 2 MB L2 cache) is 30.9 mm² (Fig. 4.5.7).
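As a rough illustration of the FO4 figure quoted in the abstract above, the sketch below estimates the frequency headroom gained when the logic depth per cycle shrinks. It assumes that cycle time scales linearly with the number of FO4 inverter delays and that the FO4 delay itself is unchanged; both are simplifying assumptions, and the function name `frequency_gain` is hypothetical, not taken from the paper.

```python
def frequency_gain(fo4_reduction: float) -> float:
    """Relative clock-frequency gain for a fractional cut in FO4 delays per cycle.

    Assumes cycle time is proportional to the FO4 count and that the
    FO4 delay of the process is held constant (illustrative model only).
    """
    if not 0.0 <= fo4_reduction < 1.0:
        raise ValueError("fo4_reduction must be in [0, 1)")
    # New cycle time is (1 - reduction) of the old one, so frequency
    # rises by the reciprocal of that factor.
    return 1.0 / (1.0 - fo4_reduction) - 1.0


# The abstract quotes "more than 20%"; 0.20 is used here as the lower bound.
gain = frequency_gain(0.20)
print(f"Frequency headroom at fixed FO4 delay: about {gain:.0%}")
```

Under these assumptions a 20% cut in FO4 delays per cycle corresponds to roughly 25% more frequency headroom; the frequencies actually achieved also depend on voltage, power limits, and wire delay, which this model ignores.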


IEEE Journal of Solid-State Circuits | 2012

Design of the Two-Core x86-64 AMD “Bulldozer” Module in 32 nm SOI CMOS

Hugh McIntyre; Srikanth Arekapudi; Eric Busta; Timothy Charles Fischer; Michael Golden; Aaron K. Horiuchi; Tom Meneghini; Samuel Naffziger; James Vinh

This paper describes key circuit innovations in a new x86-64 micro-architecture from AMD, code-named “Bulldozer”. It is implemented in 32 nm high-K metal gate SOI CMOS. It occupies 30.9 mm², contains 213 million transistors, reduces the number of FO4 gates per cycle by more than 20% compared to a previous processor in the same technology, and demonstrates superior frequency scaling across voltage. The module includes two independent integer cores but shares the fetch, decode, floating-point, and L2 cache units to maximize single-threaded performance and multi-threaded throughput while significantly improving power and area efficiency compared to fully replicated CPU cores. The design includes a new soft-edged flop (SEF) family to enable high frequency and low power. Achieving power efficiency in combination with high-frequency design is a particular challenge, and this paper describes several of the unique approaches to power optimization that have been employed in the design. The gate-count reduction and power optimization enable faster frequencies in the same power envelope compared to previous designs.


International Solid-State Circuits Conference | 2006

A 2.6GHz Dual-Core 64b x86 Microprocessor with DDR2 Memory Support

Michael Golden; Srikanth Arekapudi; G. Dabney; M. Haertel; S. Hale; L. Herlinger; Yongg Kim; K. McGrath; V. Palisetti; M. Singh

A microprocessor featuring 2 Hammer cores and an on-chip DDR2 memory controller implements Pacifica architectural support for virtualization. It is fabricated in a 90 nm triple-Vt partially-depleted SOI process with 9 layers of copper interconnect. The chip achieves a clock frequency of 2.6 GHz at 1.35 V while dissipating 95 W.


International Solid-State Circuits Conference | 2011

40-Entry unified out-of-order scheduler and integer execution unit for the AMD Bulldozer x86-64 core

Michael Golden; Srikanth Arekapudi; James Vinh

AMD's two-core Bulldozer module implements the AMD x86-64 microarchitecture in an 11-layer 32-nm SOI HKMG technology. The 40-instruction out-of-order unified integer scheduler issues up to four operations per cycle and supports single-cycle wake-up of dependent operations. The 2.37 mm² integer execution unit supports single-cycle data bypass among four independent functional units. Compared to previous AMD x86-64 cores [3–6], project goals reduce the number of FO4 inverter delays per cycle by more than 20%, while maintaining constant IPC, to achieve higher frequency and performance in the same power envelope, even with increased core counts.


IEEE Journal of Solid-State Circuits | 2016

Carrizo: A High Performance, Energy Efficient 28 nm APU

Benjamin Munger; David Akeson; Srikanth Arekapudi; Tom Burd; Harry R. Fair; Jim Farrell; Dave Johnson; Guhan Krishnan; Hugh McIntyre; Edward J. McLellan; Samuel Naffziger; Russell Schreiber; Sriram Sundaram; Jonathan White; Kathryn Wilcox

AMD's 6th generation “Carrizo” APU, targeted at 12-35 W mobile computing form factors, contains 3.1 billion transistors, occupies 250.04 mm², and is implemented in a 28 nm HKMG planar dual-oxide FET technology with 12 metal layers. The design achieves a 29% improvement in transistor density compared to the 5th generation “Kaveri” APU, also a 28 nm design, and implements several power management features resulting in area and power improvements similar to a technology shrink. Increased power density makes meeting the thermal limits required for reliability and power distribution to the APU's processors substantial design challenges. Pre-silicon thermal analysis is used to understand and take advantage of thermal gradients. Adaptive voltage-frequency scaling in the processor core as well as wordline and bitline assist techniques in the L2 cache enable lower minimum voltage requirements.
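The density figures in the abstract above can be checked with simple arithmetic. The sketch below computes Carrizo's average transistor density from the stated transistor count and die area, then backs out the Kaveri density that a 29% improvement would imply. The variable names are illustrative, and the implied Kaveri value is derived here from the quoted percentage, not stated in the abstract.

```python
# Figures quoted in the Carrizo abstract.
carrizo_transistors = 3.1e9      # total transistor count
carrizo_area_mm2 = 250.04        # die area in mm^2
density_improvement = 0.29       # stated gain over Kaveri

# Average density in millions of transistors per mm^2.
carrizo_density = carrizo_transistors / carrizo_area_mm2 / 1e6

# Implied Kaveri density, assuming the 29% figure refers to this same
# whole-die average (an assumption; the abstract does not say so).
implied_kaveri_density = carrizo_density / (1.0 + density_improvement)

print(f"Carrizo: {carrizo_density:.1f} M transistors/mm^2")
print(f"Implied Kaveri: {implied_kaveri_density:.1f} M transistors/mm^2")
```

This gives roughly 12.4 million transistors per mm² for Carrizo and an implied value near 9.6 million per mm² for Kaveri, subject to the rounding in the quoted figures.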


Archive | 2011

Method and apparatus for prioritizing processor scheduler queue operations

Ganesh Venkataramanan; Srikanth Arekapudi; James Vinh; Michael G. Butler


Archive | 2013

WORD LINE LATE KILL IN SCHEDULER

James Vinh; Srikanth Arekapudi; Kyle S. Viau


Archive | 2013

TRANSITIONING BETWEEN RESONANT CLOCKING MODE AND CONVENTIONAL CLOCKING MODE

Visvesh S. Sathe; Srikanth Arekapudi; Charles Ouyang; Kyle S. Viau


Archive | 2012

CLOCK DRIVER FOR FREQUENCY-SCALABLE SYSTEMS

Visvesh S. Sathe; Samuel Naffziger; Srikanth Arekapudi

Collaboration


Srikanth Arekapudi's collaborations.

Top Co-Authors

James Vinh

Advanced Micro Devices
