Stephen Kosonocky
Advanced Micro Devices
Publications
Featured research published by Stephen Kosonocky.
International Solid-State Circuits Conference | 2010
Ravi Jotwani; Sriram Sundaram; Stephen Kosonocky; Alex Schaefer; Victor F. Andrade; Greg Constant; Amy Novak; Samuel Naffziger
The 32nm implementation of an AMD x86-64 core [1,2,5] occupies 9.69mm², contains more than 35 million transistors (excluding L2 cache), and operates at frequencies in excess of 3GHz. The core incorporates numerous design and power improvements that enable an operating range of 2.5 to 25W and a near-zero-power gated state, making the core well-suited to a broad range of mobile and desktop products.
IEEE Journal of Solid-State Circuits | 2011
Ravi Jotwani; Sriram Sundaram; Stephen Kosonocky; Alex Schaefer; Victor F. Andrade; Amy Novak; Samuel Naffziger
This paper describes the 32 nm implementation of an AMD x86-64 core. It occupies 9.69 mm², contains more than 35 million transistors (excluding L2 cache), and operates at frequencies in excess of 3 GHz. The chip is fabricated in GlobalFoundries 32 nm SOI with high-K metal gate technology. The process uses dual strain liners and eSiGe (embedded silicon germanium) to improve performance. Transistors are available in several threshold voltages and gate lengths to enable performance/leakage tradeoffs. The core incorporates numerous design and power improvements that enable an operating range of 2.5 W to 25 W and a near-zero-power gated state, making it well-suited to a broad range of mobile and desktop products, including multicore SoC designs.
IEEE Journal of Solid-State Circuits | 2015
Kathryn Wilcox; Robert Cole; Harry R. Fair; Kevin Gillespie; Aaron Grenat; Carson Henrion; Ravi Jotwani; Stephen Kosonocky; Benjamin Munger; Samuel Naffziger; Robert S. Orefice; Sanjay Pant; Donald A. Priore; Ravinder Rachala; Jonathan White
This work describes the physical design implementation of the AMD “Steamroller” module and the adaptive clocking system, both integral pieces of the AMD “Kaveri” APU SoC, which was implemented in a 28 nm high-K metal gate bulk CMOS process. The Steamroller module occupies 29.47 mm² and contains 236 million transistors. Various aspects of the core design are covered, including the power and timing methodologies as well as the design challenges of moving from 32 nm SOI to 28 nm bulk CMOS. Adaptive clocking, one of the key features used for core power efficiency, is described in detail.
International Solid-State Circuits Conference | 2014
Kevin Gillespie; Harry R. Fair; Carson Henrion; Ravi Jotwani; Stephen Kosonocky; Robert S. Orefice; Donald A. Priore; Jonathan White; Kathryn Wilcox
The AMD two-core x86-64 CPU module, codenamed “Steamroller”, contains 236 million transistors implemented in 28nm high-κ metal gate (HKMG) bulk CMOS using 12 levels of metal. It is designed to operate from 0.8 to 1.45V. The CPU module occupies 29.47 mm2, which includes two independent integer cores, two instruction decode units and shared instruction fetch, floating-point, and 2MB 16-way L2 cache units (Fig. 5.5.7). Along with the second instruction decode unit, this design includes a larger shared 96KB 3-way instruction cache and a 10KB L2 branch target buffer for improved single-threaded performance and multi-threaded throughput compared to a previous 32nm AMD x86-64 CPU codenamed “Bulldozer” [1].
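The cache sizes quoted above determine the cache geometry once a line size is fixed. The short Python sketch below works through that arithmetic; the 64-byte line size and the hypothetical `cache_geometry` helper are assumptions for illustration, not details given in the abstract.

```python
"""Hypothetical sketch: set counts and address-bit splits for the caches
quoted in the Steamroller abstract.

ASSUMPTION: a 64-byte cache line (the abstract does not state a line size).
"""

LINE_BYTES = 64  # assumed cache-line size


def cache_geometry(size_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> dict:
    """Return the number of sets and the offset/index bit widths of a set-associative cache."""
    lines = size_bytes // line_bytes  # total cache lines
    sets = lines // ways              # lines are grouped into sets of `ways` lines each
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # log2(line size), exact for powers of two
        "index_bits": sets.bit_length() - 1,         # log2(sets), exact for powers of two
    }


l2 = cache_geometry(2 * 1024 * 1024, ways=16)  # shared 2MB 16-way L2
icache = cache_geometry(96 * 1024, ways=3)     # shared 96KB 3-way instruction cache
print(l2)      # {'sets': 2048, 'offset_bits': 6, 'index_bits': 11}
print(icache)  # {'sets': 512, 'offset_bits': 6, 'index_bits': 9}
```

Note that the 3-way instruction cache still ends up with a power-of-two number of sets (512) under this assumption, so its index can be taken directly from address bits.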
International Solid-State Circuits Conference | 2016
Aaron Grenat; Sriram Sundaram; Stephen Kosonocky; Ravinder Rachala; Sriram Sambamurthy; Steven Liepe; Miguel Rodriguez; Tom Burd; Adam Clark; Michael Austin; Samuel Naffziger
Power-management techniques can be effective at squeezing more performance and energy efficiency out of mature SoCs. Vmax reliability limits, infrastructure limits, guard-bands, aging, and thermal limits all put restrictions on performance. This paper describes five power-management techniques that provide a net performance increase of up to 15%, depending on the application and TDP of the SoC, on “Bristol Ridge”, a 28nm CMOS dual-core x86 APU.
International Conference on VLSI Design | 2016
Sriram Sundaram; Sriram Sambamurthy; Michael Austin; Aaron Grenat; Michael Golden; Stephen Kosonocky; Samuel Naffziger
A high-bandwidth critical path accumulator (1 sample/4GHz) capable of providing accurate timing margin information is reported. We present an adaptive voltage mechanism using these critical path accumulators that improves on existing approaches by (1) enabling replica paths to function as a statistical sample of the full set of Fmax-limiting paths, resulting in improved tracking, and (2) explicitly disambiguating the voltage impact on delay from the intrinsic circuit speed by coupling path margin assessments with a voltage reading from the integrated power supply monitors. This scheme is implemented in a 28nm leading-generation CPU core and is shown to track Fmax accurately, with a standard deviation <2% (~1 FO2 delay), across a large range of process, voltage and temperature. Core power is reduced by 7-20% (at the same performance) across various processor states.
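Point (2) above, separating the voltage effect on delay from intrinsic circuit speed, can be illustrated with a minimal Python sketch. It assumes an alpha-power-law delay model with illustrative constants `VT` and `ALPHA`, and the helpers `path_delay`, `intrinsic_speed` and `min_voltage` are hypothetical; none of this is taken from the paper. The scenario numbers (a replica path measured at 200 ps at 1.10 V, a 250 ps clock period with a 10% guard-band) are made up.

```python
"""Hypothetical sketch: separate the voltage effect on path delay from the
intrinsic circuit speed, then find the lowest voltage that meets timing.

ASSUMPTION: alpha-power-law delay model d = k * V / (V - VT)**ALPHA with
illustrative constants; this is not the model used in the paper.
"""

VT = 0.30    # assumed threshold voltage (V)
ALPHA = 1.3  # assumed velocity-saturation exponent


def path_delay(k: float, v: float) -> float:
    """Delay of a path with intrinsic-speed factor k at supply voltage v."""
    return k * v / (v - VT) ** ALPHA


def intrinsic_speed(measured_delay: float, measured_v: float) -> float:
    """Invert the model: strip the voltage dependence out of a measured delay."""
    return measured_delay * (measured_v - VT) ** ALPHA / measured_v


def min_voltage(k: float, delay_budget: float,
                v_lo: float = 0.6, v_hi: float = 1.45, iters: int = 40) -> float:
    """Bisection for the lowest voltage in [v_lo, v_hi] whose delay fits the budget.

    For ALPHA >= 1 and v > VT the modeled delay falls monotonically as v rises,
    so v_hi always meets timing and bisection converges on the crossover.
    """
    if path_delay(k, v_hi) > delay_budget:
        return v_hi  # budget unreachable: clamp to the maximum voltage
    for _ in range(iters):
        mid = 0.5 * (v_lo + v_hi)
        if path_delay(k, mid) <= delay_budget:
            v_hi = mid  # mid meets timing; search lower
        else:
            v_lo = mid  # mid misses timing; search higher
    return v_hi


# Made-up readings: replica path at 200 ps while the supply monitor reads 1.10 V.
k = intrinsic_speed(measured_delay=200e-12, measured_v=1.10)
v_new = min_voltage(k, delay_budget=0.9 * 250e-12)  # 250 ps period, 10% guard-band
print(f"intrinsic factor {k:.3e}, target voltage {v_new:.3f} V")
```

Because `k` is computed from the measured delay and the measured voltage together, a slow reading caused by a low supply is not mistaken for a slow die, which is the distinction point (2) describes.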
IEEE Hot Chips Symposium | 2011
Stephen Kosonocky
This article consists of a collection of slides from the author's conference presentation on practical power gating and dynamic voltage and frequency scaling (DVFS). Specific topics discussed include: the motivation for DVFS and power gating; technology and cost trends for serving spending; dynamic voltage/frequency scaling techniques; power gating facilities; applications for use on mobile and desktop systems; the special features of the Llano accelerated processing unit; design considerations; core voltage power capabilities; and methods for integrated voltage regulation.
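The usual motivation for DVFS follows from the first-order CMOS switching-power relation. This is a textbook approximation, not something taken from these slides: dynamic power scales with activity factor α, switched capacitance C, the square of supply voltage V and clock frequency f, so lowering V and f together gives a roughly cubic power reduction. For an illustrative 20% reduction in both:

```latex
P_{\text{dyn}} = \alpha\, C\, V^{2} f,
\qquad
\frac{P_{\text{dyn}}(0.8V,\ 0.8f)}{P_{\text{dyn}}(V,\ f)} = 0.8^{2} \cdot 0.8 = 0.512
```

Energy per operation, P/f, falls only with V² (to 0.64 in this example), which is why DVFS is paired with power gating to address leakage in idle blocks.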
International Symposium on Low Power Electronics and Design | 2012
Kyle Craig; Yousef Shakhsheer; Sudhanshu Khanna; Saad Arrabi; John Lach; Benton H. Calhoun; Stephen Kosonocky
This paper explores the benefits of splitting a monolithic power gate transistor into parallel, independently controlled, variable weighted power gates to provide programmable post-fabrication power grid resistance. This power gate topology creates energy saving opportunities by providing adjustable localized voltages during active modes and reducing leakage current in idle blocks while retaining data. Measurements show over 30% active energy savings per operation and 90% savings in idle current with retention. A modeling flow for a resistive power grid was also developed that demonstrates the effectiveness of this approach in a Bulldozer processor core.
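The split power gate described above behaves like a set of parallel resistors whose enable pattern sets the header resistance. The Python sketch below assumes ideal resistive legs with binary weights (leg i has conductance 2**i / `r_unit`) and a constant load current; the weighting scheme, parameters and helper names are hypothetical and not taken from the paper.

```python
"""Hypothetical sketch of a binary-weighted, split power gate.

ASSUMPTIONS (not from the paper): each leg is an ideal resistive switch,
leg i has conductance 2**i / r_unit, and the gated block draws a constant current.
"""
import math


def gate_resistance(code: int, r_unit: float) -> float:
    """Effective header resistance for an enable code.

    Enabled legs are in parallel, so their conductances add; with binary
    weights the total conductance is simply code / r_unit.
    """
    if code <= 0:
        return math.inf  # every leg off: block fully power-gated
    return r_unit / code


def local_voltage(vdd: float, i_load: float, code: int, r_unit: float) -> float:
    """Virtual-rail voltage after the IR drop across the enabled legs."""
    if code <= 0:
        return 0.0  # idealized: rail collapses when all legs are off
    return vdd - i_load * gate_resistance(code, r_unit)


def pick_code(vdd: float, i_load: float, v_target: float, r_unit: float, n_bits: int = 4):
    """Smallest enable code whose virtual rail still meets v_target.

    Smaller codes mean higher resistance and a lower local voltage, so the
    first code that meets the target gives the lowest acceptable rail.
    Returns None if even the fully-on gate cannot reach the target.
    """
    for code in range(1, 2 ** n_bits):
        if local_voltage(vdd, i_load, code, r_unit) >= v_target:
            return code
    return None


example = pick_code(vdd=1.0, i_load=0.2, v_target=0.88, r_unit=1.0)
print(example)  # 2: code 1 leaves 0.8 V, code 2 gives 0.9 V
```

Picking the weakest code that still meets the target is one way to realize the "adjustable localized voltages" the abstract mentions; the real circuit's behavior also depends on transistor characteristics that this resistor model ignores.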
International Solid-State Circuits Conference | 2012
Stephen Kosonocky; Vladimir Stojanovic; K. Van Berkel; Ming-Yang Chao; T. Knoll; Joshua Friedrich
As per-core performance scaling continues to slow down, designers face a myriad of challenges in efficiently using the transistors available in modern processes. This Forum will address these next-generation computing challenges in the context of highly parallel manycore processors. The key design challenge in this manycore era is managing and efficiently using resources across the layers of the design hierarchy to deliver power-efficient high performance. System design challenges and tradeoffs will be discussed for both high-performance and mobile platforms. This will be followed by discussions of power optimization of manycore systems, on-chip communication fabrics, system-level power management for real-time applications, power and performance modeling of manycore systems, and physical design challenges. The forum concludes with a panel discussion that gives participants the opportunity to give feedback and ask questions.
Design Automation Conference | 2008
Ruchir Puri; Devadas Varma; Darvin R. Edwards; Alan J. Weger; Paul D. Franzon; Andrew Yang; Stephen Kosonocky
Thermal issues are becoming more important, but is the hype getting the better of the facts? Do they deserve attention beyond a few niche designs and technologies such as 3D ICs? Does the broader design community need to worry about them at 32 nm and beyond, or will they only impact a small segment of designs? In short, does the severity of power issues, coupled with packaging complexity, translate into a thermal crisis in the future? This is an educational panel, with a little controversy, that will address thermal issues in IC design. When, if at all, will this issue emerge as a crucial concern? What are the solutions to this potential crisis?