Publications


Featured research published by Stephen Kosonocky.


International Solid-State Circuits Conference | 2010

An x86-64 core implemented in 32nm SOI CMOS

Ravi Jotwani; Sriram Sundaram; Stephen Kosonocky; Alex Schaefer; Victor F. Andrade; Greg Constant; Amy Novak; Samuel Naffziger

The 32nm implementation of an AMD x86-64 core [1,2,5] occupies 9.69 mm², contains more than 35 million transistors (excluding L2 cache), and operates at frequencies in excess of 3 GHz. The core incorporates numerous design and power improvements to enable an operating range of 2.5 W to 25 W and a near-zero-power gated state, which makes the core well suited to a broad range of mobile and desktop products.


IEEE Journal of Solid-State Circuits | 2011

An x86-64 Core in 32 nm SOI CMOS

Ravi Jotwani; Sriram Sundaram; Stephen Kosonocky; Alex Schaefer; Victor F. Andrade; Amy Novak; Samuel Naffziger

This paper describes the 32 nm implementation of an AMD x86-64 core. It occupies 9.69 mm², contains more than 35 million transistors (excluding L2 cache), and operates at frequencies in excess of 3 GHz. This AMD chip is fabricated in GlobalFoundries 32 nm SOI and uses high-K metal gate technology. The process uses dual strain liners and eSiGe (embedded silicon-germanium) to improve performance. Transistors are fabricated with various threshold voltages and gate lengths to facilitate performance/leakage tradeoffs. The core incorporates numerous design and power improvements to enable an operating range of 2.5 W to 25 W and a near-zero-power gated state, which makes the core well suited to a broad range of mobile and desktop products, including multicore SoC designs.


IEEE Journal of Solid-State Circuits | 2015

Steamroller Module and Adaptive Clocking System in 28 nm CMOS

Kathryn Wilcox; Robert Cole; Harry R. Fair; Kevin Gillespie; Aaron Grenat; Carson Henrion; Ravi Jotwani; Stephen Kosonocky; Benjamin Munger; Samuel Naffziger; Robert S. Orefice; Sanjay Pant; Donald A. Priore; Ravinder Rachala; Jonathan White

This work describes the physical design implementation of the AMD “Steamroller” module and adaptive clocking system, both integral pieces of the AMD Kaveri APU SoC, which was implemented in a 28 nm high-K metal gate bulk CMOS process. The Steamroller module occupies 29.47 mm² and contains 236 million transistors. Various aspects of the core design are covered, including the power and timing methodologies and the design challenges of moving from 32 nm SOI to 28 nm bulk CMOS. Adaptive clocking, one of the key features used for core power efficiency, is described in detail.


International Solid-State Circuits Conference | 2014

5.5 Steamroller: An x86-64 core implemented in 28nm bulk CMOS

Kevin Gillespie; Harry R. Fair; Carson Henrion; Ravi Jotwani; Stephen Kosonocky; Robert S. Orefice; Donald A. Priore; Jonathan White; Kathryn Wilcox

The AMD two-core x86-64 CPU module, codenamed “Steamroller”, contains 236 million transistors implemented in 28nm high-κ metal gate (HKMG) bulk CMOS using 12 levels of metal. It is designed to operate from 0.8 V to 1.45 V. The CPU module occupies 29.47 mm², which includes two independent integer cores, two instruction decode units, and shared instruction fetch, floating-point, and 2MB 16-way L2 cache units (Fig. 5.5.7). Along with the second instruction decode unit, this design includes a larger shared 96KB 3-way instruction cache and a 10KB L2 branch target buffer for improved single-threaded performance and multi-threaded throughput compared to a previous 32nm AMD x86-64 CPU codenamed “Bulldozer” [1].


International Solid-State Circuits Conference | 2016

4.2 Increasing the performance of a 28nm x86-64 microprocessor through system power management

Aaron Grenat; Sriram Sundaram; Stephen Kosonocky; Ravinder Rachala; Sriram Sambamurthy; Steven Liepe; Miguel Rodriguez; Tom Burd; Adam Clark; Michael Austin; Samuel Naffziger

Power-management techniques can be effective at squeezing more performance and energy efficiency out of mature SoCs. Vmax reliability limits, infrastructure limits, guard-bands, aging, and thermal limits all put restrictions on performance. This paper describes five power-management techniques that provide a net performance increase of up to 15%, depending on the application and TDP of the SoC, on “Bristol Ridge”, a 28nm CMOS dual-core x86 APU.


International Conference on VLSI Design | 2016

Adaptive Voltage Frequency Scaling Using Critical Path Accumulator Implemented in 28nm CPU

Sriram Sundaram; Sriram Sambamurthy; Michael Austin; Aaron Grenat; Michael Golden; Stephen Kosonocky; Samuel Naffziger

A high-bandwidth critical path accumulator (1 sample/4GHz) capable of providing accurate timing-margin information is reported. We present an adaptive voltage mechanism using these critical path accumulators that improves on existing approaches by (1) enabling replica paths to function as a statistical sample of the full set of Fmax-limiting paths, resulting in improved tracking, and (2) explicitly disambiguating the voltage impact on delay from the intrinsic circuit speed by coupling path-margin assessments with a voltage reading from the integrated power supply monitors. This scheme is implemented in a leading-generation 28nm CPU core and is shown to track Fmax accurately, with a standard deviation below 2% (~1 FO2 delay) across a large range of process, voltage, and temperature. Core power is reduced by 7 to 20% (at the same performance) across various processor states.
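The two ideas in this abstract can be illustrated with a minimal sketch of a closed-loop voltage controller. The first idea treats replica-path margins as a sample. The second corrects a margin reading for the measured supply voltage. This is not the authors' design. The function names, the millivolt units, the step size, the 800 mV to 1450 mV limits, the linear voltage-sensitivity model, and the choice of the worst (minimum) sample as the margin estimate are all hypothetical.

```python
"""Hypothetical adaptive-voltage loop driven by critical-path margin samples.

Illustrative only: units, limits, step size, and the linear sensitivity
model are assumptions, not details from the paper.
"""


def intrinsic_margin(margin, v_measured_mv, v_ref_mv, steps_per_mv):
    """Remove the part of a measured margin explained by supply voltage.

    Assumes (hypothetically) that margin, in delay steps, grows linearly
    with supply voltage, so a reading taken above the reference voltage
    is corrected downward to estimate intrinsic circuit speed.
    """
    return margin - steps_per_mv * (v_measured_mv - v_ref_mv)


def next_voltage_mv(v_mv, margins, target_margin, step_mv=5,
                    v_min_mv=800, v_max_mv=1450):
    """One control step of the loop.

    Lower the voltage if the worst sampled path has spare margin, raise it
    if the margin is short, and then clamp the result to the allowed range.
    """
    worst = min(margins)  # treat replica paths as a sample; act on the worst one
    if worst > target_margin:
        v_mv -= step_mv
    elif worst < target_margin:
        v_mv += step_mv
    return max(v_min_mv, min(v_max_mv, v_mv))


if __name__ == "__main__":
    readings = [6, 5, 7]  # hypothetical margin samples at the current voltage
    print(next_voltage_mv(1000, readings, target_margin=3))
    print(intrinsic_margin(10, 1010, 1000, 0.5))
```

Acting on the minimum sample is the most conservative choice. A real controller might instead use a statistical estimate, such as the mean minus a multiple of the standard deviation.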


IEEE Hot Chips Symposium | 2011

Practical power gating and dynamic voltage/frequency scaling

Stephen Kosonocky

This article consists of a collection of slides from the author's conference presentation on practical power gating and dynamic voltage and frequency scaling (DVFS). Topics discussed include: the motivation for DVFS and power gating; technology and cost trends for serving spending; DVFS techniques; power gating facilities; applications on mobile and desktop systems; the special features of the Llano accelerated processing unit; design considerations; core voltage power capabilities; and methods for integrated voltage regulation.
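The motivation for DVFS can be shown with the standard first-order model of dynamic switching power. This model is not taken from the slides. Here $\alpha$ is the activity factor, $C$ the switched capacitance, $V$ the supply voltage, and $f$ the clock frequency. Because a lower frequency usually permits a lower voltage, scaling both by a hypothetical factor of 0.8 cuts dynamic power roughly cubically:

```latex
P_{\mathrm{dyn}} = \alpha C V^{2} f
\qquad\Rightarrow\qquad
\frac{P_{\mathrm{dyn}}'}{P_{\mathrm{dyn}}}
  = \left(\frac{V'}{V}\right)^{2} \frac{f'}{f}
  = (0.8)^{2}\,(0.8) = 0.512
```

This first-order model ignores leakage power, which is what power gating targets.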


International Symposium on Low Power Electronics and Design | 2012

A programmable resistive power grid for post-fabrication flexibility and energy tradeoffs

Kyle Craig; Yousef Shakhsheer; Sudhanshu Khanna; Saad Arrabi; John Lach; Benton H. Calhoun; Stephen Kosonocky

This paper explores the benefits of splitting a monolithic power gate transistor into parallel, independently controlled, variable weighted power gates to provide programmable post-fabrication power grid resistance. This power gate topology creates energy saving opportunities by providing adjustable localized voltages during active modes and reducing leakage current in idle blocks while retaining data. Measurements show over 30% active energy savings per operation and 90% savings in idle current with retention. A modeling flow for a resistive power grid was also developed that demonstrates the effectiveness of this approach in a Bulldozer processor core.
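The idea of splitting one power gate into independently controlled, weighted parallel segments can be sketched as a simple resistance model. A digital control code selects which segments conduct. The resulting gate resistance sets the IR drop, and therefore the local voltage the block sees. This is a hypothetical model, not the paper's circuit or modeling flow. The binary weights, the 16 Ω unit resistance, and the purely resistive IR-drop model are all assumptions.

```python
"""Hypothetical model of a power gate split into binary-weighted segments.

Illustrative only: segment weights, unit resistance, and the linear
IR-drop model are assumptions, not values from the paper.
"""
import math

WEIGHTS = (1, 2, 4, 8)  # relative widths of the parallel gate segments (hypothetical)
R_UNIT_OHM = 16.0       # on-resistance of the weight-1 segment (hypothetical)


def gate_resistance(code, weights=WEIGHTS, r_unit=R_UNIT_OHM):
    """Effective resistance of the enabled segments in parallel.

    Bit i of `code` turns on the segment with weight weights[i]. A segment
    of weight w behaves like w unit segments in parallel, so its
    conductance is w / r_unit.
    """
    conductance = sum(w for i, w in enumerate(weights) if (code >> i) & 1) / r_unit
    if conductance == 0:
        return math.inf  # every segment off: the block is fully power-gated
    return 1.0 / conductance


def local_supply(v_supply, load_current, code):
    """Voltage on the block's virtual rail after the IR drop across the gate.

    Only meaningful for a nonzero code; this simple model has no leakage path.
    """
    if code == 0:
        raise ValueError("all gate segments are off")
    return v_supply - load_current * gate_resistance(code)


if __name__ == "__main__":
    for code in (0b0001, 0b0011, 0b1111):
        print(bin(code), gate_resistance(code), local_supply(1.0, 0.01, code))
```

Enabling more segments lowers the gate resistance, so the block's local voltage moves closer to the supply. Enabling fewer segments deliberately lowers that local voltage, for example to save energy or to hold a retention voltage in an idle block.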


International Solid-State Circuits Conference | 2012

Power/performance optimization of many-core processor SoCs

Stephen Kosonocky; Vladimir Stojanovic; K. Van Berkel; Ming-Yang Chao; T. Knoll; Joshua Friedrich

As per-core performance scaling continues to slow down, designers face a myriad of challenges in efficiently using the transistors available in modern processes. This forum will address these next-generation computing challenges in the context of highly parallel many-core processors. The key design challenge in this many-core era is managing and efficiently using resources across the layers of the design hierarchy to provide power-efficient high performance. System design challenges and tradeoffs will be discussed for both high-performance and mobile platforms. This will be followed by discussions of power optimization of many-core systems, on-chip communication fabrics, system-level power management for real-time applications, power and performance modeling of many-core systems, and physical design challenges. The forum concludes with a panel discussion that gives participants the opportunity to give feedback and ask questions.


Design Automation Conference | 2008

Keeping hot chips cool: are IC thermal problems hot air?

Ruchir Puri; Devadas Varma; Darvin R. Edwards; Alan J. Weger; Paul D. Franzon; Andrew Yang; Stephen Kosonocky

Thermal issues are becoming more important, but is the hype getting the better of the facts? Does the issue deserve attention beyond a few niche designs and technologies such as 3D ICs? Does the broader design community need to worry about it at 32 nm and beyond, or will it only impact a small segment of designs? In short, does the severity of power issues coupled with packaging complexity translate into a future thermal crisis? This is an educational panel, with a little bit of controversy, that will address the thermal issue in IC design. When, if at all, will this issue emerge as a crucial concern? What are the solutions to resolve this potential crisis?

Collaboration


Top Co-Authors

Amy Novak

Advanced Micro Devices
