Publication


Featured research published by David Ayers.


international solid-state circuits conference | 2009

A 45 nm 8-Core Enterprise Xeon® Processor

Stefan Rusu; Simon M. Tam; Harry Muljono; Jason Stinson; David Ayers; Jonathan Chang; Raj Varada; Matt Ratta; Sailesh Kottapalli; Sujal Vora

This paper describes a 2.3 Billion transistors, 8-core, 16-thread, 64-bit Xeon® EX processor with a 24 MB shared L3 cache implemented in a 45 nm nine-metal process. Multiple clock and voltage domains are used to reduce power consumption. Long channel devices and cache sleep mode are used to minimize leakage. Core and cache recovery improve manufacturing yields and enable multiple product flavors from the same silicon die. The disabled blocks are both clock and power gated to minimize their power consumption. Idle power is reduced by shutting off the unterminated I/O links and shedding phases in the voltage regulator to improve the power conversion efficiency.


international solid-state circuits conference | 2007

A 65-nm Dual-Core Multithreaded Xeon® Processor With 16-MB L3 Cache

Stefan Rusu; Simon M. Tam; Harry Muljono; David Ayers; Jonathan Chang; Brian S. Cherkauer; Jason Stinson; John Benoit; Raj Varada; Justin Leung; Rahul Limaye; Sujal Vora

This paper describes a dual-core 64-b Xeon MP processor implemented in a 65-nm eight-metal process. The 435-mm² die has 1.328-B transistors. Each core has two threads and a unified 1-MB L2 cache. The 16-MB shared, 16-way set-associative L3 cache implements both sleep and shut-off leakage reduction modes. Long channel transistors are used to reduce subthreshold leakage in cores and uncore (all portions of the die that are outside the cores) control logic. Multiple voltage and clock domains are employed to reduce power


high-performance computer architecture | 2002

Microarchitectural simulation and control of di/dt-induced power supply voltage variation

Ed Grochowski; David Ayers; Vivek Tiwari

As the power consumption of modern high-performance microprocessors increases beyond 100 W, power becomes an increasingly important design consideration. This paper presents a novel technique to simulate power supply voltage variation as a result of varying activity levels within the microprocessor when executing typical software. The voltage simulation capability may be added to existing microarchitecture simulators that determine the activities of each functional block on a clock-by-clock basis. We then discuss how the same technique can be implemented in logic on the microprocessor die to enable real-time computation of current consumption and power supply voltage. When used in a feedback loop, this logic makes it possible to control the microprocessor's activities to reduce demands on the power delivery system. With on-die voltage computation and di/dt control, we show that a significant reduction in power supply voltage variation may be achieved with little performance loss or average power increase.


international solid-state circuits conference | 2006

A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache

Stefan Rusu; Simon M. Tam; Harry Muljono; David Ayers; Jonathan Chang

A dual-core 64b Xeon® MP processor is implemented in a 65nm 8M process. The 435mm² die has 1.328B transistors. Each core has two threads and a unified 1MB L2 cache. The 16MB unified, 16-way set-associative L3 cache implements both sleep and shut-off leakage reduction modes.


international solid-state circuits conference | 2014

5.4 Ivytown: A 22nm 15-core enterprise Xeon® processor family

Stefan Rusu; Harry Muljono; David Ayers; Simon M. Tam; Wei Chen; Aaron K. Martin; Shenggao Li; Sujal Vora; Raj Varada; Eddie Wang

The next-generation enterprise Xeon® server processor has 15 dual-threaded 64b Ivybridge cores [1] and 37.5MB shared L3 cache. The system interface includes two on-chip memory controllers, each with two memory channels and supports multiple system topologies. The processor has 4.31B transistors in a high-κ metal-gate tri-gate 22nm CMOS technology with 9 metal layers [2]. The design supports a wide array of product offerings with thermal design power ranging from 40 to 150W and frequencies ranging from 1.4 to 3.8GHz. Fig. 5.4.1(a) shows the processor block diagram. The floorplan (Fig. 5.4.1(b)) is driven by the ring bus routability and latency, as well as the chop requirements to smaller core counts. The cores and associated L3 cache are organized in columns of five, with the ring bus segment embedded. The fully populated die has 15-cores in three columns. The 10-core chop removes the rightmost 3rd column and its dedicated top and bottom IOs. CMOS muxes embedded in the ring bus are programmably operable in a 2-or-3-columns configuration. The 6-core chop removes the 2nd and 4th rows from the 10-core die.


european solid-state circuits conference | 2009

Power reduction techniques for an 8-core Xeon® processor

Stefan Rusu; Simon M. Tam; Harry Muljono; Jason Stinson; David Ayers; Jonathan Chang; Raj Varada; Matt Ratta; Sailesh Kottapalli; Sujal Vora

This paper presents the power reduction and management techniques for the 45nm, 8-core Nehalem-EX processor. Multiple clock and voltage domains are used to reduce power consumption. Long channel devices and cache sleep mode are used to minimize leakage. Core and cache recovery improve manufacturing yields and enable multiple product flavors from the same silicon die. Clock and power gating minimize power consumed by disabled blocks. An on-die microcontroller manages voltage and frequency operating points, as well as power and thermal events. Idle power is reduced by shutting off the un-terminated I/O links and shedding phases in the voltage regulator to improve the power conversion efficiency.


asian solid-state circuits conference | 2006

A 65nm 95W Dual-Core Multi-Threaded Xeon® Processor with L3 Cache

Simon M. Tam; Stefan Rusu; Jonathan Chang; Sujal Vora; Brian S. Cherkauer; David Ayers

This paper describes a 95 W dual-core 64-bit Xeon® MP processor implemented in a 65 nm 8 metal layer process. Each processor core has a unified 1MB L2 cache and supports the Intel® Extended Memory 64 Technology and the Hyper-Threading Technology. The shared L3 cache has extensive RAS features including the Intel® Cache Safe Technology and Error Correction Codes (ECC). The processor is designed and optimized to operate at a 95W thermal design power envelope at the target product frequency. The front-side bus operates at 667 MT/s or 800 MT/s in a 3 load topology that is compatible with existing platforms.


asian solid-state circuits conference | 2009

A 45nm 8-core enterprise Xeon® processor

Stefan Rusu; Simon M. Tam; Harry Muljono; David Ayers; Jonathan Chang; Raj Varada; Matt Ratta; Sujal Vora

A 2.3B transistors, 8-core, 16-thread 64-bit Xeon® EX processor with a 24MB shared L3 cache was implemented in a 45nm 9-metal process. Multiple clock and voltage domains are employed to reduce power consumption. Long channel devices and cache sleep mode are used to minimize leakage. Core and cache recovery improve manufacturing yields and enable multiple product flavors using the same silicon die and package. The disabled blocks are both clock and power gated to minimize their power consumption. Idle power is reduced by shutting off the un-terminated I/O links and shedding phases in the voltage regulator to improve the power conversion efficiency.


IEEE Design & Test of Computers | 2003

Microarchitectural di/dt control

Ed Grochowski; David Ayers; Vivek Tiwari

This article takes a high-level view of the power-grid noise problem as it relates to the microarchitectural definition of an IC. Through a set of simulations, the authors relate the noise problem to the details of the circuit and clocking implementation, giving insight into possible methods to reduce such noise.


IEEE Journal of Solid-state Circuits | 2015

A 22 nm 15-Core Enterprise Xeon® Processor Family

Stefan Rusu; Harry Muljono; David Ayers; Simon M. Tam; Wei Chen; Aaron K. Martin; Shenggao Li; Sujal Vora; Raj Varada; Eddie Wang

This paper describes a 4.3B transistors, 15-cores, 30-threads enterprise Xeon® processor with a 37.5 MB shared L3 cache implemented in a 22 nm 9M Hi-K metal gate tri-gate process. A modular floorplan methodology enables easy chops to 10 and 6 cores. Multiple clock and voltage domains are used to reduce power consumption. The clock distribution uses a single PLL per column to save power and minimize deskew crossing points. Integrated PCIe Gen3 and Quick Path Interconnect® (QPI) ports operate at 8GT/s. The 4-channel memory interface supports both 1866 MT/s DDR3 and a new memory buffer interface running at 2667 MT/s on the same pins. The core, cache and I/O recovery techniques improve manufacturing yields and enable multiple product flavors from the same silicon die.
