Publication


Featured research published by Stefan Rusu.


international solid-state circuits conference | 2000

Clock generation and distribution for the first IA-64 microprocessor

Simon M. Tam; Stefan Rusu; U. Nagarji Desai; R. Kim; Ji Zhang; I. Young

Increased functionality and performance in today's microprocessors have resulted in a trend toward larger die sizes and higher operating frequencies. These factors, coupled with larger on-die variations at reduced device geometries, call for special management of the clock distribution skew. The clock generation and distribution for the first IA-64 microprocessor achieves a low skew by using distributed programmable deskew units. Local skew control compensates for load mismatches and within-die process variations, as well as temperature and voltage gradients. In addition, this design supports debug features including on-die clock shrink and test access port (TAP) control of the deskew settings.


international solid-state circuits conference | 2009

A 45 nm 8-Core Enterprise Xeon® Processor

Stefan Rusu; Simon M. Tam; Harry Muljono; Jason Stinson; David Ayers; Jonathan Chang; Raj Varada; Matt Ratta; Sailesh Kottapalli; Sujal Vora

This paper describes a 2.3-billion-transistor, 8-core, 16-thread, 64-bit Xeon® EX processor with a 24 MB shared L3 cache implemented in a 45 nm nine-metal process. Multiple clock and voltage domains are used to reduce power consumption. Long channel devices and cache sleep mode are used to minimize leakage. Core and cache recovery improve manufacturing yields and enable multiple product flavors from the same silicon die. The disabled blocks are both clock and power gated to minimize their power consumption. Idle power is reduced by shutting off the unterminated I/O links and shedding phases in the voltage regulator to improve the power conversion efficiency.


international solid-state circuits conference | 2007

A 65-nm Dual-Core Multithreaded Xeon® Processor With 16-MB L3 Cache

Stefan Rusu; Simon M. Tam; Harry Muljono; David Ayers; Jonathan Chang; Brian S. Cherkauer; Jason Stinson; John Benoit; Raj Varada; Justin Leung; Rahul Limaye; Sujal Vora

This paper describes a dual-core 64-b Xeon MP processor implemented in a 65-nm eight-metal process. The 435-mm² die has 1.328 billion transistors. Each core has two threads and a unified 1-MB L2 cache. The 16-MB shared, 16-way set-associative L3 cache implements both sleep and shut-off leakage reduction modes. Long channel transistors are used to reduce subthreshold leakage in cores and uncore (all portions of the die that are outside the cores) control logic. Multiple voltage and clock domains are employed to reduce power


IEEE Journal of Solid-state Circuits | 2000

The first IA-64 microprocessor

Stefan Rusu; Gadi Singer

The first implementation of the IA-64 architecture achieves high performance by using a highly parallel execution core, while maintaining binary compatibility with the IA-32 instruction set. Explicitly parallel instruction computing (EPIC) design maximizes performance through hardware and software synergy. The processor contains 25.4 million transistors and operates at 800 MHz. The chip is fabricated in a 0.18-µm CMOS process with six metal layers and packaged in a 1012-pad organic land grid array using C4 (flip chip) assembly technology. A core speed back-side bus connects the processor to a 4-MB L3 cache.


IEEE Journal of Solid-state Circuits | 2007

The 65-nm 16-MB Shared On-Die L3 Cache for the Dual-Core Intel Xeon Processor 7100 Series

Jonathan Chang; Ming Huang; Jonathan Shoemaker; John Benoit; Szu-Liang Chen; Wei Chen; Siufu Chiu; Raghuraman Ganesan; Gloria Leong; Venkata Lukka; Stefan Rusu; Durgesh Srivastava

The 16-way set-associative, single-ported 16-MB cache for the Dual-Core Intel Xeon Processor 7100 Series uses a 0.624 µm² cell in a 65-nm 8-metal technology. Low power techniques are implemented in the L3 cache to minimize both leakage and dynamic power. Sleep transistors are used in the SRAM array and peripherals, reducing the cache leakage by more than 2X. Only 0.8% of the cache is powered up for a cache access. Dynamic cache line disable (Intel Cache Safe Technology) with a history buffer protects the cache from latent defects and infant mortality failures


international solid-state circuits conference | 2006

A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache

Stefan Rusu; Simon M. Tam; Harry Muljono; David Ayers; Jonathan Chang

A dual-core 64b Xeon® MP processor is implemented in a 65nm 8M process. The 435mm² die has 1.328B transistors. Each core has two threads and a unified 1MB L2 cache. The 16MB unified, 16-way set-associative L3 cache implements both sleep and shut-off leakage reduction modes


international solid-state circuits conference | 2001

Backside infrared probing for static voltage drop and dynamic timing measurements

Stefan Rusu; Steve Seidel; Gary Woods; Dean J. Grannes; Harry Muljono; Jeremy A. Rowlette; Keiko Petrosky

Due to the increased number of metal layers and flip-chip packaging, most high-performance microprocessors use optical solutions to probe internal nodes from the backside of the die. Existing probing systems use a focused infrared (1.064 µm) laser to probe internal diffusions from the backside of a chip thinned down to 100 µm. However, this optical probing setup does not provide accurate information about DC voltage levels. Also, because of the stroboscopic sampling used in laser probing, jitter measurements are difficult. This approach overcomes these limitations using alternative optical non-invasive techniques based on the infrared radiation emitted by hot electrons in saturated nMOS transistors under both static bias and switching conditions.


international symposium on microarchitecture | 2004

Itanium 2 processor 6M: higher frequency and larger L3 cache

Stefan Rusu; Harry Muljono; Brian S. Cherkauer

The third-generation Itanium processor targets the high-performance server and workstation market. To do so, the design team sought to provide higher performance through increased frequency and a larger L3 cache. At the same time, we had to limit the power dissipation to fit into the existing platform envelope. These considerations led to what we now call the Itanium 2 processor 6M: the latest generation of Itanium 2, which features a 6-Mbyte, 24-way set-associative on-die L3 cache. The design implements a 2-bundle 64-bit explicitly parallel instruction computing (EPIC) architecture and is fully compatible with previous implementations. Although this processor's frequency is 50 percent higher than that of the previous generation, the maximum power dissipation holds flat at 130 W to ensure the platform's backward compatibility. In designing the next generation of the Itanium 2 processor, Intel doubled the on-die, level-three cache to 6 Mbytes and increased frequency by 50 percent compared to the previous generation. Anoth...


IEEE Journal of Solid-state Circuits | 2003

A 1.5-GHz 130-nm Itanium® 2 Processor with 6-MB on-die L3 cache

Stefan Rusu; Jason Stinson; Simon M. Tam; Justin Leung; Harry Muljono; Brian S. Cherkauer

This 130-nm Itanium 2 processor implements the explicitly parallel instruction computing (EPIC) architecture and features an on-die 6-MB 24-way set-associative level-3 cache. The 374-mm² die contains 410 M transistors and is implemented in a dual-Vt process with six Cu interconnect layers and FSG dielectric. The processor runs at 1.5 GHz at 1.3 V and dissipates a maximum of 130 W. This paper reviews circuit design and package details, power delivery, the reliability, availability, and serviceability (RAS) features, design for test (DFT), and design for manufacturability (DFM) features, as well as an overview of the design and verification methodology. The fuse-based clock deskew circuit achieves 24-ps skew across the entire die, while the scan-based skew control further reduces it to 7 ps. The 128-bit front-side bus has a bandwidth of 6.4 GB/s and supports up to four processors on a single bus.


international solid-state circuits conference | 2014

5.4 Ivytown: A 22nm 15-core enterprise Xeon® processor family

Stefan Rusu; Harry Muljono; David Ayers; Simon M. Tam; Wei Chen; Aaron K. Martin; Shenggao Li; Sujal Vora; Raj Varada; Eddie Wang

The next-generation enterprise Xeon® server processor has 15 dual-threaded 64b Ivybridge cores [1] and a 37.5MB shared L3 cache. The system interface includes two on-chip memory controllers, each with two memory channels, and supports multiple system topologies. The processor has 4.31B transistors in a high-κ metal-gate tri-gate 22nm CMOS technology with 9 metal layers [2]. The design supports a wide array of product offerings with thermal design power ranging from 40 to 150W and frequencies ranging from 1.4 to 3.8GHz. Fig. 5.4.1(a) shows the processor block diagram. The floorplan (Fig. 5.4.1(b)) is driven by the ring bus routability and latency, as well as the chop requirements to smaller core counts. The cores and associated L3 cache are organized in columns of five, with the ring bus segment embedded. The fully populated die has 15 cores in three columns. The 10-core chop removes the rightmost 3rd column and its dedicated top and bottom IOs. CMOS muxes embedded in the ring bus are programmably operable in a 2-or-3-column configuration. The 6-core chop removes the 2nd and 4th rows from the 10-core die.
