Publication


Featured research published by Harry Muljono.


international solid-state circuits conference | 2009

A 45 nm 8-Core Enterprise Xeon® Processor

Stefan Rusu; Simon M. Tam; Harry Muljono; Jason Stinson; David Ayers; Jonathan Chang; Raj Varada; Matt Ratta; Sailesh Kottapalli; Sujal Vora

This paper describes a 2.3-billion-transistor, 8-core, 16-thread, 64-bit Xeon® EX processor with a 24 MB shared L3 cache implemented in a 45 nm nine-metal process. Multiple clock and voltage domains are used to reduce power consumption. Long channel devices and cache sleep mode are used to minimize leakage. Core and cache recovery improve manufacturing yields and enable multiple product flavors from the same silicon die. The disabled blocks are both clock and power gated to minimize their power consumption. Idle power is reduced by shutting off the unterminated I/O links and shedding phases in the voltage regulator to improve the power conversion efficiency.


international solid-state circuits conference | 2007

A 65-nm Dual-Core Multithreaded Xeon® Processor With 16-MB L3 Cache

Stefan Rusu; Simon M. Tam; Harry Muljono; David Ayers; Jonathan Chang; Brian S. Cherkauer; Jason Stinson; John Benoit; Raj Varada; Justin Leung; Rahul Limaye; Sujal Vora

This paper describes a dual-core 64-b Xeon MP processor implemented in a 65-nm eight-metal process. The 435-mm² die has 1.328-B transistors. Each core has two threads and a unified 1-MB L2 cache. The 16-MB shared, 16-way set-associative L3 cache implements both sleep and shut-off leakage reduction modes. Long channel transistors are used to reduce subthreshold leakage in cores and uncore (all portions of the die that are outside the cores) control logic. Multiple voltage and clock domains are employed to reduce power


international solid-state circuits conference | 2006

A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache

Stefan Rusu; Simon M. Tam; Harry Muljono; David Ayers; Jonathan Chang

A dual-core 64b Xeon® MP processor is implemented in a 65nm 8M process. The 435mm² die has 1.328B transistors. Each core has two threads and a unified 1MB L2 cache. The 16MB unified, 16-way set-associative L3 cache implements both sleep and shut-off leakage reduction modes


international solid-state circuits conference | 2001

Backside infrared probing for static voltage drop and dynamic timing measurements

Stefan Rusu; Steve Seidel; Gary Woods; Dean J. Grannes; Harry Muljono; Jeremy A. Rowlette; Keiko Petrosky

Due to the increased number of metal layers and flip-chip packaging, most high-performance microprocessors use optical solutions to probe internal nodes from the backside of the die. Existing probing systems use a focused infrared (1.064 μm) laser to probe internal diffusions from the backside of a chip thinned down to 100 μm. However, this optical probing setup does not provide accurate information about DC voltage levels. Also, because of the stroboscopic sampling used in laser probing, jitter measurements are difficult. This approach overcomes these limitations using alternative optical non-invasive techniques based on the infrared radiation emitted by hot electrons in saturated nMOS transistors under both static bias and switching conditions.


international symposium on microarchitecture | 2004

Itanium 2 processor 6M: higher frequency and larger L3 cache

Stefan Rusu; Harry Muljono; Brian S. Cherkauer

The third-generation Itanium processor targets the high-performance server and workstation market. To do so, the design team sought to provide higher performance through increased frequency and a larger L3 cache. At the same time, we had to limit the power dissipation to fit into the existing platform envelope. These considerations led to what we now call the Itanium 2 processor 6M: the latest generation of Itanium 2, which features a 6-Mbyte, 24-way set-associative on-die L3 cache. The design implements a 2-bundle 64-bit explicitly parallel instruction computing (EPIC) architecture and is fully compatible with previous implementations. Although this processor's frequency is 50 percent higher than that of the previous generation, the maximum power dissipation holds flat at 130 W to ensure the platform's backward compatibility. In designing the next generation of the Itanium 2 processor, Intel doubled the on-die, level-three cache to 6 Mbytes and increased frequency by 50 percent compared to the previous generation. Anoth...


IEEE Journal of Solid-state Circuits | 2003

A 1.5-GHz 130-nm Itanium® 2 Processor with 6-MB on-die L3 cache

Stefan Rusu; Jason Stinson; Simon M. Tam; Justin Leung; Harry Muljono; Brian S. Cherkauer

This 130-nm Itanium 2 processor implements the explicitly parallel instruction computing (EPIC) architecture and features an on-die 6-MB 24-way set-associative level-3 cache. The 374-mm² die contains 410 M transistors and is implemented in a dual-Vt process with six Cu interconnect layers and FSG dielectric. The processor runs at 1.5 GHz at 1.3 V and dissipates a maximum of 130 W. This paper reviews circuit design and package details, power delivery, the reliability, availability, and serviceability (RAS) features, design for test (DFT), and design for manufacturability (DFM) features, as well as an overview of the design and verification methodology. The fuse-based clock deskew circuit achieves 24-ps skew across the entire die, while the scan-based skew control further reduces it to 7 ps. The 128-bit front-side bus has a bandwidth of 6.4 GB/s and supports up to four processors on a single bus.


international solid-state circuits conference | 2014

5.4 Ivytown: A 22nm 15-core enterprise Xeon® processor family

Stefan Rusu; Harry Muljono; David Ayers; Simon M. Tam; Wei Chen; Aaron K. Martin; Shenggao Li; Sujal Vora; Raj Varada; Eddie Wang

The next-generation enterprise Xeon® server processor has 15 dual-threaded 64b Ivybridge cores [1] and 37.5MB shared L3 cache. The system interface includes two on-chip memory controllers, each with two memory channels and supports multiple system topologies. The processor has 4.31B transistors in a high-κ metal-gate tri-gate 22nm CMOS technology with 9 metal layers [2]. The design supports a wide array of product offerings with thermal design power ranging from 40 to 150W and frequencies ranging from 1.4 to 3.8GHz. Fig. 5.4.1(a) shows the processor block diagram. The floorplan (Fig. 5.4.1(b)) is driven by the ring bus routability and latency, as well as the chop requirements to smaller core counts. The cores and associated L3 cache are organized in columns of five, with the ring bus segment embedded. The fully populated die has 15 cores in three columns. The 10-core chop removes the rightmost 3rd column and its dedicated top and bottom IOs. CMOS muxes embedded in the ring bus are programmably operable in a 2-or-3-columns configuration. The 6-core chop removes the 2nd and 4th rows from the 10-core die.


international test conference | 2004

AC IO loopback design for high speed μprocessor IO test

Benoit Provost; Tiffany Huang; Chee How Lim; Kathy Tian; Mo S. Bashir; Mubeen Atha; Ali Muhtaroglu; Cangsang Zhao; Harry Muljono

This work presents the next generation AC IO loopback design for two Intel processor architectures. Both designs detect I/O defects with 20 ps resolution and 50 ps jitter for up to 800 MHz bus speed. Even though the implementations differ in some aspects to accommodate two different bus architectures, the same prudent considerations for high speed operation, minimum test inaccuracy, and low implementation costs apply to both.


IEEE Journal of Solid-state Circuits | 2003

A 400-MT/s 6.4-GB/s multiprocessor bus interface

Harry Muljono; Beomtaek Lee; Yanmei Tian; Yanbin Wang; Mubeen Atha; Tiffany Huang; Mitsuhiro Adachi; Stefan Rusu

This paper describes the design of a system bus interface for the 130-nm Itanium® 2 processor that operates at 400 MT/s (1 megatransfer = 1 Mb/s/pin) with a peak bandwidth of 6.4 GB/s. The high-speed operation is achieved by employing source-synchronous transfer with differential strobes. Short flight time is accomplished by double-sided placement of the processors. Preboost and postboost edge-rate control enables fast clock-to-output timing with tight edge-rate range. The built-in input/output (I/O) loopback test feature enables I/O parameters to be tested on die, using a delay-locked loop and interpolator with 21-ps phase-skew error and 15-ps rms jitter. Power modeling methodology facilitates accurate prediction of system performance.
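
The transfer rate and peak bandwidth quoted in this abstract can be related with simple arithmetic. The sketch below is only a sanity check, not the authors' method: it assumes a 128-bit data bus, a width taken from the 130-nm Itanium 2 processor abstract listed earlier on this page rather than from this abstract, and the function name is hypothetical.

```python
# Sanity check of the peak-bandwidth figure quoted in the abstract.
# Assumption: the data bus is 128 bits wide (taken from the companion
# processor abstract on this page, not stated in this abstract).

def peak_bandwidth_gb_per_s(transfers_mt_per_s: float, bus_width_bits: int) -> float:
    """Return peak bandwidth in GB/s, using decimal units (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mt_per_s * 1e6 * bytes_per_transfer / 1e9

bandwidth = peak_bandwidth_gb_per_s(400, 128)
print(f"{bandwidth:.1f} GB/s")  # prints "6.4 GB/s"
```

Under that assumption, 400 million transfers per second at 16 bytes per transfer gives 6.4 GB/s, which matches the stated peak bandwidth.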


european solid-state circuits conference | 2009

Power reduction techniques for an 8-core Xeon® processor

Stefan Rusu; Simon M. Tam; Harry Muljono; Jason Stinson; David Ayers; Jonathan Chang; Raj Varada; Matt Ratta; Sailesh Kottapalli; Sujal Vora

This paper presents the power reduction and management techniques for the 45nm, 8-core Nehalem-EX processor. Multiple clock and voltage domains are used to reduce power consumption. Long channel devices and cache sleep mode are used to minimize leakage. Core and cache recovery improve manufacturing yields and enable multiple product flavors from the same silicon die. Clock and power gating minimize power consumed by disabled blocks. An on-die microcontroller manages voltage and frequency operating points, as well as power and thermal events. Idle power is reduced by shutting off the un-terminated I/O links and shedding phases in the voltage regulator to improve the power conversion efficiency.
