Brian S. Cherkauer
Intel
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Brian S. Cherkauer.
international solid-state circuits conference | 2008
Blaine Stackhouse; Brian S. Cherkauer; Michael K. Gowan; Paul E. Gronowski; Chris Lyles
The next generation in the Itanium processor family is introduced. The processor has 4 dual-threaded cores integrated on die with a system interface and 30MB of cache in an 8M 65nm process. The 21.5times32.5mm2 die contains 2.05 billion transistors. The silicon is designed to operate the cores up to 2.0GHz and the system interface at 2.4GHz, with a thermal design power (TDP) of 170W at 110degC.
IEEE Transactions on Very Large Scale Integration Systems | 1995
Brian S. Cherkauer; Eby G. Friedman
In this paper, the various disparate approaches to CMOS tapered buffer design are unified into an integrated design methodology. Circuit speed, power dissipation, physical area, and system reliability are the four performance criteria of concern in tapered buffers, and each places a separate, often conflicting, constraint on the design of a tapered buffer. Enhanced short-channel tapered buffer design equations are presented for propagation delay and power dissipation, as well as a new split-capacitor model of hot-carrier reliability of tapered buffers and a two-component physical area model. Each performance criterion is individually investigated and analyzed with respect to the number of stages and tapering factor, and the interaction of the four criteria is examined to develop both a qualitative and a quantitative understanding of the various design tradeoffs. The creation of process dependent look-up tables for optimal buffer design is described, and a methodology to apply these look-up tables to application-specific tapered buffers for both unconstrained and constrained systems is developed. >
international solid state circuits conference | 2007
Stefan Rusu; Simon M. Tam; Harry Muljono; David Ayers; Jonathan Chang; Brian S. Cherkauer; Jason Stinson; John Benoit; Raj Varada; Justin Leung; Rahul Limaye; Sujal Vora
This paper describes a dual-core 64-b Xeon MP processor implemented in a 65-nm eight-metal process. The 435-mm2 die has 1.328-B transistors. Each core has two threads and a unified 1-MB L2 cache. The 16-MB shared, 16-way set-associative L3 cache implements both sleep and shut-off leakage reduction modes. Long channel transistors are used to reduce subthreshold leakage in cores and uncore (all portions of the die that are outside the cores) control logic. Multiple voltage and clock domains are employed to reduce power
IEEE Journal of Solid-state Circuits | 2009
Blaine Stackhouse; Sal Bhimji; Chris Bostak; Dave Bradley; Brian S. Cherkauer; Jayen Desai; Erin Francom; Mike Gowan; Paul E. Gronowski; Dan Krueger; Charles Morganti; Steve Troyer
This paper describes an Itanium processor implemented in 65 nm process with 8 layers of Cu interconnect. The 21.5 mm by 32.5 mm die has 2.05B transistors. The processor has four dual-threaded cores, 30 MB of cache, and a system interface that operates at 2.4 GHz at 105degC . High speed serial interconnects allow for peak processor-to-processor bandwidth of 96 GB/s and peak memory bandwidth of 34 GB/s.
IEEE Transactions on Circuits and Systems Ii: Analog and Digital Signal Processing | 1997
Brian S. Cherkauer; Eby G. Friedman
A hybrid radix-4/radix-8 architecture targeted for high bit, general purpose, digital multipliers is presented as a compromise between the high speed of a radix-4 multiplier architecture and the low power dissipation of a radix-8 multiplier architecture. In this hybrid radix-4/radix-8 multiplier architecture, the performance bottleneck of a radix-8 multiplier, the generation of three times the multiplicand for use in generating the radix-8 partial product, is performed in parallel with the reduction of the radix-4 partial products rather than serially, as in a radix-8 multiplier. This hybrid radix-4/radix-8 multiplier architecture requires 13% less power for a 64/spl times/64-b multiplier, and results in only a 9% increase in delay, as compared with a radix-4 implementation. When the voltage supply is scaled to equalize delay, the 64/spl times/64-b hybrid multiplier dissipates less power than either the radix-4 or radix-8 multipliers. The hybrid radix-4/radix-8 architecture is therefore appropriate for those applications that must dissipate minimal power while operating at high speeds.
IEEE Journal of Solid-state Circuits | 1995
Brian S. Cherkauer; Eby G. Friedman
This paper presents a design methodology and analytic relationships for the optimal tapering of cascaded buffers which consider the effects of local interconnect capacitance. The method, constant capacitance-to-current ratio tapering (C/sup 3/RT), is based on maintaining the capacitive load to current drive ratio constant, and therefore, the propagation delay of each buffer stage also remains constant. Reductions in power dissipation of up to 22% and reductions in active area of up to 46%, coupled with reductions in propagation delay of up to 2%, as compared with tapered buffers which neglect local interconnect capacitance, are exhibited for an example buffer system. >
international symposium on microarchitecture | 2004
Stefan Rusu; Harry Muljono; Brian S. Cherkauer
The third-generation Itanium processor targets the high-performance server and workstation market. To do so, the design team sought to provide higher performance through increased frequency and a larger L3 cache. At the same time, we had to limit the power dissipation to fit into the existing platform envelope. These considerations led to what we now call the Itanium 2 processor 6M: the latest generation of Itanium 2, which features a 6-Mbyte, 24-way set-associative on-die L3 cache. The design implements a 2-bundle 64-bit explicitly parallel instruction computing (EPIC) architecture and is fully compatible with previous implementations. Although this processors frequency is 50 percent higher than that of the previous generation, the maximum power dissipation holds flat at 130 W to ensure the platforms backward compatibility.In designing the next generation of the Itanium 2 processor, Intel doubled the on-die, level-three cache to 6 Mbytes and increased frequency by 50 percent compared to the previous generation. Anoth...
IEEE Journal of Solid-state Circuits | 2003
Stefan Rusu; Jason Stinson; Simon M. Tam; Justin Leung; Harry Muljono; Brian S. Cherkauer
This 130-nm Itanium 2 processor implements the explicitly parallel instruction computing (EPIC) architecture and features an on-die 6-MB 24-way set-associative level-3 cache. The 374-mm/sup 2/ die contains 410 M transistors and is implemented in a dual-V/sub t/ process with six Cu interconnect layers and FSG dielectric. The processor runs at 1.5 GHz at 1.3 V and dissipates a maximum of 130 W. This paper reviews circuit design and package details, power delivery, the reliability, availability, and serviceability (RAS) features, design for test (DFT), and design for manufacturability (DFM) features, as well as an overview of the design and verification methodology. The fuse-based clock deskew circuit achieves 24-ps skew across the entire die, while the scan-based skew control further reduces it to 7 ps. The 128-bit front-side bus has a bandwidth of 6.4 GB/s and supports up to four processors on a single bus.
IEEE Transactions on Very Large Scale Integration Systems | 1994
Brian S. Cherkauer; Eby G. Friedman
Transistor channel width tapering in serial MOSFET chains is shown in this paper to simultaneously decrease propagation delay, power dissipation, and physical area of VLSI circuits. Tapering is the process of decreasing the size of each MOSFET transistor width along a serial chain such that the largest transistor is connected to the power supply and the smallest is connected to the output node. A detailed explanation of the effects of tapering on the output waveform is presented with specific emphasis on the power dissipation of tapered chains. It is demonstrated that in many cases tapering decreases delay and changes the shape of the output waveform such that the time during which a load inverter is conducting short-circuit current is reduced. This decrease in short-circuit current also occurs in many cases where tapering may not offer a speed advantage. In addition, dynamic CV/sup 2/f power dissipation of the serial chain is reduced. In those circuits where tapering does not decrease propagation delay, tapering permits a designer to tradeoff speed for a reduction in both short-circuit and dynamic power dissipation, a tradeoff not normally available with untapered chains. Thus the total power consumed by a serial chain of MOSFETs, as well as its propagation delay and area, can be reduced by channel width tapering. A design system for determining when tapering is appropriate, selecting the amount of tapering, and synthesizing the physical layout is presented. Physical layout issues unique to tapering are discussed, and fabricated test structures are described. >
international symposium on circuits and systems | 1996
Brian S. Cherkauer; Eby G. Friedman
A hybrid radix-4/radix-8 architecture targeted for high bit multipliers is presented as a compromise between the high speed of a radix-4 multiplier architecture and the low power dissipation of a radix-8 multiplier architecture. In this hybrid radix-4/radix-8 multiplier architecture, the performance bottleneck of a radix-8 multiplier, the generation of three times the multiplicand for use in generating the radix-8 partial product, is performed in parallel with the reduction of the radix-4 partial products rather than serially, as in a radix-8 multiplier. This hybrid radix-4/radix-8 multiplier architecture requires 13% less power for a 64/spl times/64 bit multiplier, and results in only a 9% increase in delay, as compared with a radix-4 implementation. When supply voltage is scaled such that all multipliers exhibit the same delay, the 64/spl times/64 bit hybrid radix-4/radix-8 multiplier dissipates less power than either the radix-4 or radix-8 multipliers. The hybrid radix-4/radix-8 architecture is therefore appropriate for those applications that must dissipate minimal power and operate at high speeds.