Publication


Featured research published by Shekhar Borkar.


Design Automation Conference | 2003

Parameter variations and impact on circuits and microarchitecture

Shekhar Borkar; Tanay Karnik; Siva G. Narendra; James W. Tschanz; Ali Keshavarzi; Vivek De

Parameter variation in scaled technologies beyond 90 nm will pose a major challenge for the design of future high-performance microprocessors. In this paper, we discuss process, voltage, and temperature variations and their impact on circuits and microarchitecture. Possible solutions to reduce the impact of parameter variations and to achieve higher frequency bins are also presented.
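As a rough illustration of how such variation translates into frequency bins, the toy Monte Carlo sketch below samples die-to-die threshold-voltage variation and maps it to clock frequency with a simple alpha-power-law model; every parameter value is an assumption chosen for illustration, not a figure from the paper.

```python
import random

# Toy Monte Carlo: how die-to-die threshold-voltage (Vt) variation spreads
# the achievable clock frequency. All numbers are illustrative assumptions.
VDD = 1.2          # supply voltage (V), assumed
VT_NOM = 0.35      # nominal threshold voltage (V), assumed
SIGMA_VT = 0.03    # die-to-die Vt standard deviation (V), assumed
ALPHA = 1.3        # alpha-power-law exponent, assumed
F_NOM_GHZ = 3.0    # clock frequency of a nominal die (GHz), assumed

def relative_frequency(vt: float) -> float:
    """Alpha-power law: drive current ~ (VDD - Vt)^alpha, frequency ~ current."""
    return ((VDD - vt) / (VDD - VT_NOM)) ** ALPHA

random.seed(0)
freqs = [F_NOM_GHZ * relative_frequency(random.gauss(VT_NOM, SIGMA_VT))
         for _ in range(10_000)]
print(f"frequency spread: {min(freqs):.2f} - {max(freqs):.2f} GHz")
# Dies in the slow tail land in lower frequency bins; reducing Vt variation,
# or compensating for it (e.g., with body bias or adaptive supply), raises
# the share of parts that reach the highest bin.
```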


International Symposium on Microarchitecture | 2005

Designing reliable systems from unreliable components: the challenges of transistor variability and degradation

Shekhar Borkar

As technology scales, variability in transistor performance continues to increase, making transistors less and less reliable. This creates several challenges in building reliable systems, from the unpredictability of delay to increasing leakage current. Finding solutions to these challenges requires a concerted effort from all the players in system design. This article discusses these effects and proposes microarchitecture, circuit, and testing research that focuses on designing with many unreliable components (transistors) to yield reliable system designs.
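To see why a whole-system response is needed, an illustrative yield-style calculation (not taken from the article) shows how quickly the odds of every device meeting spec collapse as transistor counts grow:

```python
# Toy reliability calculation (illustrative, not from the article):
# the probability that every one of N independent devices meets spec.
def system_ok_probability(per_device_ok: float, n_devices: int) -> float:
    return per_device_ok ** n_devices

per_device_ok = 1 - 1e-9   # assume a one-in-a-billion chance a device is out of spec
for n_devices in (1_000_000, 100_000_000, 1_000_000_000):
    p = system_ok_probability(per_device_ok, n_devices)
    print(f"{n_devices:>13,} devices -> P(all within spec) = {p:.3f}")
# Even at one-in-a-billion per device, a billion-transistor chip has only a
# ~37% chance that nothing is out of spec -- hence the push toward designs
# that tolerate unreliable components instead of assuming perfection.
```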


International Symposium on Microarchitecture | 1999

Design challenges of technology scaling

Shekhar Borkar

Scaling advanced CMOS technology to the next generation improves performance, increases transistor density, and reduces power consumption. Technology scaling typically has three main goals: 1) reduce gate delay by 30%, resulting in an increase in operating frequency of about 43%; 2) double transistor density; and 3) reduce energy per transition by about 65%, saving 50% of power (at a 43% increase in frequency). These are not ad hoc goals; rather, they follow scaling theory. This article looks closely at past trends in technology scaling and how well microprocessor technology and products have met these goals. It also projects the challenges that lie ahead if these trends continue. This analysis uses data from various Intel microprocessors; however, this study is equally applicable to other types of logic designs. Is process technology meeting the goals predicted by scaling theory? An analysis of microprocessor performance, transistor density, and power trends through successive technology generations helps identify potential limiters of scaling, performance, and integration.
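The quoted percentages follow from the scaling arithmetic itself; a quick check of the numbers in the abstract:

```python
# Quick check of the scaling arithmetic quoted in the abstract.
delay_scale = 0.70                        # goal 1: 30% lower gate delay
freq_gain = 1 / delay_scale - 1           # ~43% higher operating frequency
energy_scale = 0.35                       # goal 3: ~65% lower energy per transition
power_scale = energy_scale / delay_scale  # power = energy per transition x frequency

print(f"frequency gain      : {freq_gain:.0%}")                          # ~43%
print(f"power per transistor: {power_scale:.0%} of the previous node")   # ~50%
# Goal 2 doubles transistor density; at ~50% power per transistor, that keeps
# total chip power roughly constant from one generation to the next.
```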


Design Automation Conference | 2007

Thousand core chips: a technology perspective

Shekhar Borkar

This paper presents a many-core architecture, with hundreds to thousands of small cores, to deliver unprecedented compute performance in an affordable power envelope. We discuss fine-grain power management, memory bandwidth, on-die networks, and system resiliency for the many-core system.
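A back-of-the-envelope power budget (with assumed, illustrative numbers rather than figures from the paper) shows why fine-grain power management matters at this core count:

```python
# Back-of-the-envelope per-core power budget for a many-core chip.
# All numbers are illustrative assumptions, not figures from the paper.
chip_power_w = 100.0      # assumed affordable socket power envelope
uncore_fraction = 0.4     # assumed share for interconnect, caches, I/O, leakage
n_cores = 1000

core_budget_mw = chip_power_w * (1 - uncore_fraction) / n_cores * 1000
print(f"~{core_budget_mw:.0f} mW available per core")   # ~60 mW with these assumptions
# A budget this tight favors many small, simple cores with aggressive
# fine-grain power management over a handful of large, power-hungry cores.
```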


Communications of the ACM | 2011

The future of microprocessors

Shekhar Borkar; Andrew A. Chien

Energy efficiency is the new fundamental limiter of processor performance, way beyond numbers of processors.
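One way to read the claim: at a fixed power envelope, throughput is bounded by power divided by energy per operation, so only lowering the energy per operation raises the ceiling. A small illustration with assumed numbers:

```python
# At a fixed power envelope, throughput is capped by energy per operation.
# The envelope and energy figures below are illustrative assumptions.
power_budget_w = 100.0
for energy_per_op_pj in (100.0, 10.0, 1.0):
    ops_per_s = power_budget_w / (energy_per_op_pj * 1e-12)
    print(f"{energy_per_op_pj:>6.1f} pJ/op -> {ops_per_s / 1e12:.0f} Tera-ops/s ceiling")
# Adding more cores cannot push performance past this ceiling; only lowering
# the energy spent per operation can.
```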


IEEE Journal of Solid-State Circuits | 2008

An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS

Sriram R. Vangal; Jason Howard; Gregory Ruhl; Saurabh Dighe; Howard Wilson; James W. Tschanz; David Finan; Arvind Singh; Tiju Jacob; Shailendra Jain; Vasantha Erraguntla; Clark Roberts; Yatin Hoskote; Nitin Borkar; Shekhar Borkar

This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8×10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz. Each tile has two pipelined single-precision floating-point multiply-accumulators (FPMAC) which feature a single-cycle accumulation loop for high throughput. The on-chip 2-D mesh network provides a bisection bandwidth of 2 Terabits/s. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. In a 65-nm eight-metal CMOS process, the 275 mm² custom design contains 100 M transistors. The fully functional first silicon achieves over 1.0 TFLOPS of performance on a range of benchmarks while dissipating 97 W at 4.27 GHz and a 1.07 V supply.
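The headline figure is consistent with the tile arithmetic in the abstract, using the usual convention that a multiply-accumulate counts as two floating-point operations:

```python
# Peak-throughput arithmetic for the 80-tile design described above,
# counting each multiply-accumulate as two floating-point operations.
tiles = 80
fpmacs_per_tile = 2
flops_per_mac = 2
clock_hz = 4.27e9            # operating point quoted in the abstract

peak_tflops = tiles * fpmacs_per_tile * flops_per_mac * clock_hz / 1e12
print(f"peak: {peak_tflops:.2f} TFLOPS")   # ~1.37 TFLOPS
# The measured "over 1.0 TFLOPS" on benchmarks is a sustained figure below
# this peak, as expected once network and memory stalls are included.
```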


International Solid-State Circuits Conference | 2010

A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS

Jason Howard; Saurabh Dighe; Yatin Hoskote; Sriram R. Vangal; David Finan; Gregory Ruhl; David Jenkins; Howard Wilson; Nitin Borkar; Gerhard Schrom; Fabric Pailet; Shailendra Jain; Tiju Jacob; Satish Yada; Sraven Marella; Praveen Salihundam; Vasantha Erraguntla; Michael Konow; Michael Riepen; Guido Droege; Joerg Lindemann; Matthias Gries; Thomas Apel; Kersten Henriss; Tor Lund-Larsen; Sebastian Steibl; Shekhar Borkar; Vivek De; Rob F. Van der Wijngaart; Timothy G. Mattson

Current developments in microprocessor design favor increased core counts over frequency scaling to improve processor performance and energy efficiency. Coupling this architectural trend with a message-passing protocol helps realize a data-center-on-a-die. The prototype chip (Figs. 5.7.1 and 5.7.7) described in this paper integrates 48 Pentium™ class IA-32 cores [1] on a 6×4 2D-mesh network of tiled core clusters with high-speed I/Os on the periphery. The chip contains 1.3B transistors. Each core has a private 256KB L2 cache (12MB total on-die) and is optimized to support a message-passing-programming model whereby cores communicate through shared memory. A 16KB message-passing buffer (MPB) is present in every tile, giving a total of 384KB on-die shared memory, for increased performance. Power is kept at a minimum by transmitting dynamic, fine-grained voltage-change commands over the network to an on-die voltage-regulator controller (VRC). Further power savings are achieved through active frequency scaling at the tile granularity. Memory accesses are distributed over four on-die DDR3 controllers for an aggregate peak memory bandwidth of 21GB/s at 4× burst. Additionally, an 8-byte bidirectional system interface (SIF) provides 6.4GB/s of I/O bandwidth. The die area is 567 mm² and is implemented in 45nm high-k metal-gate CMOS [2].
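The memory totals in the abstract follow from the tiling (384KB of MPB at 16KB per tile implies 24 tiles, hence two cores per tile on the 6×4 mesh):

```python
# On-die memory arithmetic for the 48-core prototype described above.
tiles = 6 * 4                 # 6x4 2D mesh of tiles
cores = 2 * tiles             # two IA-32 cores per tile -> 48 cores
l2_kb_per_core = 256
mpb_kb_per_tile = 16

print(f"private L2 total        : {cores * l2_kb_per_core // 1024} MB")   # 12 MB
print(f"message-passing buffers : {tiles * mpb_kb_per_tile} KB")          # 384 KB
# The per-tile message-passing buffers, not the private L2 caches, provide
# the shared on-die memory through which the cores exchange messages.
```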


International Symposium on Microarchitecture | 2007

A 5-GHz Mesh Interconnect for a Teraflops Processor

Yatin Hoskote; Sriram R. Vangal; Arvind Singh; Nitin Borkar; Shekhar Borkar

A multicore processor in 65-nm technology with 80 single-precision floating-point cores delivers performance in excess of a teraflops while consuming less than 100 W. A 2D on-die mesh interconnection network operating at 5 GHz provides the high-performance communication fabric to connect the cores. The network delivers a bisection bandwidth of 2.56 Terabits per second and a per-hop fall-through latency of 1 nanosecond.
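Two quick conversions of the quoted figures, assuming the 1 ns fall-through latency is measured against the same 5 GHz router clock:

```python
# Unit conversions for the mesh figures quoted above.
clock_ghz = 5.0
fall_through_ns = 1.0
bisection_tbps = 2.56

cycles_per_hop = fall_through_ns * clock_ghz     # 5 router cycles per hop
bisection_gb_per_s = bisection_tbps * 1000 / 8   # 320 GB/s
print(f"{cycles_per_hop:.0f} cycles per hop, "
      f"{bisection_gb_per_s:.0f} GB/s bisection bandwidth")
```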


IEEE Journal of Solid-State Circuits | 2005

Area-efficient linear regulator with ultra-fast load regulation

Peter Hazucha; Tanay Karnik; Bradley Bloechel; Colleen Parsons; David Finan; Shekhar Borkar

We demonstrate a fully integrated linear regulator for multisupply voltage microprocessors implemented in a 90 nm CMOS technology. Ultra-fast single-stage load regulation achieves a 0.54-ns response time at 94% current efficiency. For a 1.2-V input voltage and 0.9-V output voltage, the regulator enables a 90 mV peak-to-peak output droop for a 100-mA load step with only a small on-chip decoupling capacitor of 0.6 nF. By using a PMOS pull-up transistor in the output stage we achieved a small regulator area of 0.008 mm² and a minimum dropout voltage of 0.2 V for 100 mA of output current. The area for the 0.6-nF MOS capacitor is 0.090 mm².
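A first-order consistency check of the quoted droop, assuming the on-chip decoupling capacitor alone supplies the load step for the duration of the response time:

```python
# First-order droop estimate: charge drawn from the on-chip decoupling
# capacitor during the regulator's response time, assuming the capacitor
# alone supplies the full load step over that interval.
load_step_a = 0.100        # 100 mA load step
response_s = 0.54e-9       # 0.54 ns response time
decap_f = 0.6e-9           # 0.6 nF on-chip decoupling capacitance

droop_v = load_step_a * response_s / decap_f
print(f"estimated droop: {droop_v * 1000:.0f} mV")   # ~90 mV, matching the abstract
```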


Conference on High Performance Computing (Supercomputing) | 1988

iWarp: an integrated solution to high-speed parallel computing

Shekhar Borkar; Robert Cohn; George W. Cox; Sha Gleason; Thomas Gross; H. T. Kung; Monica S. Lam; Brian E. Moore; Craig B. Peterson; John Samuel Pieper; Linda J. Rankin; P. S. Tseng; Jim Sutton; John Urbanski; Jon A. Webb

A description is given of the iWarp architecture and how it supports various communication models and system configurations. The heart of an iWarp system is the iWarp component: a single-chip processor that requires only the addition of memory chips to form a complete system building block, called the iWarp cell. Each iWarp component contains both a powerful computation engine that runs at 20 MFLOPS (million floating-point operations per second) and a high-throughput (320 Mb/s), low-latency (100–150 ns) communication engine for interfacing with other iWarp cells. Because of their strong computation and communication capabilities, the iWarp components provide a versatile building block for high-performance parallel systems ranging from special-purpose systolic arrays to general-purpose distributed memory computers. They can support both fine-grain parallel and coarse-grain distributed computation models simultaneously in the same system. The initial iWarp demonstration system consists of an 8×8 torus of iWarp cells, delivering more than 1.2 GFLOPS. It can be expanded to include up to 1024 cells.
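The system-level figures follow from the per-cell rating:

```python
# Aggregate-performance arithmetic for the iWarp configurations above.
mflops_per_cell = 20
demo_cells = 8 * 8            # 8x8 torus demonstration system
max_cells = 1024              # largest supported configuration

print(f"demo system: {demo_cells * mflops_per_cell / 1000:.2f} GFLOPS")  # 1.28 GFLOPS
print(f"max system : {max_cells * mflops_per_cell / 1000:.1f} GFLOPS")   # 20.5 GFLOPS
```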
