Vasantha Erraguntla
Intel
Publication
Featured research published by Vasantha Erraguntla.
IEEE Journal of Solid-State Circuits | 2008
Sriram R. Vangal; Jason Howard; Gregory Ruhl; Saurabh Dighe; Howard Wilson; James W. Tschanz; David Finan; Arvind Singh; Tiju Jacob; Shailendra Jain; Vasantha Erraguntla; Clark Roberts; Yatin Hoskote; Nitin Borkar; Shekhar Borkar
This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8×10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz. Each tile has two pipelined single-precision floating-point multiply accumulators (FPMAC) which feature a single-cycle accumulation loop for high throughput. The on-chip 2-D mesh network provides a bisection bandwidth of 2 Terabits/s. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. In a 65-nm eight-metal CMOS process, the 275 mm² custom design contains 100 M transistors. The fully functional first silicon achieves over 1.0 TFLOPS of performance on a range of benchmarks while dissipating 97 W at 4.27 GHz and 1.07 V supply.
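The throughput figures in this abstract can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes each FPMAC retires one multiply-accumulate per cycle and that a multiply-accumulate counts as 2 FLOPs, neither of which the abstract states explicitly. All variable names are illustrative.

```python
# Figures taken from the abstract.
TILES = 80
FPMACS_PER_TILE = 2
CLOCK_GHZ = 4.27
POWER_W = 97.0
REPORTED_TFLOPS = 1.0  # the abstract reports "over 1.0 TFLOPS", so this is a lower bound

# Assumption (not in the abstract): one multiply-accumulate per FPMAC per cycle,
# counted as 2 floating-point operations.
FLOPS_PER_FPMAC_PER_CYCLE = 2


def peak_tflops(tiles, fpmacs_per_tile, flops_per_cycle, clock_ghz):
    """Theoretical peak throughput in TFLOPS under the assumptions above."""
    return tiles * fpmacs_per_tile * flops_per_cycle * clock_ghz / 1000.0


peak = peak_tflops(TILES, FPMACS_PER_TILE, FLOPS_PER_FPMAC_PER_CYCLE, CLOCK_GHZ)

# Lower bound on energy efficiency, using the reported ">1.0 TFLOPS" at 97 W.
min_gflops_per_watt = REPORTED_TFLOPS * 1000.0 / POWER_W

print(f"Estimated peak: {peak:.2f} TFLOPS")
print(f"Efficiency of at least {min_gflops_per_watt:.1f} GFLOPS/W")
```

Under these assumptions the theoretical peak at 4.27 GHz sits comfortably above the reported 1.0 TFLOPS, which is consistent with benchmarks not reaching full FPMAC utilization.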
International Solid-State Circuits Conference | 2010
Jason Howard; Saurabh Dighe; Yatin Hoskote; Sriram R. Vangal; David Finan; Gregory Ruhl; David Jenkins; Howard Wilson; Nitin Borkar; Gerhard Schrom; Fabric Pailet; Shailendra Jain; Tiju Jacob; Satish Yada; Sraven Marella; Praveen Salihundam; Vasantha Erraguntla; Michael Konow; Michael Riepen; Guido Droege; Joerg Lindemann; Matthias Gries; Thomas Apel; Kersten Henriss; Tor Lund-Larsen; Sebastian Steibl; Shekhar Borkar; Vivek De; Rob F. Van der Wijngaart; Timothy G. Mattson
Current developments in microprocessor design favor increased core counts over frequency scaling to improve processor performance and energy efficiency. Coupling this architectural trend with a message-passing protocol helps realize a data-center-on-a-die. The prototype chip (Figs. 5.7.1 and 5.7.7) described in this paper integrates 48 Pentium™ class IA-32 cores [1] on a 6×4 2D-mesh network of tiled core clusters with high-speed I/Os on the periphery. The chip contains 1.3B transistors. Each core has a private 256KB L2 cache (12MB total on-die) and is optimized to support a message-passing-programming model whereby cores communicate through shared memory. A 16KB message-passing buffer (MPB) is present in every tile, giving a total of 384KB on-die shared memory, for increased performance. Power is kept at a minimum by transmitting dynamic, fine-grained voltage-change commands over the network to an on-die voltage-regulator controller (VRC). Further power savings are achieved through active frequency scaling at the tile granularity. Memory accesses are distributed over four on-die DDR3 controllers for an aggregate peak memory bandwidth of 21GB/s at 4× burst. Additionally, an 8-byte bidirectional system interface (SIF) provides 6.4GB/s of I/O bandwidth. The die area is 567mm² and is implemented in 45nm high-κ metal-gate CMOS [2].
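The on-die memory totals quoted above follow directly from the per-core and per-tile figures. The short check below is a sketch that assumes one tile per node of the 6×4 mesh; the variable names are illustrative and not taken from the paper.

```python
# Figures quoted in the abstract.
CORES = 48
MESH_COLS, MESH_ROWS = 6, 4
L2_PER_CORE_KB = 256
MPB_PER_TILE_KB = 16

# Assumption: one tile per mesh node, so the mesh dimensions give the tile count.
tiles = MESH_COLS * MESH_ROWS
cores_per_tile = CORES // tiles

total_l2_mb = CORES * L2_PER_CORE_KB / 1024   # private L2 caches, summed
total_mpb_kb = tiles * MPB_PER_TILE_KB        # shared message-passing buffers, summed

print(f"{tiles} tiles, {cores_per_tile} cores per tile")
print(f"Total L2: {total_l2_mb:.0f} MB, total MPB: {total_mpb_kb} KB")
```

The results match the 12MB of L2 cache and 384KB of shared memory stated in the abstract, and imply two cores per tile.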
IEEE Journal of Solid-State Circuits | 2011
Jason Howard; Saurabh Dighe; Sriram R. Vangal; Gregory Ruhl; Nitin Borkar; Shailendra Jain; Vasantha Erraguntla; Michael Konow; Michael Riepen; Matthias Gries; Guido Droege; Tor Lund-Larsen; Sebastian Steibl; S. Borkar; Vivek De; R Van Der Wijngaart
This paper describes a multi-core processor that integrates 48 cores, 4 DDR3 memory channels, and a voltage regulator controller in a 6×4 2D-mesh network-on-chip architecture. Located at each mesh node is a five-port virtual cut-through packet-switched router shared between two IA-32 cores. Core-to-core communication uses message passing while exploiting 384 KB of on-die shared memory. Fine-grain power management takes advantage of 8 voltage and 28 frequency islands to allow independent DVFS of cores and mesh. At the nominal 1.1 V supply, the cores operate at 1 GHz while the 2D-mesh operates at 2 GHz. As performance and voltage scale, the processor dissipates between 25 W and 125 W. The processor is implemented in 45 nm Hi-K CMOS and has 1.3 billion transistors.
International Solid-State Circuits Conference | 2012
Shailendra Jain; Surhud Khare; Satish Yada; V Ambili; Praveen Salihundam; Shiva Ramani; Sriram Muthukumar; Manali R Srinivasan; Arun Kumar; Shasi Kumar Gb; Rajaraman Ramanarayanan; Vasantha Erraguntla; Jason Howard; Sriram R. Vangal; Saurabh Dighe; Greg Ruhl; Paolo A. Aseron; Howard Wilson; Nitin Borkar; Vivek De; Shekhar Borkar
Near-threshold computing brings the promise of an order of magnitude improvement in energy efficiency over the current generation of microprocessors [1]. However, frequency degradation due to aggressive voltage scaling may not be acceptable across all single-threaded or performance-constrained applications. Enabling the processor to operate over a wide voltage range helps to achieve the best possible energy efficiency while satisfying the varying performance demands of the applications. This paper describes an IA-32 processor fabricated in 32nm CMOS technology [2], demonstrating reliable ultra-low-voltage operation and energy-efficient performance across a wide voltage range from 280mV to 1.2V.
IEEE Journal of Solid-State Circuits | 2011
Saurabh Dighe; Sriram R. Vangal; Paolo A. Aseron; Shasi Kumar; Tiju Jacob; Keith A. Bowman; Jason Howard; James W. Tschanz; Vasantha Erraguntla; Nitin Borkar; Vivek De; Shekhar Borkar
In this paper, we present measured within-die core-to-core Fmax and leakage variation data for an 80-core processor in 65 nm CMOS and 1) populate a parameterized energy/performance model to determine the most energy-efficient operating point for a workload; 2) examine impacts of per-core clock and power gating on optimal dynamic voltage-frequency-core scaling (DVFCS) operating points; and 3) compare improvements in energy efficiency achievable by variation-aware DVFCS and core mapping on Single-Voltage/Multiple-Frequency (SVMF), Multiple-Voltage/Single-Frequency (MVSF) and Multiple-Voltage/Multiple-Frequency (MVMF) designs. Variation-aware DVFS with optimal core mapping is shown to improve energy efficiency 6%-35% across a range of compute/communication activity workloads. A new dynamic thread hopping scheme boosts performance by 5%-10% or energy efficiency by 20%-60%.
International Solid-State Circuits Conference | 2004
Siva G. Narendra; James W. Tschanz; Joseph Hofsheier; Bradley Bloechel; Sriram R. Vangal; Yatin Hoskote; Stephen H. Tang; Dinesh Somasekhar; Ali Keshavarzi; Vasantha Erraguntla; Greg Dermer; Nitin Borkar; Shekhar Borkar; Vivek De
A low-voltage swapped-body biasing technique where PMOS bodies are connected to ground and NMOS bodies to Vcc is evaluated. Available measurements show more than 2.6x frequency improvement at 0.5V Vcc and the ability to reduce Vcc by 0.2V for the same frequency compared to no body bias in 180 to 90nm CMOS technologies.
International Solid-State Circuits Conference | 2002
Siva G. Narendra; M. Haycock; V. Govindarajulu; Vasantha Erraguntla; Howard Wilson; Sriram R. Vangal; Amaresh Pangal; E. Seligman; Rajendran Nair; Ali Portland Keshavarzi; Bradley Bloechel; Gregory E. Dermer; R. Mooney; Nitin Borkar; S. Borkar; Vivek De
A router chip that incorporates on-chip forward body biasing capability with 2% area overhead achieves 1 GHz operation at 1.1 V supply in a 150 nm logic technology, compared to 1.25 V required for the original design having no body bias. Switching power is 23% less and chip leakage is reduced by 3.5× in standby mode by withdrawing forward bias.
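The reported switching-power saving is roughly what the supply-voltage reduction alone would predict. The sketch below assumes the usual first-order model in which switching power scales with the square of the supply voltage at fixed frequency and switched capacitance; this model is an assumption for illustration and is not quoted from the paper.

```python
# Supply voltages from the abstract, both at 1 GHz operation.
V_WITH_BODY_BIAS = 1.10     # volts, with forward body bias
V_WITHOUT_BODY_BIAS = 1.25  # volts, original design with no body bias


def switching_power_ratio(v_new, v_old):
    """Ratio of switching power under the assumed P ~ C * V^2 * f model,
    with capacitance C and frequency f held constant."""
    return (v_new / v_old) ** 2


ratio = switching_power_ratio(V_WITH_BODY_BIAS, V_WITHOUT_BODY_BIAS)
reduction_pct = (1.0 - ratio) * 100.0

print(f"Estimated switching power reduction: {reduction_pct:.1f}%")
```

Under this model the estimate comes out close to the 23% figure given in the abstract.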
International Solid-State Circuits Conference | 2010
Saurabh Dighe; Sriram R. Vangal; Paolo A. Aseron; Shasi Kumar; Tiju Jacob; Keith A. Bowman; Jason Howard; James W. Tschanz; Vasantha Erraguntla; Nitin Borkar; Vivek De; Shekhar Borkar
Many-core processors with on-die network-on-chip (NoC) interconnects have emerged as viable architectures for Single-Instruction/Multiple-Data (SIMD) vector applications and parallel workloads, and have been implemented in 65nm CMOS with Dynamic Voltage-Frequency Scaling (DVFS). Chips with Single-Voltage/Single-Frequency (SVSF) for all cores running homogeneous threads as well as Multiple-Voltage/Multiple-Frequency (MVMF), running heterogeneous applications and using independent V/F control for each core, have been reported. Combination of DVFS with dynamic core-count scaling (or DVFCS) has been proposed to further improve performance & energy efficiency across varying workloads. With technology scaling, both leakage power and core-to-core variations in frequency (Fmax) & leakage due to within-die device parameter variations have become significant, thus creating the need for per-core power gating and variation-aware DVFCS. Recently, variation-aware core mapping has been investigated using high level architectural simulations and statistical variation models.
Symposium on VLSI Circuits | 2002
Tanay Karnik; Sriram R. Vangal; V. Veeramachaneni; Peter Hazucha; Vasantha Erraguntla; Shekhar Borkar
This paper presents a technique to selectively engineer sequential or domino nodes in high performance circuits to improve soft error rate (SER) induced by cosmic rays or alpha particles. In 0.18 µm process, the SER improvement is as much as 3× at the cell-level, 1.8× at the block-level and 1.3× at the chip-level without any penalty in performance or area, and <3% power penalty. The node selection, hardening and SER quantification steps are fully automated.
IEEE Journal of Solid-State Circuits | 2003
Yatin Hoskote; Bradley Bloechel; Greg Dermer; Vasantha Erraguntla; David Finan; Jason Howard; D. Klowden; Siva G. Narendra; Gregory Ruhl; J. Tschanz; Sriram R. Vangal; V. Veeramachaneni; Howard Wilson; Jianping Xu; Nitin Borkar
This programmable engine is designed to offload TCP inbound processing at wire speed for 10-Gb/s Ethernet, supporting 64-byte minimum packet size. This prototype chip employs a high-speed core and a specialized instruction set. It includes hardware support for dynamically reordering out-of-order packets. In a 90-nm CMOS process, the 8-mm² experimental chip has 460 K transistors. First silicon has been validated to be fully functional and achieves 9.64-Gb/s packet processing performance at 1.72 V and consumes 6.39 W.
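To give a sense of what wire speed means for 64-byte packets on 10-Gb/s Ethernet, the sketch below estimates the worst-case packet rate and the time available per packet. It assumes the standard 20 bytes of per-frame overhead on the wire (8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap); the abstract does not mention this overhead, and the variable names are illustrative.

```python
# Figures from the abstract.
LINE_RATE_BPS = 10e9    # 10-Gb/s Ethernet
MIN_FRAME_BYTES = 64    # minimum packet size supported

# Assumption (standard Ethernet framing, not stated in the abstract):
# 8 bytes preamble/SFD + 12 bytes inter-frame gap per frame on the wire.
WIRE_OVERHEAD_BYTES = 20

bits_per_frame_on_wire = (MIN_FRAME_BYTES + WIRE_OVERHEAD_BYTES) * 8
max_packets_per_sec = LINE_RATE_BPS / bits_per_frame_on_wire
time_budget_ns = 1e9 / max_packets_per_sec

print(f"Worst-case packet rate: {max_packets_per_sec / 1e6:.2f} Mpps")
print(f"Processing budget per packet: {time_budget_ns:.1f} ns")
```

With these assumptions an engine has roughly 67 ns to handle each minimum-size packet, which helps explain the emphasis on a high-speed core and a specialized instruction set.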